Building Heritage Keeper: A Gemini Live Agent for Family Story Preservation

These articles are AI-generated summaries. Please check the original sources for full details.

Heritage Keeper is a voice-first AI agent built on the Gemini Live API that preserves oral histories from spoken conversation. It takes PCM 16-bit audio at 16kHz as input, and the agent autonomously coordinates five function-calling tools together with Google Search grounding to verify historical facts in real time.
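To make the input format concrete, here is a minimal Python sketch of packing floating-point samples into 16-bit PCM at the 16kHz rate stated above. The article's capture happens in the browser, so this only illustrates the sample format; the function names and the little-endian byte order are assumptions, not the project's code.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # input rate stated in the article


def float_to_pcm16(samples):
    """Pack float samples in [-1.0, 1.0] as little-endian 16-bit PCM bytes."""
    # Clip first so out-of-range samples saturate instead of overflowing.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_duration_ms(pcm_bytes, sample_rate=SAMPLE_RATE_HZ):
    """Duration of a mono PCM16 chunk in milliseconds (2 bytes per sample)."""
    return len(pcm_bytes) * 1000 / (2 * sample_rate)
```

At this rate a 640-byte chunk holds 320 samples, which is 20 ms of mono audio.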

Why This Matters

Family history projects often stall because traditional genealogy software demands tedious manual data entry. Heritage Keeper replaces rigid forms with a bidirectional audio session in which the AI manages state and extracts context from the conversation. The trade-off is that developers must filter out the model’s internal reasoning parts themselves to keep the user experience clean.
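The filtering step could look roughly like the following. This is a minimal Python sketch, assuming each response part arrives as a dict with an optional boolean `thought` flag; the dict shape and the `visible_parts` name are illustrative assumptions rather than the project's actual code.

```python
def visible_parts(parts):
    """Return only the parts that are not flagged as internal reasoning."""
    # A missing or falsy "thought" flag is treated as user-visible content.
    return [p for p in parts if not p.get("thought")]
```

Only the parts this returns would be forwarded to the user interface; anything flagged as a thought is dropped before rendering or playback.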

Key Insights

  • The gemini-2.5-flash-native-audio model includes internal reasoning ‘thought’ parts in its responses that must be filtered before forwarding to the user interface.
  • Grounding AI responses with Google Search transforms historical context from speculative trivia into verifiable data such as cost of living and historical wage comparisons.
  • WebSocket stability in Cloud Run environments requires exponential backoff reconnection strategies (1s, 2s, 4s) to handle network blips and timeouts.
  • The agent utilizes five specific tools including save_story and search_photos to autonomously extract names, dates, and relationships from streaming audio.
  • Browser-side audio capture at 16kHz PCM 16-bit is required for the bidirectional session, while Gemini responds with native 24kHz audio.
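The reconnection schedule in the list above (1s, 2s, 4s) can be sketched in Python as follows. This is one possible shape, not the project's implementation: `connect` is a hypothetical coroutine that opens the WebSocket session, and the choice of one immediate attempt followed by three delayed retries is an assumption.

```python
import asyncio


def backoff_delays(base=1.0, factor=2.0, max_retries=3):
    """Delays between retries; the defaults give 1s, 2s, 4s."""
    return [base * factor ** i for i in range(max_retries)]


async def connect_with_backoff(connect, delays=None, sleep=asyncio.sleep):
    """Try `connect` once immediately, then once after each delay.

    `connect` is a hypothetical coroutine function that opens the session.
    Re-raises the last error if every attempt fails.
    """
    delays = backoff_delays() if delays is None else list(delays)
    last_error = None
    for delay in [0.0] + delays:
        if delay:
            await sleep(delay)
        try:
            return await connect()
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
    raise last_error
```

Passing `sleep` as a parameter keeps the schedule testable without real waiting. A production version would likely add jitter and a cap on the maximum delay.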

Practical Applications

  • Use Case: Building complex family trees via natural voice commands like ‘Bob is my father’ to trigger the add_family_member function tool. Pitfall: Failing to provide specific instructions for short commands may cause the agent to incorrectly attempt full story extraction.
  • Use Case: Automated historical photo retrieval using the Wikimedia Commons API with bitmap-only filtering for timeline entries. Pitfall: Neglecting to handle varied SDK message formats (LiveServerMessage vs JSON) can cause parser crashes during audio streaming.
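The message-format pitfall above can be guarded against by normalizing every incoming message to one shape before parsing. The Python sketch below assumes SDK message objects expose a pydantic-style `model_dump()` method and that raw messages are JSON text or bytes; both are assumptions, and `normalize_message` is a hypothetical helper name.

```python
import json


def normalize_message(message):
    """Return the message as a dict, or None if it cannot be interpreted."""
    if isinstance(message, dict):
        return message
    if isinstance(message, (bytes, bytearray, str)):
        try:
            if not isinstance(message, str):
                message = bytes(message).decode("utf-8")
            parsed = json.loads(message)
        except ValueError:
            # Covers both invalid UTF-8 and invalid JSON.
            return None
        return parsed if isinstance(parsed, dict) else None
    # Assumed: SDK objects provide model_dump(), as pydantic models do.
    dump = getattr(message, "model_dump", None)
    if callable(dump):
        return dump()
    return None
```

Returning None for unrecognized input lets the streaming loop skip a bad frame instead of crashing mid-session.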
