Building Hybrid-Memory Autonomous Agents with Modular Tool Dispatch and OpenAI

These articles are AI-generated summaries. Please check the original sources for full details.

Build a Hybrid-Memory Autonomous Agent with Modular Architecture and Tool Dispatch Using OpenAI

Sana Hassan introduces a modular framework for autonomous agents built on OpenAI’s gpt-4o-mini and a custom hybrid memory backend. The memory layer uses Reciprocal Rank Fusion (RRF) with a smoothing constant of 60 to merge the ranking from semantic vector search with the ranking from keyword-based BM25 retrieval.

Why This Matters

Standard RAG implementations often struggle with long-term consistency and with exact keyword retrieval in production environments. A modular architecture with abstract base classes for memory, LLM providers, and tools lets developers replace or test each component independently. Combined with hybrid retrieval and a compiled system prompt, this reduces retrieval failures and helps the agent keep a consistent persona across multi-turn tool-dispatching loops, which matters for software engineering tasks that demand high reliability.

Key Insights

  • Hybrid Memory Retrieval: Merges the semantic ranking from text-embedding-3-small vectors with the BM25 keyword ranking via Reciprocal Rank Fusion (RRF), using a constant of 60.
  • Modular Interface Design: Employs abstract base classes (MemoryBackend, LLMProvider, Tool) to enable runtime hot-swapping of components like the UpgradedWebSnippetTool.
  • Deterministic Persona Management: The AgentPersona class dynamically compiles system prompts to enforce core traits while explicitly banning phrases like ‘As an AI language model’.
  • Recursive Tool Dispatch: The agent loop supports up to 8 recursive tool rounds, enabling multi-step reasoning such as calculating project timelines based on retrieved facts.
  • Precision Recall: Hybrid search ensures that specific alphanumeric identifiers, such as ‘Order #4821’, are accurately retrieved when semantic scores alone are insufficient.
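
To make the persona point above concrete, here is a minimal sketch of how a persona object might compile a system prompt and flag banned phrases. The summary does not show the original AgentPersona's fields or methods, so the field names, the method names `compile_system_prompt` and `violations`, and the persona name "Atlas" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentPersona:
    """Illustrative stand-in; the original class's fields are not shown in the summary."""
    name: str
    traits: List[str] = field(default_factory=list)
    banned_phrases: List[str] = field(default_factory=list)

    def compile_system_prompt(self) -> str:
        # Build the prompt line by line so optional sections can be skipped.
        lines = [f"You are {self.name}."]
        if self.traits:
            lines.append("Core traits: " + ", ".join(self.traits) + ".")
        if self.banned_phrases:
            quoted = "; ".join(f'"{p}"' for p in self.banned_phrases)
            lines.append("Never use these phrases: " + quoted + ".")
        return "\n".join(lines)

    def violations(self, reply: str) -> List[str]:
        """Return the banned phrases that appear in a reply (case-insensitive)."""
        lowered = reply.lower()
        return [p for p in self.banned_phrases if p.lower() in lowered]


# "Atlas" is a hypothetical persona name used only for this example.
persona = AgentPersona(
    name="Atlas",
    traits=["concise", "precise"],
    banned_phrases=["As an AI language model"],
)
prompt = persona.compile_system_prompt()
print(prompt)
```

A prompt instruction alone cannot guarantee compliance, which is why this sketch also includes a post-hoc check on replies; whether the original framework does the same is not stated.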

Working Examples

Implementation of hybrid search merging vector and keyword scores via Reciprocal Rank Fusion.

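from typing import Dict, List, Optional

import numpy as np
from rank_bm25 import BM25Okapi  # assumed source of BM25Okapi

# Excerpt: MemoryBackend, MemoryChunk, _embed and _tokenise are defined elsewhere in the original.
class HybridMemory(MemoryBackend):
    RRF_K = 60  # RRF smoothing constant

    def __init__(self):
        self._chunks: List[MemoryChunk] = []
        self._bm25: Optional[BM25Okapi] = None

    def search(self, query: str, top_k: int = 5) -> List[MemoryChunk]:
        if not self._chunks or self._bm25 is None:
            return []  # nothing indexed yet
        # Semantic ranking; a dot product equals cosine similarity only for unit-length embeddings.
        [q_vec] = _embed([query])
        cos_scores = np.array([np.dot(q_vec, c.embedding) for c in self._chunks])
        vec_ranks = {self._chunks[i].id: rank + 1 for rank, i in enumerate(np.argsort(-cos_scores))}
        # Keyword ranking from BM25 scores.
        bm25_scores = self._bm25.get_scores(_tokenise(query))
        kw_ranks = {self._chunks[i].id: rank + 1 for rank, i in enumerate(np.argsort(-bm25_scores))}
        # Reciprocal Rank Fusion: sum 1 / (k + rank) across both rankings.
        rrf: Dict[str, float] = {
            c.id: 1.0 / (self.RRF_K + vec_ranks[c.id]) + 1.0 / (self.RRF_K + kw_ranks[c.id])
            for c in self._chunks
        }
        by_id = {c.id: c for c in self._chunks}  # avoids a linear scan per result
        ranked_ids = sorted(rrf, key=rrf.get, reverse=True)[:top_k]
        return [by_id[cid] for cid in ranked_ids]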
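
The excerpt above depends on classes and helpers that are not shown, so the fusion step can be hard to follow in isolation. The following self-contained sketch applies the same idea to two hand-written rankings. The function name `rrf_fuse` and the document ids are hypothetical. Unlike the excerpt, an id missing from one ranking simply contributes nothing for that ranking here.

```python
from typing import Dict, List

RRF_K = 60  # the constant cited in the summary


def rrf_fuse(rankings: List[List[str]], k: int = RRF_K) -> Dict[str, float]:
    """Sum 1 / (k + rank) over every ranking an id appears in (ranks start at 1)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


# Hypothetical rankings for a query that mentions "Order #4821".
vector_ranking = ["shipping_policy", "order_4821", "refund_faq"]
keyword_ranking = ["order_4821", "refund_faq", "shipping_policy"]

fused = rrf_fuse([vector_ranking, keyword_ranking])
best_first = sorted(fused, key=fused.get, reverse=True)
print(best_first)  # ['order_4821', 'shipping_policy', 'refund_faq']
```

Here order_4821 scores 1/62 + 1/61, which beats shipping_policy at 1/61 + 1/63: a chunk that ranks well in both lists overtakes one that tops a single list. Because k = 60 is large relative to the ranks, the gaps between fused scores are small and only the relative order matters.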

Practical Applications

  • Knowledge-Intensive Research: A research assistant recalling Raft consensus algorithm details for the ‘VelocityDB’ project to provide precise technical answers. Pitfall: Using monolithic agent loops that cannot hot-swap tools at runtime, leading to brittle system integration.
  • Stateful Inventory Management: Tracking specific order IDs like #4821 through hybrid memory to ensure exact matches across multi-turn sessions. Pitfall: Relying purely on vector embeddings, which often struggle to match exact alphanumeric strings in dense corpora.
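
The hot-swapping pitfall above is easier to see with a small sketch of a tool registry and a bounded dispatch loop. This is not the original implementation: `ToolRegistry`, `AddDaysTool`, `run_agent`, and `scripted_llm` are hypothetical names, the model call is replaced by a scripted function, and the loop is written iteratively rather than recursively. Only the cap of 8 tool rounds comes from the summary.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List

MAX_TOOL_ROUNDS = 8  # round cap mentioned in the summary
Step = Dict[str, Any]  # either a tool request or a final answer


class Tool(ABC):
    """Minimal tool interface; the original Tool base class may differ."""
    name: str

    @abstractmethod
    def run(self, **kwargs: Any) -> str: ...


class AddDaysTool(Tool):
    """Hypothetical tool used only for this illustration."""
    name = "add_days"

    def run(self, start_day: int, duration: int) -> str:
        return str(start_day + duration)


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        # Registering under an existing name swaps the tool at runtime.
        self._tools[tool.name] = tool

    def dispatch(self, name: str, args: Dict[str, Any]) -> str:
        if name not in self._tools:
            return f"error: unknown tool {name!r}"
        return self._tools[name].run(**args)


def run_agent(llm_step: Callable[[List[str]], Step], registry: ToolRegistry, user_msg: str) -> str:
    transcript = [f"user: {user_msg}"]
    for _ in range(MAX_TOOL_ROUNDS):
        step = llm_step(transcript)
        if step["type"] == "final":
            return step["content"]
        result = registry.dispatch(step["tool"], step["args"])
        transcript.append(f"tool[{step['tool']}]: {result}")
    return "stopped: tool round limit reached"


def scripted_llm(transcript: List[str]) -> Step:
    """Stand-in for a real model call: one tool request, then a final answer."""
    if len(transcript) == 1:
        return {"type": "tool", "tool": "add_days", "args": {"start_day": 10, "duration": 14}}
    return {"type": "final", "content": f"Done on day {transcript[-1].split(': ')[1]}"}


registry = ToolRegistry()
registry.register(AddDaysTool())
answer = run_agent(scripted_llm, registry, "When does the project finish?")
print(answer)  # Done on day 24
```

Because the loop only talks to the registry, replacing a tool is a single `register` call and needs no change to `run_agent`. The round cap keeps a model that never emits a final answer from looping indefinitely.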
