RAG Without Vectors: How PageIndex Retrieves by Reasoning

PageIndex is a retrieval-augmented generation system that eliminates vector embeddings in favor of a hierarchical table-of-contents tree index. This system enables LLMs to reason over document structures to pinpoint relevant sections with high precision before reading full text.

Why This Matters

Traditional vector-based RAG pipelines often fail on complex professional documents because semantic similarity is a weak proxy for relevance in domains like finance or law. In these contexts, relevant information is frequently spread across multiple sections, and finding it requires structural navigation that chunk-based embeddings cannot provide. PageIndex addresses this by having an LLM navigate the document’s hierarchy, so retrieval is grounded in structural reasoning rather than proximity in embedding space. This reduces retrieval noise and improves accuracy on specialized benchmarks.

Key Insights

Hierarchical Tree Indexing: PageIndex constructs a tree of sections and subsections, storing titles and summaries in nodes to preserve the author’s original document structure (PageIndex, 2026).
Reasoning-Driven Retrieval: The system uses advanced LLMs like GPT-5.4 to scan node summaries and decide which sections to explore before loading full text, mimicking human expert behavior (PageIndex, 2026).
Benchmark Performance: PageIndex reports high retrieval accuracy on FinanceBench, outperforming vector-similarity retrieval in a domain that requires deep document understanding (PageIndex, 2026).
Cross-Cutting Query Resolution: By reasoning over the document tree, PageIndex can identify and stitch together multiple relevant nodes that sit far apart in the text (PageIndex, 2026).
Vectorless Infrastructure: The system eliminates the need for embedding generation and vector database management, relying instead on structured exploration and LLM reasoning (PageIndex, 2026).
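
The tree index and the reasoning-driven descent described above can be sketched in a few lines. This is a minimal illustration, not PageIndex's implementation: the `Node` class, its field names, and the `select` callback are hypothetical, and a simple keyword-overlap function stands in for the LLM that would judge relevance from titles and summaries.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One section of the document tree (hypothetical schema)."""
    node_id: str
    title: str
    summary: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def retrieve(node: Node, query: str, select: Callable[[str, Node], bool]) -> List[Node]:
    """Descend only into branches the selector judges relevant."""
    if not select(query, node):
        return []
    if not node.children:
        return [node]
    hits: List[Node] = []
    for child in node.children:
        hits.extend(retrieve(child, query, select))
    # If no child was selected, fall back to the section itself.
    return hits or [node]


def keyword_select(query: str, node: Node) -> bool:
    """Stand-in for the LLM's judgment: any query word in the title or summary."""
    haystack = f"{node.title} {node.summary}".lower()
    return any(word in haystack for word in query.lower().split())


doc = Node("0", "Annual Report", "revenue, risk and debt disclosures", children=[
    Node("1", "Revenue", "revenue by segment", text="Segment revenue table..."),
    Node("2", "Risk Factors", "market and credit risk", text="Risk narrative..."),
    Node("3", "Notes", "debt maturities and revenue recognition policy", text="Note 7..."),
])

selected = retrieve(doc, "revenue", keyword_select)
print([n.node_id for n in selected])
```

Only titles and summaries are inspected during the descent; the `text` of a node would be loaded after it has been selected. Nodes 1 and 3 are both returned even though an unrelated section sits between them, which is the cross-cutting case the list above describes.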

Working Examples

Initialization of the PageIndex client and the asynchronous LLM helper function.

from getpass import getpass

import openai
import pageindex.utils as utils
from pageindex import PageIndexClient

# Prompt for the keys at runtime rather than hard-coding them.
PAGEINDEX_API_KEY = getpass('Enter PageIndex API Key: ')
pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY)

OPENAI_API_KEY = getpass('Enter OpenAI API Key: ')

async def call_llm(prompt, model="gpt-5.4", temperature=0):
    """Send a single user prompt to the chat model and return the reply text."""
    client = openai.AsyncOpenAI(api_key=OPENAI_API_KEY)
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature
    )
    return response.choices[0].message.content.strip()

Building the hierarchical document tree from a PDF source.

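import time

pdf_url = "https://arxiv.org/pdf/1706.03762.pdf"
# Submit the PDF for indexing; the response carries the document ID.
doc_id = pi_client.submit_document(pdf_url)["doc_id"]

# Poll every five seconds until the tree is ready to query.
while not pi_client.is_retrieval_ready(doc_id):
    time.sleep(5)

# Fetch the tree with per-node summaries and print its outline.
tree = pi_client.get_tree(doc_id, node_summary=True)["result"]
utils.print_tree(tree)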
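
Once the tree is available, the retrieval step amounts to showing the model the outline and asking which nodes to open. The sketch below is one way to do that and is not taken from the PageIndex SDK: the node field names (`node_id`, `title`, `summary`, `nodes`), the prompt wording, and the JSON reply format are all assumptions, and `sample_tree` and `fake_reply` are made-up stand-ins so the block runs without API keys.

```python
import json


def outline(nodes, depth=0):
    """Render titles and summaries only, so full text stays out of the prompt."""
    lines = []
    for node in nodes:
        indent = "  " * depth
        lines.append(f"{indent}[{node['node_id']}] {node['title']}: {node.get('summary', '')}")
        lines.extend(outline(node.get("nodes", []), depth + 1))
    return lines


def build_search_prompt(query, nodes):
    """Ask the model for node IDs as JSON (the wording is illustrative)."""
    return (
        "You are given a question and a document outline.\n"
        f"Question: {query}\n"
        "Outline:\n" + "\n".join(outline(nodes)) + "\n"
        'Reply with JSON only: {"thinking": "<reasoning>", "node_list": ["<node_id>"]}'
    )


def parse_node_list(reply, nodes):
    """Keep only IDs that exist in the tree, in the order the model gave them."""
    known = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        known.add(node["node_id"])
        stack.extend(node.get("nodes", []))
    return [nid for nid in json.loads(reply)["node_list"] if nid in known]


# Hypothetical tree and model reply, for illustration only.
sample_tree = [
    {"node_id": "0001", "title": "Introduction", "summary": "Motivation for attention-only models."},
    {"node_id": "0002", "title": "Model Architecture", "summary": "Encoder and decoder stacks.", "nodes": [
        {"node_id": "0003", "title": "Multi-Head Attention", "summary": "Parallel attention heads."},
    ]},
    {"node_id": "0005", "title": "Results", "summary": "Translation benchmarks."},
]
fake_reply = '{"thinking": "Definition and evaluation.", "node_list": ["0003", "0005", "9999"]}'

prompt = build_search_prompt("How does multi-head attention work?", sample_tree)
chosen = parse_node_list(fake_reply, sample_tree)
print(chosen)
```

In a live run, the reply would come from `await call_llm(prompt)` instead of `fake_reply`. Filtering against known IDs guards against the model returning a node that does not exist, as with `9999` here.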

Practical Applications

Financial Analysis: Extracting specific complexity metrics and trade-offs from SEC filings where data is spread across various disclosures. Pitfall: Vector-only search might miss quantitative tables if the query is purely conceptual and lacks direct semantic overlap.
Legal and Academic Research: Navigating structured contracts or research papers to synthesize mechanisms across sections, such as tracing multi-head attention through the Transformer paper. Pitfall: Standard chunking can break the contextual link between a mechanism’s definition and the sections that evaluate it.
