IXRag Package

RAG (Retrieval-Augmented Generation) with LightRAG integration, document limiting, and reranking.

Overview

The ixrag package provides the retrieval layer for the Rose platform. It integrates with LightRAG to provide hybrid retrieval combining:

  • Text Chunks: Traditional vector similarity search
  • Entities: Knowledge graph entities extracted from documents
  • Relationships: Connections between entities in the knowledge graph

Key Components

  • LangGraph Retriever: Main retrieval orchestration with reranking
  • Document Limiter: Intelligent document allocation and limiting
  • Document Processor: Converts LightRAG responses to LangChain Documents
  • Multi-Tenant Support: Shared singleton with per-request tenant isolation via contextvars

Reranking & Limiting Configuration

Configuration is defined in environment config files (e.g., development.toml, staging.toml, production.toml).

Config Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "mix" | RAG mode: global, local, hybrid, or mix |
| rerank_enabled | bool | false | Enable Cohere/Jina reranking |
| limiter_enabled | bool | true | Enable type-based limiting (fallback when reranking is disabled) |
| rerank_top_k | int | 20 | Max documents returned (used by both reranking and limiting) |
| rerank_provider | string | "cohere" | Reranking provider: cohere or jina |
| rerank_model | string | "rerank-v3.5" | Model name for reranking |
| relationships_enabled | bool | true | Include relationship documents in results |
| graph_top_k | int | 60 | Initial retrieval limit for entities/relationships |
| chunk_top_k | int | 20 | Initial retrieval limit for chunks |

Example Configuration

[lightrag]
mode = "mix"
rerank_enabled = true
limiter_enabled = true
rerank_top_k = 12
rerank_provider = "cohere"
rerank_model = "rerank-v3.5"
relationships_enabled = true
graph_top_k = 10
chunk_top_k = 15

Behavior Matrix

| rerank_enabled | limiter_enabled | Behavior |
| --- | --- | --- |
| true | any | Cohere/Jina reranks all documents together, returns top N by relevance |
| false | true | Type-based allocation by RAG mode (see below) |
| false | false | All documents returned without limiting |
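
The matrix can be read as a simple precedence rule: reranking wins when enabled, and the limiter only applies otherwise. A minimal sketch, where `PostProcessing` and `select_post_processing` are hypothetical names rather than ixrag APIs:

```python
from enum import Enum


class PostProcessing(Enum):  # hypothetical enum, one member per matrix row
    RERANK = "rerank"
    TYPE_ALLOCATION = "type_allocation"
    PASSTHROUGH = "passthrough"


def select_post_processing(rerank_enabled: bool, limiter_enabled: bool) -> PostProcessing:
    """Map the two config flags to the behavior described in the matrix."""
    if rerank_enabled:
        # limiter_enabled does not matter once reranking is on
        return PostProcessing.RERANK
    if limiter_enabled:
        return PostProcessing.TYPE_ALLOCATION
    return PostProcessing.PASSTHROUGH
```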

Type-Based Allocation

When rerank_enabled=false and limiter_enabled=true, documents are allocated by type based on the RAG mode:

| Mode | Chunks | Entities | Relationships |
| --- | --- | --- | --- |
| global | 30% | 35% | 35% |
| local | 60% | 20% | 20% |
| hybrid / mix | 30% | 30% | 40% |

The allocation percentages determine how the rerank_top_k budget is distributed across document types.
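
To make the budget split concrete, the sketch below turns the percentages into per-type document counts. The rounding rule (largest remainder, ties broken in table order) is an assumption; the source does not say how fractional slots are handled, nor how the split changes when `relationships_enabled` is false. `allocate_budget` is a hypothetical helper.

```python
# Percentages copied from the allocation table above.
ALLOCATION_PERCENT = {
    "global": {"chunk": 30, "entity": 35, "relationship": 35},
    "local": {"chunk": 60, "entity": 20, "relationship": 20},
    "hybrid": {"chunk": 30, "entity": 30, "relationship": 40},
    "mix": {"chunk": 30, "entity": 30, "relationship": 40},
}


def allocate_budget(mode: str, top_k: int) -> dict[str, int]:
    """Split top_k across document types; the result always sums to top_k."""
    quotas: dict[str, int] = {}
    remainders: list[tuple[int, str]] = []
    for doc_type, percent in ALLOCATION_PERCENT[mode].items():
        # Integer arithmetic avoids floating-point surprises such as 12 * 0.3
        whole, rest = divmod(top_k * percent, 100)
        quotas[doc_type] = whole
        remainders.append((rest, doc_type))
    leftover = top_k - sum(quotas.values())
    # Assumed rule: hand leftover slots to the largest remainders first.
    # sorted() is stable, so ties keep the table order.
    by_remainder = sorted(remainders, key=lambda item: item[0], reverse=True)
    for _, doc_type in by_remainder[:leftover]:
        quotas[doc_type] += 1
    return quotas


print(allocate_budget("local", 20))
print(allocate_budget("mix", 12))
```

With `rerank_top_k = 20` in local mode the split is exact (12 chunks, 4 entities, 4 relationships); with 12 in mix mode the two leftover slots go to relationships and chunks under the assumed rule.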

Technical Details

Why We Call Cohere Ourselves

LightRAG has built-in reranking support, but it's bypassed when using only_need_context=True (which we use to get raw context without LLM generation). The retrieval flow is:

  1. LightRAG Query: Call with only_need_context=True to get raw documents
  2. Document Processing: Convert LightRAG response to LangChain Documents
  3. Reranking (if enabled): Call Cohere/Jina to rerank all documents by query relevance
  4. Limiting (fallback): Apply type-based allocation if reranking is disabled and limiter_enabled is true

This approach gives us:

  • Full control over the reranking process
  • Unified reranking across all document types (chunks, entities, relationships)
  • Ability to use the latest Cohere/Jina models
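
The reranking step can be sketched as follows: all document types are scored against the query in one pass, sorted together, and truncated to the top K. The scorer here is a toy word-overlap function standing in for the Cohere/Jina call, and `Doc`, `overlap_scorer`, and `rerank` are hypothetical names, not the ixrag implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:  # stand-in for a LangChain Document
    page_content: str
    metadata: dict = field(default_factory=dict)


# A scorer takes the query and candidate texts and returns one score per text.
Scorer = Callable[[str, list[str]], list[float]]


def overlap_scorer(query: str, texts: list[str]) -> list[float]:
    """Toy relevance score: fraction of query words present in the text."""
    query_terms = set(query.lower().split())
    return [
        len(query_terms & set(text.lower().split())) / max(len(query_terms), 1)
        for text in texts
    ]


def rerank(query: str, docs: list[Doc], top_k: int, scorer: Scorer = overlap_scorer) -> list[Doc]:
    """Score chunks, entities, and relationships together and keep the best top_k."""
    scores = scorer(query, [doc.page_content for doc in docs])
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [
        Doc(doc.page_content, {**doc.metadata, "rerank_score": score})
        for score, doc in ranked[:top_k]
    ]


docs = [
    Doc("pricing plans overview", {"document_type": "chunk"}),
    Doc("Acme was founded in 2010", {"document_type": "entity"}),
    Doc("Acme offers pricing plans", {"document_type": "relationship"}),
]
top = rerank("pricing plans", docs, top_k=2)
print([d.metadata for d in top])
```

Because the scorer is injected, swapping the toy function for a real provider call would not change the sorting and truncation logic.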

Document Types

Each document returned has a document_type in its metadata:

| Type | Source | Description |
| --- | --- | --- |
| chunk | Vector search | Text chunks from indexed documents |
| entity | Knowledge graph | Extracted entities (people, companies, concepts) |
| relationship | Knowledge graph | Connections between entities |
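
Downstream code can branch on `document_type`. The sketch below groups retrieved documents by type; `Doc` is a small stand-in for a LangChain Document (same `page_content` and `metadata` attributes), and `group_by_type` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:  # stand-in exposing the same attributes as a LangChain Document
    page_content: str
    metadata: dict = field(default_factory=dict)


def group_by_type(documents: list[Doc]) -> dict[str, list[str]]:
    """Bucket document texts by their document_type metadata value."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for doc in documents:
        doc_type = doc.metadata.get("document_type", "unknown")
        grouped[doc_type].append(doc.page_content)
    return dict(grouped)


grouped = group_by_type([
    Doc("Plans start with a free tier.", {"document_type": "chunk"}),
    Doc("Acme", {"document_type": "entity"}),
    Doc("Acme -> offers -> Pro plan", {"document_type": "relationship"}),
    Doc("Billing is monthly.", {"document_type": "chunk"}),
])
print({doc_type: len(texts) for doc_type, texts in grouped.items()})
```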

Key Files

| File | Description |
| --- | --- |
| ixrag/lightrag/langgraph_retriever.py | Main retriever with reranking logic |
| ixrag/lightrag/document_limiter.py | Document limiting and allocation strategies |
| ixrag/lightrag/document_processor.py | Converts LightRAG responses to Documents |
| ixrag/lightrag/lightrag_llm.py | LLM and reranking function factories |
| ixrag/lightrag/rag_instance_manager.py | Shared singleton + per-tenant instance management |

Integration Points

| System | Purpose |
| --- | --- |
| LightRAG | Hybrid retrieval (vector + graph) |
| MongoDB | Vector storage backend |
| Neo4j | Graph storage backend |
| Cohere/Jina | Document reranking |
| LangFuse | Observability and tracing |

Usage

The retriever is typically accessed through ixchat, but can be used directly:

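import asyncio

from ixrag.lightrag.langgraph_retriever import LangGraphRetriever


async def main() -> None:
    # Create a retriever for one site, using the "mix" RAG mode
    retriever = LangGraphRetriever(
        site_name="example-site",
        rag_mode="mix",
    )

    # Retrieve documents (ainvoke is a coroutine, so it must be awaited)
    documents = await retriever.ainvoke("What are your pricing plans?")

    # Each document has:
    # - page_content: The text content
    # - metadata: {document_type, source, rerank_score (if reranked)}
    for doc in documents:
        print(doc.metadata["document_type"], doc.page_content[:80])


asyncio.run(main())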

Shared Singleton Architecture (IX-1578)

Problem

Before IX-1578, every tenant got its own LightRAG instance. LightRAG initialization is expensive: it creates Neo4j drivers and MongoDB clients, loads embedding functions, and calls initialize_storages(). As more clients onboarded, this caused:

  • Slow cold-start: Each new tenant required full LightRAG init, adding seconds to time to first token (TTFT) on the first request.
  • Connection explosion: N tenants = N sets of connection pools.
  • Memory pressure: N large LightRAG objects cached in memory.

Solution

One shared LightRAG instance, with tenant identity injected per-request via ContextVar.

The key insight: LightRAG itself is stateless with respect to tenant identity. All tenant filtering happens in the storage layer. So instead of one instance per tenant, there's a single shared instance, and storage classes resolve tenant_id dynamically from the current async task's context.

API request for "hexa.com"
    |
    +-- get_tenant_context("hexa.com")      # pure string op, cached
    |       -> TenantContext(mongo="tenant_hexa_com", neo4j="hexa_com")
    |
    +-- set_tenant(ctx)                      # bind to current async task
    |
    +-- get_shared_rag_instance()            # return the one singleton
    |
    +-- rag.aquery(...)
            |
            +-- Neo4jStorage.tenant_id -> get_tenant() -> "hexa_com"
            |       -> WHERE n.tenantId = "hexa_com"
            |
            +-- MongoStorage.tenant_id -> get_tenant() -> "tenant_hexa_com"
                    -> {"tenantId": "tenant_hexa_com"}

Components

TenantContext (ixinfra/tenant_context.py): A frozen dataclass held in a ContextVar, which provides task-local tenant identity in async code. Each asyncio.Task runs in its own copy of the context, so a tenant set in one task is not visible to others. It lives in ixinfra because it's a pure dataclass with zero storage imports.

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)  # immutable; slots=True requires Python 3.10+
class TenantContext:
    company_name: str
    mongo_tenant_id: str   # e.g., "tenant_hexa_com"
    neo4j_tenant_id: str   # e.g., "hexa_com"

Singleton Factory (ixrag/lightrag/rag_instance_manager.py): Two code paths:

| Path | Function | Use Case |
| --- | --- | --- |
| API server | get_shared_rag_instance() | Returns the single shared instance. Double-checked locking. |
| CLI / doc loader | get_rag_instance(working_dir, key) | Per-tenant instances for write operations (document indexing). |
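
Double-checked locking for an async singleton might look like the sketch below. It assumes an `asyncio.Lock` and an async factory; the source does not say which lock type `get_shared_rag_instance()` uses, and `SharedInstance` and `expensive_init` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable


class SharedInstance:
    """Lazily creates one instance, even when many tasks ask for it at once."""

    def __init__(self, factory: Callable[[], Awaitable[Any]]) -> None:
        self._factory = factory
        self._instance: Any = None
        self._lock = asyncio.Lock()

    async def get(self) -> Any:
        if self._instance is not None:  # first check: lock-free fast path
            return self._instance
        async with self._lock:
            if self._instance is None:  # second check: another task may have won
                self._instance = await self._factory()
        return self._instance


init_calls = 0


async def expensive_init() -> object:
    """Stand-in for the costly LightRAG initialization."""
    global init_calls
    init_calls += 1
    await asyncio.sleep(0.01)
    return object()


async def main() -> list[object]:
    holder = SharedInstance(expensive_init)
    return await asyncio.gather(*(holder.get() for _ in range(5)))


instances = asyncio.run(main())
print(init_calls)
```

The second check inside the lock is what prevents tasks that queued up during initialization from running the factory again.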

get_tenant_context() replaces old DB round-trips: previously computing a tenant ID required opening a synchronous Neo4j connection just to sanitize a string. Now it's a pure in-process string operation, cached in _tenant_context_cache.

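Dual-Mode tenant_id Property: Both TenantAwareNeo4JStorage and BaseTenantAwareStorage (MongoDB) resolve tenant identity dynamically. The Neo4j variant is shown here; as the request flow above indicates, the MongoDB variant resolves to the Mongo tenant ID instead: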

@property
def tenant_id(self) -> str:
    ctx = get_tenant()           # check contextvar
    if ctx is not None:
        return ctx.neo4j_tenant_id  # API server: contextvar wins
    return self._tenant_id          # CLI/test: instance attribute fallback

Write Guard: The shared singleton is read-only. If a write is attempted through it, upsert_node and upsert_edge in TenantAwareNeo4JStorage raise RuntimeError when the contextvar tenant mismatches the instance tenant ("__shared__").

Neo4j Driver Manager (ixneo4j/driver_manager.py): A ref-counted singleton AsyncDriver. Even with multiple LightRAG instances (CLI path), there's only one Neo4j connection pool per process.
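
Reference counting for a shared driver can be sketched as follows. `FakeDriver` stands in for the Neo4j AsyncDriver, and `RefCountedDriverManager` is a hypothetical illustration rather than the ixneo4j implementation.

```python
import asyncio
from typing import Callable


class FakeDriver:
    """Stand-in for a driver object with an async close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class RefCountedDriverManager:
    def __init__(self, factory: Callable[[], FakeDriver]) -> None:
        self._factory = factory
        self._driver: FakeDriver | None = None
        self._refs = 0
        self._lock = asyncio.Lock()

    async def acquire(self) -> FakeDriver:
        async with self._lock:
            if self._driver is None:  # first user creates the driver
                self._driver = self._factory()
            self._refs += 1
            return self._driver

    async def release(self) -> None:
        async with self._lock:
            if self._refs == 0:
                raise RuntimeError("release() without matching acquire()")
            self._refs -= 1
            if self._refs == 0:  # last user closes the driver
                driver, self._driver = self._driver, None
                await driver.close()


async def main() -> tuple[bool, bool, bool]:
    manager = RefCountedDriverManager(FakeDriver)
    first = await manager.acquire()
    second = await manager.acquire()
    await manager.release()
    still_open = not first.closed
    await manager.release()
    return first is second, still_open, first.closed


same_driver, open_after_one_release, closed_after_last = asyncio.run(main())
```

Every `acquire()` hands out the same driver, and the connection pool is only torn down when the last holder releases it.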

Service Orchestration (ixchat/service.py): IXChatbotService ties everything together: it calls get_tenant_context(), set_tenant(), then get_shared_rag_instance(). The eviction system only evicts lightweight per-tenant chatbot wrappers; the shared singleton is never evicted.

Tenant Isolation

Both Neo4j and MongoDB enforce isolation at every query:

  • Neo4j: All MATCH queries inject WHERE n.tenantId = $_tenant_id via _add_tenant_filter_to_query(). Write operations buffer nodes/edges then flush with UNWIND MERGE ... tenantId = $tenant_id.
  • MongoDB: All queries add {"tenantId": self.tenant_id} filter. Document IDs are composite ("tenant_hexa_com:original-chunk-id") to prevent collisions across tenants.
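
The MongoDB side of this isolation can be sketched with two small helpers. Both `composite_id` and `with_tenant_filter` are hypothetical names, and the Neo4j query rewriting in `_add_tenant_filter_to_query()` is not reproduced here.

```python
def composite_id(tenant_id: str, original_id: str) -> str:
    """Prefix the document ID with the tenant so IDs cannot collide across tenants."""
    return f"{tenant_id}:{original_id}"


def with_tenant_filter(query: dict, tenant_id: str) -> dict:
    """Return a copy of the query that is always scoped to the given tenant."""
    # The tenant key is applied last, so a caller-supplied tenantId cannot override it.
    return {**query, "tenantId": tenant_id}


doc_id = composite_id("tenant_hexa_com", "original-chunk-id")
scoped = with_tenant_filter({"type": "chunk", "tenantId": "someone_else"}, "tenant_hexa_com")
print(doc_id, scoped)
```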

Key Files

| File | Description |
| --- | --- |
| ixinfra/tenant_context.py | TenantContext dataclass + ContextVar |
| ixrag/lightrag/rag_instance_manager.py | Shared singleton + per-tenant factory |
| ixneo4j/tenant_storage.py | Dual-mode tenant_id, write guard |
| ixmongo/tenant_storage.py | Dual-mode tenant_id for MongoDB |
| ixneo4j/driver_manager.py | Ref-counted Neo4j AsyncDriver singleton |
| ixchat/service.py | IXChatbotService orchestration + eviction |

Data Consistency

The package includes tools for monitoring and maintaining consistency between MongoDB and Neo4j storage backends. See the ixrag/lightrag/ directory for:

  • cli_consistency_check.py - Check data consistency
  • reconcile_entities.py - Sync missing entities
  • diagnose_entity_mapping.py - Diagnose entity name issues