IXRag Package
RAG (Retrieval-Augmented Generation) with LightRAG integration, document limiting, and reranking.
Overview
The ixrag package provides the retrieval layer for the Rose platform. It integrates with LightRAG to provide hybrid retrieval combining:
- Text Chunks: Traditional vector similarity search
- Entities: Knowledge graph entities extracted from documents
- Relationships: Connections between entities in the knowledge graph
Key Components
- LangGraph Retriever: Main retrieval orchestration with reranking
- Document Limiter: Intelligent document allocation and limiting
- Document Processor: Converts LightRAG responses to LangChain Documents
- Multi-Tenant Support: Shared singleton with per-request tenant isolation via contextvars
Reranking & Limiting Configuration
Configuration is defined in environment config files (e.g., development.toml, staging.toml, production.toml).
Config Options
| Option | Type | Default | Description |
|---|---|---|---|
| mode | string | "mix" | RAG mode: global, local, hybrid, mix |
| rerank_enabled | bool | false | Enable Cohere/Jina reranking |
| limiter_enabled | bool | true | Enable type-based limiting (fallback when reranking disabled) |
| rerank_top_k | int | 20 | Max documents returned (used by both reranking and limiting) |
| rerank_provider | string | "cohere" | Reranking provider: cohere or jina |
| rerank_model | string | "rerank-v3.5" | Model name for reranking |
| relationships_enabled | bool | true | Include relationship documents in results |
| graph_top_k | int | 60 | Initial retrieval limit for entities/relationships |
| chunk_top_k | int | 20 | Initial retrieval limit for chunks |
Example Configuration
[lightrag]
mode = "mix"
rerank_enabled = true
limiter_enabled = true
rerank_top_k = 12
rerank_provider = "cohere"
rerank_model = "rerank-v3.5"
relationships_enabled = true
graph_top_k = 10
chunk_top_k = 15
Behavior Matrix
| rerank_enabled | limiter_enabled | Behavior |
|---|---|---|
| true | - | Cohere/Jina reranks all documents together, returns top N by relevance |
| false | true | Type-based allocation by RAG mode (see below) |
| false | false | All documents returned without limiting |
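The matrix reduces to a two-flag decision. The sketch below encodes it row by row; PostProcessing and choose_post_processing are hypothetical names used only for illustration.

```python
from enum import Enum


class PostProcessing(Enum):
    RERANK = "rerank"          # provider reranks all documents together
    TYPE_LIMIT = "type_limit"  # type-based allocation by RAG mode
    NONE = "none"              # everything is returned as retrieved


def choose_post_processing(rerank_enabled: bool, limiter_enabled: bool) -> PostProcessing:
    """Map the two config flags to a behavior, following the matrix rows."""
    if rerank_enabled:
        # limiter_enabled is ignored in this row ("-" in the table).
        return PostProcessing.RERANK
    if limiter_enabled:
        return PostProcessing.TYPE_LIMIT
    return PostProcessing.NONE
```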
Type-Based Allocation
When rerank_enabled=false and limiter_enabled=true, documents are allocated by type based on the RAG mode:
| Mode | Chunks | Entities | Relationships |
|---|---|---|---|
| global | 30% | 35% | 35% |
| local | 60% | 20% | 20% |
| hybrid / mix | 30% | 30% | 40% |
The allocation percentages determine how the rerank_top_k budget is distributed across document types.
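To make the budget split concrete, here is a minimal sketch that turns the percentages into per-type document counts. The source does not say how fractional slots are rounded, so the largest-remainder rounding used here is an assumption, as are the names ALLOCATION and allocate_budget. The relationships_enabled flag is not modelled.

```python
# Percent shares per RAG mode, copied from the allocation table.
ALLOCATION = {
    "global": {"chunk": 30, "entity": 35, "relationship": 35},
    "local": {"chunk": 60, "entity": 20, "relationship": 20},
    "hybrid": {"chunk": 30, "entity": 30, "relationship": 40},
    "mix": {"chunk": 30, "entity": 30, "relationship": 40},
}


def allocate_budget(mode: str, top_k: int) -> dict[str, int]:
    """Split top_k across document types (assumed largest-remainder rounding)."""
    shares = ALLOCATION[mode]
    # Integer arithmetic: whole slots first, then what rounding cut off.
    budget = {doc_type: top_k * pct // 100 for doc_type, pct in shares.items()}
    remainders = {doc_type: top_k * pct % 100 for doc_type, pct in shares.items()}
    leftover = top_k - sum(budget.values())
    # Hand leftover slots to the types that lost the most to rounding.
    for doc_type in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        budget[doc_type] += 1
    return budget


print(allocate_budget("local", 20))  # {'chunk': 12, 'entity': 4, 'relationship': 4}
print(allocate_budget("mix", 12))    # {'chunk': 4, 'entity': 3, 'relationship': 5}
```

With rerank_top_k = 12 in mix mode the exact shares are 3.6, 3.6 and 4.8, so some rounding rule is unavoidable; whichever rule the limiter actually uses, the counts should still sum to rerank_top_k.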
Technical Details
Why We Call Cohere Ourselves
LightRAG has built-in reranking support, but it's bypassed when using only_need_context=True (which we use to get raw context without LLM generation). The retrieval flow is:
- LightRAG Query: Call with only_need_context=True to get raw documents
- Document Processing: Convert LightRAG response to LangChain Documents
- Reranking (if enabled): Call Cohere/Jina to rerank all documents by query relevance
- Limiting (fallback): Apply type-based allocation if reranking unavailable
This approach gives us:
- Full control over the reranking process
- Unified reranking across all document types (chunks, entities, relationships)
- Ability to use the latest Cohere/Jina models
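The retrieval flow above can be sketched as a small async pipeline. Everything here is a stand-in: retrieve, fake_fetch, fake_rerank and fake_limit are hypothetical, documents are plain dicts instead of LangChain Documents, and the toy reranker counts shared words instead of calling Cohere or Jina. The point is only the ordering of the steps and the single reranking pass over all document types.

```python
import asyncio
from typing import Awaitable, Callable

# Stand-in for a LangChain Document: {"page_content": str, "metadata": dict}
Doc = dict


async def retrieve(
    query: str,
    fetch_documents: Callable[[str], Awaitable[list[Doc]]],
    rerank: Callable[[str, list[Doc], int], Awaitable[list[Doc]]],
    limit: Callable[[list[Doc], int], list[Doc]],
    *,
    rerank_enabled: bool,
    limiter_enabled: bool,
    top_k: int,
) -> list[Doc]:
    # LightRAG query + document processing, hidden behind one callable here.
    docs = await fetch_documents(query)
    if rerank_enabled:
        # One pass over chunks, entities and relationships together.
        return await rerank(query, docs, top_k)
    if limiter_enabled:
        # Fallback: type-based limiting.
        return limit(docs, top_k)
    return docs


async def fake_fetch(query: str) -> list[Doc]:
    return [
        {"page_content": "pricing plans overview", "metadata": {"document_type": "chunk"}},
        {"page_content": "Acme Corp", "metadata": {"document_type": "entity"}},
        {"page_content": "Acme Corp offers pricing plans", "metadata": {"document_type": "relationship"}},
    ]


async def fake_rerank(query: str, docs: list[Doc], top_n: int) -> list[Doc]:
    # Toy relevance score: number of query words found in the text.
    words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(words & set(d["page_content"].lower().split())),
        reverse=True,
    )
    return ranked[:top_n]


def fake_limit(docs: list[Doc], top_k: int) -> list[Doc]:
    return docs[:top_k]  # placeholder for type-based allocation


results = asyncio.run(
    retrieve(
        "pricing plans",
        fake_fetch,
        fake_rerank,
        fake_limit,
        rerank_enabled=True,
        limiter_enabled=True,
        top_k=2,
    )
)
print([doc["metadata"]["document_type"] for doc in results])  # ['chunk', 'relationship']
```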
Document Types
Each document returned has a document_type in its metadata:
| Type | Source | Description |
|---|---|---|
| chunk | Vector search | Text chunks from indexed documents |
| entity | Knowledge graph | Extracted entities (people, companies, concepts) |
| relationship | Knowledge graph | Connections between entities |
Key Files
| File | Description |
|---|---|
| ixrag/lightrag/langgraph_retriever.py | Main retriever with reranking logic |
| ixrag/lightrag/document_limiter.py | Document limiting and allocation strategies |
| ixrag/lightrag/document_processor.py | Converts LightRAG responses to Documents |
| ixrag/lightrag/lightrag_llm.py | LLM and reranking function factories |
| ixrag/lightrag/rag_instance_manager.py | Shared singleton + per-tenant instance management |
Integration Points
| System | Purpose |
|---|---|
| LightRAG | Hybrid retrieval (vector + graph) |
| MongoDB | Vector storage backend |
| Neo4j | Graph storage backend |
| Cohere/Jina | Document reranking |
| LangFuse | Observability and tracing |
Usage
The retriever is typically accessed through ixchat, but can be used directly:
import asyncio

from ixrag.lightrag.langgraph_retriever import LangGraphRetriever


async def main() -> None:
    # Create retriever
    retriever = LangGraphRetriever(
        site_name="example-site",
        rag_mode="mix",
    )
    # Retrieve documents (ainvoke is awaited, so it runs inside a coroutine)
    documents = await retriever.ainvoke("What are your pricing plans?")
    # Each document has:
    # - page_content: The text content
    # - metadata: {document_type, source, rerank_score (if reranked)}


asyncio.run(main())
Shared Singleton Architecture (IX-1578)
Problem
Before IX-1578, every tenant got its own LightRAG instance. LightRAG initialization is expensive: it creates Neo4j drivers, MongoDB clients, loads embedding functions, and calls initialize_storages(). With more clients onboarding:
- Slow cold-start: Each new tenant required full LightRAG init, adding seconds to TTFT on the first request.
- Connection explosion: N tenants = N sets of connection pools.
- Memory pressure: N large LightRAG objects cached in memory.
Solution
One shared LightRAG instance, with tenant identity injected per-request via ContextVar.
The key insight: LightRAG itself is stateless with respect to tenant identity. All tenant filtering happens in the storage layer. So instead of one instance per tenant, there's a single shared instance, and storage classes resolve tenant_id dynamically from the current async task's context.
API request for "hexa.com"
  |
  +-- get_tenant_context("hexa.com")     # pure string op, cached
  |     -> TenantContext(mongo="tenant_hexa_com", neo4j="hexa_com")
  |
  +-- set_tenant(ctx)                    # bind to current async task
  |
  +-- get_shared_rag_instance()          # return the one singleton
  |
  +-- rag.aquery(...)
        |
        +-- Neo4jStorage.tenant_id -> get_tenant() -> "hexa_com"
        |     -> WHERE n.tenantId = "hexa_com"
        |
        +-- MongoStorage.tenant_id -> get_tenant() -> "tenant_hexa_com"
              -> {"tenantId": "tenant_hexa_com"}
Components
TenantContext (ixinfra/tenant_context.py): A ContextVar providing task-local tenant identity in async code. Each asyncio.Task gets its own copy. Lives in ixinfra because it's a pure dataclass with zero storage imports.
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class TenantContext:
    company_name: str
    mongo_tenant_id: str  # e.g., "tenant_hexa_com"
    neo4j_tenant_id: str  # e.g., "hexa_com"
Singleton Factory (ixrag/lightrag/rag_instance_manager.py): Two code paths:
| Path | Function | Use Case |
|---|---|---|
| API server | get_shared_rag_instance() | Returns the single shared instance. Double-checked locking. |
| CLI / doc loader | get_rag_instance(working_dir, key) | Per-tenant instances for write operations (document indexing). |
get_tenant_context() replaces old DB round-trips: previously computing a tenant ID required opening a synchronous Neo4j connection just to sanitize a string. Now it's a pure in-process string operation, cached in _tenant_context_cache.
Dual-Mode tenant_id Property: Both TenantAwareNeo4JStorage and BaseTenantAwareStorage (MongoDB) resolve tenant identity dynamically:
@property
def tenant_id(self) -> str:
    ctx = get_tenant()  # check contextvar
    if ctx is not None:
        return ctx.neo4j_tenant_id  # API server: contextvar wins
    return self._tenant_id  # CLI/test: instance attribute fallback
Write Guard: The shared singleton is read-only. upsert_node and upsert_edge in TenantAwareNeo4JStorage raise RuntimeError when the contextvar tenant does not match the instance tenant, and the singleton's instance tenant is "__shared__", so a write attempted through it is rejected.
Neo4j Driver Manager (ixneo4j/driver_manager.py): A ref-counted singleton AsyncDriver. Even with multiple LightRAG instances (CLI path), there's only one Neo4j connection pool per process.
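Reference counting for a shared driver might look roughly like the sketch below. DriverManager, acquire and release are hypothetical names, and FakeAsyncDriver stands in for the Neo4j AsyncDriver so the example runs without a database; the actual driver_manager.py API is not shown on this page.

```python
import asyncio


class FakeAsyncDriver:
    """Stand-in for the Neo4j AsyncDriver so the sketch needs no database."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class DriverManager:
    """Hypothetical ref-counted holder for one process-wide driver."""

    def __init__(self) -> None:
        self._driver: FakeAsyncDriver | None = None
        self._refs = 0
        self._lock = asyncio.Lock()

    async def acquire(self) -> FakeAsyncDriver:
        async with self._lock:
            if self._driver is None:
                # Real code would create the connection pool here.
                self._driver = FakeAsyncDriver()
            self._refs += 1
            return self._driver

    async def release(self) -> None:
        async with self._lock:
            if self._refs == 0:
                return  # nothing to release
            self._refs -= 1
            if self._refs == 0:
                # Last user gone: close the pool and forget the driver.
                await self._driver.close()
                self._driver = None


async def main() -> tuple[bool, bool, bool]:
    manager = DriverManager()
    first = await manager.acquire()
    second = await manager.acquire()
    await manager.release()
    still_open = not first.closed  # one holder remains
    await manager.release()
    return first is second, still_open, first.closed


print(asyncio.run(main()))  # (True, True, True)
```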
Service Orchestration (ixchat/service.py): IXChatbotService ties everything together by calling get_tenant_context(), set_tenant(), then get_shared_rag_instance(). The eviction system only evicts lightweight per-tenant chatbot wrappers; the shared singleton is never evicted.
Tenant Isolation
Both Neo4j and MongoDB enforce isolation at every query:
- Neo4j: All MATCH queries inject WHERE n.tenantId = $_tenant_id via _add_tenant_filter_to_query(). Write operations buffer nodes/edges then flush with UNWIND MERGE ... tenantId = $tenant_id.
- MongoDB: All queries add {"tenantId": self.tenant_id} filter. Document IDs are composite ("tenant_hexa_com:original-chunk-id") to prevent collisions across tenants.
Key Files
| File | Description |
|---|---|
| ixinfra/tenant_context.py | TenantContext dataclass + ContextVar |
| ixrag/lightrag/rag_instance_manager.py | Shared singleton + per-tenant factory |
| ixneo4j/tenant_storage.py | Dual-mode tenant_id, write guard |
| ixmongo/tenant_storage.py | Dual-mode tenant_id for MongoDB |
| ixneo4j/driver_manager.py | Ref-counted Neo4j AsyncDriver singleton |
| ixchat/service.py | IXChatbotService orchestration + eviction |
Data Consistency
The package includes tools for monitoring and maintaining consistency between MongoDB and Neo4j storage backends. See the ixrag/lightrag/ directory for:
- cli_consistency_check.py - Check data consistency
- reconcile_entities.py - Sync missing entities
- diagnose_entity_mapping.py - Diagnose entity name issues