Cognee GraphRAG Integration (M72)
1. Executive Summary
| Field | Description |
|---|---|
| Feature Name | Cognee GraphRAG Integration (Complete) |
| Milestone | M72 |
| Current Status | 50% → Target: 100% |
| Architectural Goal | Transform ChainAlign's knowledge retrieval from simple vector search to intelligent Graph-based Retrieval-Augmented Generation (GraphRAG) with entity relationships, multi-hop reasoning, and structured context retrieval. |
| System Impact | Powers the Socratic Inquiry Engine (M70) with rich organizational context (Strategic North, Historical Failures, Constraints). Enables proactive decision support by surfacing relationships between decisions, constraints, and outcomes. |
| Target Users | All decision-makers (indirect), AI systems (direct consumer) |
| MVP Success Metric | Socratic questions generated with context retrieved from GraphRAG (3 context types). Knowledge graph successfully visualized with entities and relationships. |
2. Strategic Context
Why GraphRAG vs. Traditional Vector Search?
| Feature | Traditional Vector Search (Current 50%) | GraphRAG (Target 100%) |
|---|---|---|
| Context Retrieval | Semantic similarity only (embeddings) | Semantic + Structural (relationships) |
| Multi-Hop Reasoning | ❌ Cannot traverse relationships | ✅ "Who does this supplier depend on?" |
| Entity Recognition | ❌ No entity extraction | ✅ Extracts companies, people, products, constraints |
| Temporal Reasoning | ❌ No time-aware queries | ✅ "What decisions did we make in Q3 2024 about PFAS?" |
| Relationship Mapping | ❌ No explicit links | ✅ "How is Constraint A related to Decision B?" |
| Proactive Insights | ❌ Reactive only | ✅ "Similar decisions made in 2022 led to X outcome" |
GraphRAG Use Cases in ChainAlign:
- Socratic Inquiry Engine: Retrieve board memos, incident reports, capacity constraints with metadata
- Decision History Feed: "Show me decisions that referenced Supplier X"
- Constraint Intelligence: "What products are affected by this new regulation?"
- Proactive Alerting: "Your current decision context matches a past failure scenario"
3. Current State (50% Complete)
What's Already Built:
✅ Cognee service running at http://localhost:5003 (port 8004 in Docker)
✅ Basic endpoints: /cognee/add, /cognee/cognify, /cognee/search
✅ RAGService integration (retrieveRelevantChunks() calls Cognee)
✅ PostgreSQL backend configured for graph and vector storage
What's Missing (50% Gap):
❌ Structured Context Retrieval: No support for filtering by document type, recency, metadata
❌ Entity Extraction: Not extracting companies, products, constraints as nodes
❌ Relationship Mapping: No explicit edges between entities (e.g., "Product A → Uses → Supplier B")
❌ Knowledge Graph Visualization: No endpoint to export graph structure
❌ Multi-Hop Queries: Cannot traverse relationships ("Find all decisions that depend on Supplier X")
❌ Metadata-Driven Search: No way to query "board memos from last 6 months"
❌ Integration with SIE: No specialized endpoints for Strategic North, Historical Failures, Constraints
4. Functional Requirements
FR-1: Enhanced Document Ingestion with Metadata
| ID | Requirement | Details |
|---|---|---|
| FR-1.1 | Document Type Tagging | All documents must be tagged with type: board_memo, incident_report, capacity_report, budget_snapshot, supplier_contract, regulatory_filing, decision_record |
| FR-1.2 | Temporal Metadata | Track document created_date, effective_date, expiry_date |
| FR-1.3 | Entity Extraction | Automatically extract entities: company, product, constraint, supplier, customer, regulatory_body |
| FR-1.4 | Relationship Inference | Infer relationships: affects, depends_on, mitigates, violates, supersedes |
| FR-1.5 | Batch Ingestion API | Support bulk upload of documents with metadata (e.g., import 100 board memos with dates) |
Example Ingestion Payload:
```json
{
  "documents": [
    {
      "content": "Board Memo Q3 2024: Priority #1 is 95% on-time delivery...",
      "metadata": {
        "document_type": "board_memo",
        "created_date": "2024-10-01",
        "effective_date": "2024-10-01",
        "source": "Board Meeting Minutes",
        "entities": ["95% on-time delivery", "40% gross margin floor"],
        "tags": ["strategic_priority", "Q3_2024"]
      }
    }
  ]
}
```
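Before a batch like this reaches the graph, the service will likely need to reject malformed metadata. The following is a minimal sketch of such a check, not Cognee's own validation: the function name `validate_document` is hypothetical, and treating `created_date` as mandatory is an assumption rather than something the requirements state.

```python
from datetime import date

# Document types listed in FR-1.1.
ALLOWED_DOCUMENT_TYPES = {
    "board_memo", "incident_report", "capacity_report", "budget_snapshot",
    "supplier_contract", "regulatory_filing", "decision_record",
}
# Temporal fields listed in FR-1.2.
DATE_FIELDS = ("created_date", "effective_date", "expiry_date")


def validate_document(doc: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not str(doc.get("content", "")).strip():
        errors.append("content is empty")
    metadata = doc.get("metadata") or {}
    doc_type = metadata.get("document_type")
    if doc_type not in ALLOWED_DOCUMENT_TYPES:
        errors.append(f"unknown document_type: {doc_type!r}")
    if "created_date" not in metadata:
        # Assumption: created_date is required; the spec only says "track" it.
        errors.append("created_date is required")
    for field in DATE_FIELDS:
        value = metadata.get(field)
        if value is None:
            continue
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            errors.append(f"{field} is not an ISO date: {value!r}")
    return errors
```

Returning a list of problems instead of raising on the first one lets a batch endpoint report every rejected document in a single response.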
FR-2: Structured Context Retrieval
| ID | Requirement | Details |
|---|---|---|
| FR-2.1 | Filter by Document Type | Endpoint: POST /cognee/search-structured. Filters: document_type (array), recency (e.g., "last_6_months"), tags (array) |
| FR-2.2 | Strategic North Retrieval | Specialized query: "Get board memos and strategic priorities from last 6 months". Returns: Top 3 most relevant strategic documents with snippets |
| FR-2.3 | Historical Failures Retrieval | Specialized query: "Get incident reports related to {decision_type}". Returns: Past incidents with severity, lessons learned |
| FR-2.4 | Constraints Retrieval | Specialized query: "Get current capacity and budget constraints for {decision_scope}". Returns: Real-time constraint data with utilization percentages |
| FR-2.5 | Temporal Filtering | Support time ranges: "last_30_days", "last_6_months", "Q3_2024", "2024" |
Example Structured Search Request:
```json
{
  "query": "PFAS regulation compliance",
  "filters": {
    "document_type": ["incident_report", "regulatory_filing"],
    "recency": "last_6_months",
    "severity": "high"
  },
  "limit": 3
}
```
Example Response:
```json
{
  "status": "success",
  "results": [
    {
      "content": "Incident Report 2024-IR-045: PFAS supplier qualification delay...",
      "source": "Incident Report 2024-IR-045",
      "document_type": "incident_report",
      "created_date": "2024-08-15",
      "severity": "high",
      "relevance_score": 0.92,
      "entities": ["PFAS", "Metco", "supplier qualification"],
      "relationships": [
        {"type": "affects", "target": "Product Line A"},
        {"type": "related_to", "target": "Decision CHA-123"}
      ]
    }
  ]
}
```
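The recency tokens in FR-2.5 ("last_30_days", "last_6_months", "Q3_2024", "2024") have to be turned into concrete date ranges before they can filter on `created_date`. One possible way to do that is sketched below. The function names are hypothetical, and the choices of an inclusive range and of clamping the day when stepping back whole months are assumptions, not behaviour defined by the spec.

```python
import re
from datetime import date, timedelta


def _months_before(day: date, months: int) -> date:
    """Step back whole calendar months, clamping the day to the month length."""
    index = day.year * 12 + (day.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    # Last day of the target month: first day of the following month minus one.
    next_first = date(year + (month == 12), month % 12 + 1, 1)
    last_day = (next_first - timedelta(days=1)).day
    return date(year, month, min(day.day, last_day))


def resolve_recency(token: str, today: date) -> tuple[date, date]:
    """Map a recency token to an inclusive (start, end) date range."""
    if m := re.fullmatch(r"last_(\d+)_days", token):
        return today - timedelta(days=int(m.group(1))), today
    if m := re.fullmatch(r"last_(\d+)_months", token):
        return _months_before(today, int(m.group(1))), today
    if m := re.fullmatch(r"Q([1-4])_(\d{4})", token):
        quarter, year = int(m.group(1)), int(m.group(2))
        start = date(year, 3 * quarter - 2, 1)
        next_start = date(year + 1, 1, 1) if quarter == 4 else date(year, 3 * quarter + 1, 1)
        return start, next_start - timedelta(days=1)
    if re.fullmatch(r"\d{4}", token):
        year = int(token)
        return date(year, 1, 1), date(year, 12, 31)
    raise ValueError(f"unsupported recency token: {token!r}")
```

Passing `today` explicitly instead of reading the clock inside the function keeps the mapping deterministic, which makes it straightforward to unit test.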
FR-3: Knowledge Graph Visualization
| ID | Requirement | Details |
|---|---|---|
| FR-3.1 | Export Graph Structure | Endpoint: GET /cognee/graph/export. Returns: Nodes and edges in JSON format |
| FR-3.2 | Node Types | Nodes represent: Document, Entity, Constraint, Decision, Supplier, Product |
| FR-3.3 | Edge Types | Edges represent: affects, depends_on, mitigates, violates, references |
| FR-3.4 | Subgraph Extraction | Endpoint: POST /cognee/graph/subgraph. Input: Entity ID or Decision ID. Returns: All nodes and edges within N hops |
| FR-3.5 | Graph Metrics | Return: Node count, edge count, average degree, clustering coefficient |
Example Graph Export:
```json
{
  "nodes": [
    {"id": "doc-board-memo-q3-2024", "type": "Document", "label": "Board Memo Q3 2024", "metadata": {"document_type": "board_memo"}},
    {"id": "entity-95-otd", "type": "Strategic_Priority", "label": "95% On-Time Delivery", "metadata": {}},
    {"id": "constraint-pfas-ban", "type": "Regulatory_Constraint", "label": "PFAS Ban 2025", "metadata": {"severity": "high"}}
  ],
  "edges": [
    {"source": "doc-board-memo-q3-2024", "target": "entity-95-otd", "type": "defines"},
    {"source": "constraint-pfas-ban", "target": "entity-95-otd", "type": "threatens"}
  ]
}
```
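The metrics named in FR-3.5 can be derived directly from an export in this shape. The sketch below is one way to compute them, assuming every edge is treated as undirected, every edge endpoint appears in `nodes`, and "clustering coefficient" means the average local clustering coefficient. None of these choices is fixed by the spec, and `graph_metrics` is a hypothetical name.

```python
from itertools import combinations


def graph_metrics(graph: dict) -> dict:
    """Compute FR-3.5 style metrics, treating every edge as undirected."""
    # Assumption: every edge endpoint is listed in graph["nodes"].
    neighbors = {node["id"]: set() for node in graph["nodes"]}
    for edge in graph["edges"]:
        source, target = edge["source"], edge["target"]
        if source != target:  # self-loops do not count toward degree
            neighbors[source].add(target)
            neighbors[target].add(source)

    def local_clustering(node_id: str) -> float:
        adjacent = neighbors[node_id]
        if len(adjacent) < 2:
            return 0.0
        # Count neighbor pairs that are themselves connected.
        linked = sum(1 for a, b in combinations(adjacent, 2) if b in neighbors[a])
        possible = len(adjacent) * (len(adjacent) - 1) / 2
        return linked / possible

    node_count = len(neighbors)
    degree_sum = sum(len(adjacent) for adjacent in neighbors.values())
    return {
        "node_count": node_count,
        "edge_count": len(graph["edges"]),
        "average_degree": degree_sum / node_count if node_count else 0.0,
        "clustering_coefficient": (
            sum(local_clustering(n) for n in neighbors) / node_count
            if node_count else 0.0
        ),
    }
```

Applied to the example export above, this gives 3 nodes, 2 edges, an average degree of 4/3, and a clustering coefficient of 0, because the two neighbors of `entity-95-otd` are not linked to each other.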
FR-4: Multi-Hop Reasoning
| ID | Requirement | Details |
|---|---|---|
| FR-4.1 | Traversal Queries | Endpoint: POST /cognee/graph/traverse. Input: Start node, relationship types, max hops. Returns: All reachable nodes and paths |
| FR-4.2 | Dependency Analysis | Query: "Find all products that depend on Supplier X". Traverse: Product → depends_on → Supplier |
| FR-4.3 | Impact Analysis | Query: "What decisions are affected by Constraint Y?". Traverse: Constraint → affects → Decision |
| FR-4.4 | Historical Pattern Matching | Query: "Find past decisions with similar context to current decision". Uses: Graph similarity + semantic similarity |
Example Traversal Request:
```json
{
  "start_node": "supplier-metco",
  "relationship_types": ["supplies_to", "affects"],
  "max_hops": 2,
  "filter": {"node_type": "Product"}
}
```
Example Response:
```json
{
  "paths": [
    {
      "path": ["supplier-metco", "supplies_to", "product-line-a", "affects", "decision-cha-123"],
      "length": 2,
      "entities": ["Metco", "Product Line A", "Decision CHA-123"]
    }
  ]
}
```
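A traversal of this kind can be expressed as a breadth-first search over typed, directed edges. The sketch below illustrates the idea on an in-memory edge list in the export format (`source`, `target`, `type`). It is not how Cognee executes traversals internally, the function name `traverse` is hypothetical, and the request's `filter` field and the `entities` labels are left out for brevity.

```python
from collections import deque


def traverse(edges, start_node, relationship_types, max_hops):
    """Enumerate cycle-free directed paths from start_node, up to max_hops edges."""
    allowed = set(relationship_types)
    outgoing = {}
    for edge in edges:
        if edge["type"] in allowed:
            outgoing.setdefault(edge["source"], []).append((edge["type"], edge["target"]))

    paths = []
    # Each queued path alternates node, relation, node, ... like the response above.
    queue = deque([[start_node]])
    while queue:
        path = queue.popleft()
        hops = len(path) // 2
        if hops > 0:
            paths.append({"path": path, "length": hops})
        if hops == max_hops:
            continue
        visited = set(path[::2])  # node ids already on this path
        for relation, target in outgoing.get(path[-1], []):
            if target not in visited:  # never revisit a node, so cycles terminate
                queue.append(path + [relation, target])
    return paths
```

Because the queue is processed breadth-first, shorter paths are returned before longer ones, and the `max_hops` bound keeps the search finite even on densely connected graphs.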
FR-5: Integration with Socratic Inquiry Engine
| ID | Requirement | Details |
|---|---|---|
| FR-5.1 | Context Retrieval for QGM | SocraticInquiryService.generateQuestions() must call Cognee structured search |
| FR-5.2 | Strategic North Query | Call: POST /cognee/search-structured with filters: document_type=["board_memo"], recency="last_6_months" |
| FR-5.3 | Historical Failures Query | Call: POST /cognee/search-structured with filters: document_type=["incident_report"], decision_type={current_decision_type} |
| FR-5.4 | Constraints Query | Call: POST /cognee/search-structured with filters: document_type=["capacity_report", "budget_snapshot"], recency="last_30_days" |
| FR-5.5 | Context Snippet Formatting | Cognee must return snippets in format: {source, text, metadata, relevance_score} |
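FR-5.5 implies a mapping from each structured-search result to the `{source, text, metadata, relevance_score}` shape. A minimal sketch of that mapping follows, assuming results look like the FR-2 example response. The name `to_snippet`, the `max_chars` truncation limit, and the fallback values are illustrative assumptions.

```python
def to_snippet(result: dict, max_chars: int = 280) -> dict:
    """Reshape one structured-search result into the FR-5.5 snippet format."""
    text = result.get("content", "")
    if len(text) > max_chars:
        # Assumption: long content is truncated with an ellipsis marker.
        text = text[: max_chars - 3].rstrip() + "..."
    reserved = {"content", "source", "relevance_score"}
    return {
        "source": result.get("source", "unknown"),
        "text": text,
        # Everything that is not a top-level snippet field is kept as metadata.
        "metadata": {k: v for k, v in result.items() if k not in reserved},
        "relevance_score": result.get("relevance_score", 0.0),
    }
```

Keeping the remaining fields (document type, dates, severity, entities) under `metadata` lets the question generator cite them without the snippet format changing each time a new field is added.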
5. Technical Architecture
5.1. Cognee Service Enhancements
New Endpoints
1. Structured Search (Enhanced)
```python
@app.post("/cognee/search-structured")
async def cognee_search_structured(request: StructuredSearchRequest):
    """
    Performs structured search with filters for document type, recency, tags.
    """
    # Apply filters to GraphRAG query
    # Return results with metadata
```
2. Graph Export
```python
@app.get("/cognee/graph/export")
async def export_graph():
    """
    Exports the full knowledge graph structure (nodes + edges).
    """
    # Query Cognee's internal graph DB
    # Return JSON representation
```
3. Subgraph Extraction
```python
@app.post("/cognee/graph/subgraph")
async def get_subgraph(request: SubgraphRequest):
    """
    Extracts a subgraph around a specific entity or decision.
    """
    # BFS/DFS traversal from start_node
    # Return nodes and edges within max_hops
```
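The BFS step mentioned in the stub above could look roughly like the following. This is a sketch over an in-memory graph in the export format, not Cognee's graph-database query. It assumes hop distance ignores edge direction, and the name `extract_subgraph` is hypothetical.

```python
from collections import deque


def extract_subgraph(graph: dict, start_node: str, max_hops: int) -> dict:
    """Return nodes and edges within max_hops of start_node (direction ignored)."""
    neighbors = {}
    for edge in graph["edges"]:
        neighbors.setdefault(edge["source"], set()).add(edge["target"])
        neighbors.setdefault(edge["target"], set()).add(edge["source"])

    # Standard BFS that records the hop distance of every reached node.
    distance = {start_node: 0}
    queue = deque([start_node])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(current, ()):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)

    return {
        "nodes": [n for n in graph["nodes"] if n["id"] in distance],
        "edges": [e for e in graph["edges"]
                  if e["source"] in distance and e["target"] in distance],
    }
```

The result is the induced subgraph: every edge whose two endpoints are both within range is kept, including edges between two nodes that sit exactly at the hop limit.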
4. Graph Traversal
```python
@app.post("/cognee/graph/traverse")
async def traverse_graph(request: TraversalRequest):
    """
    Performs multi-hop traversal following specified relationship types.
    """
    # Cypher-style query execution
    # Return paths and reachable nodes
```
5. Batch Metadata Ingestion
```python
@app.post("/cognee/ingest-batch")
async def ingest_batch(request: BatchIngestionRequest):
    """
    Bulk upload of documents with rich metadata (type, dates, entities, tags).
    """
    # Validate metadata schema
    # Extract entities and relationships
    # Add to knowledge graph
```
5.2. Backend Service Integration
Enhance CogneeService.js (New Service or Extend RAGService)
```javascript
// backend/src/services/CogneeService.js
class CogneeService {
  constructor() {
    this.cogneeUrl = process.env.COGNEE_SERVICE_URL || 'http://localhost:8004';
  }

  /**
   * Structured search with filters for document type, recency, tags.
   */
  async searchStructured({ query, filters, limit = 5 }) {
    const response = await fetch(`${this.cogneeUrl}/cognee/search-structured`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, filters, limit })
    });
    if (!response.ok) throw new Error(`Cognee structured search failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Retrieve Strategic North context (board memos, strategic priorities).
   */
  async retrieveStrategicContext() {
    return this.searchStructured({
      query: "What are the current strategic priorities and board mandates?",
      filters: {
        document_type: ["board_memo"],
        recency: "last_6_months"
      },
      limit: 3
    });
  }

  /**
   * Retrieve Historical Failures context (incident reports).
   */
  async retrieveHistoricalFailures(decisionType) {
    return this.searchStructured({
      query: `Past incidents related to ${decisionType}`,
      filters: {
        document_type: ["incident_report"],
        decision_type: decisionType, // filter required by FR-5.3
        severity: "high"
      },
      limit: 2
    });
  }

  /**
   * Retrieve Constraints context (capacity, budget).
   */
  async retrieveConstraints(decisionScope) {
    return this.searchStructured({
      query: `Current capacity utilization and budget constraints for ${decisionScope}`,
      filters: {
        document_type: ["capacity_report", "budget_snapshot"],
        recency: "last_30_days"
      },
      limit: 2
    });
  }

  /**
   * Export full knowledge graph.
   */
  async exportGraph() {
    const response = await fetch(`${this.cogneeUrl}/cognee/graph/export`);
    if (!response.ok) throw new Error(`Cognee graph export failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Get subgraph around an entity or decision.
   */
  async getSubgraph(startNode, maxHops = 2) {
    const response = await fetch(`${this.cogneeUrl}/cognee/graph/subgraph`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ start_node: startNode, max_hops: maxHops })
    });
    if (!response.ok) throw new Error(`Cognee subgraph extraction failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Multi-hop graph traversal.
   */
  async traverseGraph({ startNode, relationshipTypes, maxHops, filter }) {
    const response = await fetch(`${this.cogneeUrl}/cognee/graph/traverse`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ start_node: startNode, relationship_types: relationshipTypes, max_hops: maxHops, filter })
    });
    if (!response.ok) throw new Error(`Cognee graph traversal failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Batch ingest documents with metadata.
   */
  async ingestBatch(documents) {
    const response = await fetch(`${this.cogneeUrl}/cognee/ingest-batch`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ documents })
    });
    if (!response.ok) throw new Error(`Cognee batch ingestion failed: ${response.statusText}`);
    return await response.json();
  }
}

export default new CogneeService();
```
5.3. Database Schema (PostgreSQL via Cognee)
Cognee uses PostgreSQL for both graph and vector storage. The schema is managed by Cognee internally, but we need to ensure the following are in place:
Graph Tables:
- nodes: Stores entities (documents, constraints, products, suppliers)
- edges: Stores relationships (affects, depends_on, mitigates, etc.)
Vector Tables:
- embeddings: Stores document embeddings for semantic search
Metadata Indexes:
- Index on document_type, created_date, tags for fast filtering
6. Performance Requirements
| Metric | Target | Rationale |
|---|---|---|
| Structured Search | < 2 seconds | Context retrieval for Socratic questions must be fast |
| Graph Export | < 5 seconds | Full graph export for visualization |
| Subgraph Extraction | < 1 second | Fast entity-centric exploration |
| Multi-Hop Traversal | < 3 seconds | Complex queries with 2-3 hops |
| Batch Ingestion | 100 documents in < 10 seconds | Bulk import of historical data |
7. Security & Compliance
7.1. Tenant Isolation
Problem: Cognee service is shared across tenants. Must prevent data leakage.
Solution:
- Add tenant_id to all document metadata during ingestion
- Filter all searches by tenant_id at the service layer (backend)
- Cognee service itself is tenant-agnostic; isolation enforced by backend
Example:
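```javascript
// In CogneeService.js
async searchStructured({ query, filters = {}, limit = 5, tenantId }) {
  // Refuse unscoped searches so a missing tenant can never match every tenant
  if (!tenantId) throw new Error('tenantId is required for Cognee searches');
  // Always add the tenant_id filter, without mutating the caller's object
  const scopedFilters = { ...filters, tenant_id: tenantId };
  // Send { query, filters: scopedFilters, limit } to Cognee...
}
```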
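The same scoping rule can be written as a small pure function, sketched here in Python for brevity even though the backend itself is JavaScript. The helper name `with_tenant_scope` is hypothetical. Rejecting a request whose filters already name a different tenant is an extra safeguard assumed here, not something the spec requires.

```python
def with_tenant_scope(filters, tenant_id):
    """Return a copy of filters that is always pinned to tenant_id."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every Cognee search")
    scoped = dict(filters or {})  # copy, so the caller's dict is never mutated
    requested = scoped.get("tenant_id")
    if requested is not None and requested != tenant_id:
        # Assumption: a caller-supplied tenant_id for another tenant is an error.
        raise PermissionError("filters target a different tenant")
    scoped["tenant_id"] = tenant_id
    return scoped
```

Because the backend is the only enforcement point, routing every search, export, and traversal call through one helper like this makes it harder to add an endpoint that forgets the filter.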
7.2. Data Retention
- Knowledge graph data retained for 7 years (regulatory compliance)
- Entities and relationships are never hard-deleted, only soft-deleted (by setting a deleted_at flag)
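Soft deletion means a record stays in storage with a `deleted_at` marker and is filtered out at read time. The sketch below shows that pattern on plain dictionaries. The function names are hypothetical, and storing `deleted_at` as an ISO 8601 UTC timestamp is an assumption.

```python
from datetime import datetime, timezone


def soft_delete(record: dict, now=None) -> dict:
    """Return a copy of record marked as deleted; the original is untouched."""
    if record.get("deleted_at") is not None:
        return dict(record)  # already deleted: keep the first timestamp
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return {**record, "deleted_at": stamp}


def active_only(records):
    """Drop soft-deleted records, e.g. before search or traversal."""
    return [r for r in records if r.get("deleted_at") is None]
```

Keeping the first deletion timestamp makes the operation idempotent, which matters if the retention clock is measured from the moment of deletion.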
8. Testing Requirements
8.1. Unit Tests (80% coverage)
- CogneeService.searchStructured() with various filters
- CogneeService.retrieveStrategicContext() returns correct document types
- CogneeService.traverseGraph() returns valid paths
8.2. Integration Tests
- E2E: Ingest document → Cognify → Structured search → Verify results
- E2E: Create decision → SIE generates questions → Cognee retrieves context → Questions contain context snippets
- Graph traversal: "Find all products depending on Supplier X"
8.3. Performance Tests
- Load test: 1,000 documents ingested, structured search returns in < 2s
- Stress test: 50 concurrent structured searches
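A stress test of this kind can be prototyped with `asyncio` before a dedicated load-testing tool is involved. The harness below is a sketch only: `fake_search` is a stand-in that sleeps instead of calling the real service, and `run_concurrent` is a hypothetical helper. The 2-second budget is the structured search target stated in this document.

```python
import asyncio
import time


async def fake_search(query: str) -> dict:
    """Stand-in for a structured search call; replace with a real HTTP request."""
    await asyncio.sleep(0.01)
    return {"status": "success", "query": query}


async def run_concurrent(search, queries, budget_seconds):
    """Run all searches at once and report per-call latency against a budget."""
    async def timed(query):
        started = time.perf_counter()
        await search(query)
        return time.perf_counter() - started

    latencies = await asyncio.gather(*(timed(q) for q in queries))
    return {
        "count": len(latencies),
        "max_latency": max(latencies),
        "within_budget": all(latency <= budget_seconds for latency in latencies),
    }


# 50 concurrent searches, checked against the 2-second target.
report = asyncio.run(run_concurrent(fake_search, [f"q{i}" for i in range(50)], 2.0))
```

Measuring latency per call instead of total wall-clock time matches how the target is phrased: each individual search should return within the budget even under concurrent load.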
9. Migration Strategy
9.1. Backfill Historical Data
Phase 1: Metadata Enrichment (Week 1)
- Export existing documents and data_elements tables
- Manually tag with document_type, created_date, tags
- Batch ingest into Cognee via /cognee/ingest-batch
Phase 2: Entity Extraction (Week 2)
- Run Cognee's entity extraction on backfilled documents
- Verify entities extracted correctly (companies, products, constraints)
Phase 3: Relationship Inference (Week 3)
- Use LLM to infer relationships between entities
- Example: "Board Memo mentions 95% OTD → defines → Strategic Priority"
9.2. Dual-Mode Operation
- Legacy Mode: RAGService continues using basic vector search (retrieveRelevantChunks())
- GraphRAG Mode: New code uses CogneeService.searchStructured()
- Feature flag: ENABLE_GRAPHRAG (default: false)
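The dual-mode switch amounts to a single dispatch on the feature flag. It is sketched here in Python for illustration, though in the backend it would live next to RAGService. The function names and the set of values accepted as "on" are assumptions, and only the flag name `ENABLE_GRAPHRAG` and its default of false come from this document.

```python
import os


def graphrag_enabled(env=None) -> bool:
    """Read ENABLE_GRAPHRAG; anything but an explicit truthy value means off."""
    env = os.environ if env is None else env
    value = str(env.get("ENABLE_GRAPHRAG", "false")).strip().lower()
    return value in {"1", "true", "yes"}  # assumption: accepted truthy spellings


def retrieve_context(query, legacy_search, graphrag_search, env=None):
    """Dispatch to GraphRAG when the flag is on, otherwise to the legacy path."""
    if graphrag_enabled(env):
        return graphrag_search(query)
    return legacy_search(query)
```

Defaulting to the legacy path when the flag is unset or misspelled keeps a misconfigured environment on the already-proven retrieval mode.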
9.3. Rollout Plan
- Week 1: Deploy enhanced Cognee service with new endpoints
- Week 2: Backfill historical data (100 documents)
- Week 3: Enable GraphRAG for SIE (Socratic questions only)
- Week 4: Enable GraphRAG for all RAG use cases (replace legacy mode)
10. Success Criteria
M72 is complete when:
✅ Structured search endpoint functional with filters (document_type, recency, tags)
✅ Strategic North, Historical Failures, Constraints retrieval methods working
✅ Knowledge graph export returns nodes and edges in JSON format
✅ Multi-hop traversal (2-3 hops) functional
✅ Socratic Inquiry Engine (M70) successfully retrieves context from Cognee
✅ At least 100 historical documents ingested and cognified
✅ Unit test coverage ≥ 80%
✅ Integration tests passing
✅ Performance targets met (< 2s structured search)
11. Dependencies
| Dependency | Status | Required For |
|---|---|---|
| Cognee Service (Basic) | ✅ 50% Complete | Foundation for enhancements |
| PostgreSQL with pgvector | ✅ Operational | Graph and vector storage |
| M70 (Socratic Inquiry Engine) | 🔄 In Progress | Context consumer |
| Document Ingestion Pipeline | ✅ Operational | Source of documents to cognify |
12. Future Enhancements (Post-M72)
Phase 2 Features:
- Graph Visualization UI: Interactive graph explorer in frontend
- Automated Relationship Inference: LLM-powered edge generation
- Temporal Graph Queries: "How did our strategic priorities change over time?"
- Probabilistic Reasoning: "What's the likelihood this decision leads to outcome X based on historical patterns?"
- Cross-Tenant Graph Insights: Anonymized pattern sharing across tenants (with consent)
Document Status: Complete FSD
Last Updated: 2025-11-14
Milestone: M72 - Cognee GraphRAG Integration (50% → 100%)
Estimated Implementation: 3 weeks (1 backend developer + 1 Python developer)