Cognee GraphRAG Integration (M72)

1. Executive Summary

| Field | Description |
| --- | --- |
| Feature Name | Cognee GraphRAG Integration (Complete) |
| Milestone | M72 |
| Current Status | 50% → Target: 100% |
| Architectural Goal | Transform ChainAlign's knowledge retrieval from simple vector search to intelligent Graph-based Retrieval-Augmented Generation (GraphRAG) with entity relationships, multi-hop reasoning, and structured context retrieval. |
| System Impact | Powers the Socratic Inquiry Engine (M70) with rich organizational context (Strategic North, Historical Failures, Constraints). Enables proactive decision support by surfacing relationships between decisions, constraints, and outcomes. |
| Target Users | All decision-makers (indirect), AI systems (direct consumer) |
| MVP Success Metric | Socratic questions generated with context retrieved from GraphRAG (3 context types). Knowledge graph successfully visualized with entities and relationships. |

2. Strategic Context

| Feature | Traditional Vector Search (Current 50%) | GraphRAG (Target 100%) |
| --- | --- | --- |
| Context Retrieval | Semantic similarity only (embeddings) | Semantic + Structural (relationships) |
| Multi-Hop Reasoning | ❌ Cannot traverse relationships | ✅ "Who does this supplier depend on?" |
| Entity Recognition | ❌ No entity extraction | ✅ Extracts companies, people, products, constraints |
| Temporal Reasoning | ❌ No time-aware queries | ✅ "What decisions did we make in Q3 2024 about PFAS?" |
| Relationship Mapping | ❌ No explicit links | ✅ "How is Constraint A related to Decision B?" |
| Proactive Insights | ❌ Reactive only | ✅ "Similar decisions made in 2022 led to X outcome" |

GraphRAG Use Cases in ChainAlign:

  1. Socratic Inquiry Engine: Retrieve board memos, incident reports, capacity constraints with metadata
  2. Decision History Feed: "Show me decisions that referenced Supplier X"
  3. Constraint Intelligence: "What products are affected by this new regulation?"
  4. Proactive Alerting: "Your current decision context matches a past failure scenario"

3. Current State (50% Complete)

What's Already Built:

  ✅ Cognee service running at http://localhost:5003 (port 8004 in Docker)
  ✅ Basic endpoints: /cognee/add, /cognee/cognify, /cognee/search
  ✅ RAGService integration (retrieveRelevantChunks() calls Cognee)
  ✅ PostgreSQL backend configured for graph and vector storage

What's Missing (50% Gap):

  ❌ Structured Context Retrieval: No support for filtering by document type, recency, metadata
  ❌ Entity Extraction: Not extracting companies, products, constraints as nodes
  ❌ Relationship Mapping: No explicit edges between entities (e.g., "Product A → Uses → Supplier B")
  ❌ Knowledge Graph Visualization: No endpoint to export graph structure
  ❌ Multi-Hop Queries: Cannot traverse relationships ("Find all decisions that depend on Supplier X")
  ❌ Metadata-Driven Search: No way to query "board memos from last 6 months"
  ❌ Integration with SIE: No specialized endpoints for Strategic North, Historical Failures, Constraints

4. Functional Requirements

FR-1: Enhanced Document Ingestion with Metadata

| ID | Requirement | Details |
| --- | --- | --- |
| FR-1.1 | Document Type Tagging | All documents must be tagged with type: board_memo, incident_report, capacity_report, budget_snapshot, supplier_contract, regulatory_filing, decision_record |
| FR-1.2 | Temporal Metadata | Track document created_date, effective_date, expiry_date |
| FR-1.3 | Entity Extraction | Automatically extract entities: company, product, constraint, supplier, customer, regulatory_body |
| FR-1.4 | Relationship Inference | Infer relationships: affects, depends_on, mitigates, violates, supersedes |
| FR-1.5 | Batch Ingestion API | Support bulk upload of documents with metadata (e.g., import 100 board memos with dates) |

Example Ingestion Payload:

{
  "documents": [
    {
      "content": "Board Memo Q3 2024: Priority #1 is 95% on-time delivery...",
      "metadata": {
        "document_type": "board_memo",
        "created_date": "2024-10-01",
        "effective_date": "2024-10-01",
        "source": "Board Meeting Minutes",
        "entities": ["95% on-time delivery", "40% gross margin floor"],
        "tags": ["strategic_priority", "Q3_2024"]
      }
    }
  ]
}
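
A minimal Python sketch of how the ingestion service might validate the metadata from FR-1.1 and FR-1.2 before adding a document to the graph. The function name `validate_metadata` is hypothetical, and treating `created_date` as the only required date is an assumption; the source does not state which fields are mandatory.

```python
from datetime import date

# Allowed values copied from FR-1.1; field names follow the example payload.
DOCUMENT_TYPES = {
    "board_memo", "incident_report", "capacity_report", "budget_snapshot",
    "supplier_contract", "regulatory_filing", "decision_record",
}
DATE_FIELDS = ("created_date", "effective_date", "expiry_date")


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems found in one document's metadata (empty if valid)."""
    errors = []
    doc_type = metadata.get("document_type")
    if doc_type not in DOCUMENT_TYPES:
        errors.append(f"unknown document_type: {doc_type!r}")
    for field in DATE_FIELDS:
        value = metadata.get(field)
        if value is None:
            # Assumption: only created_date is required.
            if field == "created_date":
                errors.append("created_date is required")
            continue
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            errors.append(f"{field} is not an ISO date: {value!r}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a batch import report every invalid document in one pass.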

FR-2: Structured Context Retrieval

| ID | Requirement | Details |
| --- | --- | --- |
| FR-2.1 | Filter by Document Type | Endpoint: POST /cognee/search-structured. Filters: document_type (array), recency (e.g., "last_6_months"), tags (array) |
| FR-2.2 | Strategic North Retrieval | Specialized query: "Get board memos and strategic priorities from last 6 months". Returns: Top 3 most relevant strategic documents with snippets |
| FR-2.3 | Historical Failures Retrieval | Specialized query: "Get incident reports related to {decision_type}". Returns: Past incidents with severity, lessons learned |
| FR-2.4 | Constraints Retrieval | Specialized query: "Get current capacity and budget constraints for {decision_scope}". Returns: Real-time constraint data with utilization percentages |
| FR-2.5 | Temporal Filtering | Support time ranges: "last_30_days", "last_6_months", "Q3_2024", "2024" |
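
The time-range tokens in FR-2.5 have to be turned into concrete dates before they can filter on created_date. The sketch below shows one possible mapping in Python. The function name `resolve_recency`, the inclusive range convention, and the calendar-month arithmetic are assumptions, not part of the specification.

```python
import calendar
import re
from datetime import date, timedelta


def resolve_recency(recency: str, today: date) -> tuple[date, date]:
    """Map a recency token to an inclusive (start, end) date range."""
    match = re.fullmatch(r"last_(\d+)_days", recency)
    if match:
        return today - timedelta(days=int(match.group(1))), today

    match = re.fullmatch(r"last_(\d+)_months", recency)
    if match:
        # Step back whole calendar months, clamping the day to the month's length.
        total = today.year * 12 + (today.month - 1) - int(match.group(1))
        year, month_index = divmod(total, 12)
        month = month_index + 1
        day = min(today.day, calendar.monthrange(year, month)[1])
        return date(year, month, day), today

    match = re.fullmatch(r"Q([1-4])_(\d{4})", recency)
    if match:
        quarter, year = int(match.group(1)), int(match.group(2))
        first_month = 3 * (quarter - 1) + 1
        last_month = first_month + 2
        last_day = calendar.monthrange(year, last_month)[1]
        return date(year, first_month, 1), date(year, last_month, last_day)

    if re.fullmatch(r"\d{4}", recency):
        year = int(recency)
        return date(year, 1, 1), date(year, 12, 31)

    raise ValueError(f"unsupported recency token: {recency!r}")
```

Passing `today` explicitly keeps the function deterministic, which makes the relative ranges easy to unit test.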

Example Structured Search Request:

{
  "query": "PFAS regulation compliance",
  "filters": {
    "document_type": ["incident_report", "regulatory_filing"],
    "recency": "last_6_months",
    "severity": "high"
  },
  "limit": 3
}

Example Response:

{
  "status": "success",
  "results": [
    {
      "content": "Incident Report 2024-IR-045: PFAS supplier qualification delay...",
      "source": "Incident Report 2024-IR-045",
      "document_type": "incident_report",
      "created_date": "2024-08-15",
      "severity": "high",
      "relevance_score": 0.92,
      "entities": ["PFAS", "Metco", "supplier qualification"],
      "relationships": [
        {"type": "affects", "target": "Product Line A"},
        {"type": "related_to", "target": "Decision CHA-123"}
      ]
    }
  ]
}

FR-3: Knowledge Graph Visualization

| ID | Requirement | Details |
| --- | --- | --- |
| FR-3.1 | Export Graph Structure | Endpoint: GET /cognee/graph/export. Returns: Nodes and edges in JSON format |
| FR-3.2 | Node Types | Nodes represent: Document, Entity, Constraint, Decision, Supplier, Product |
| FR-3.3 | Edge Types | Edges represent: affects, depends_on, mitigates, violates, references |
| FR-3.4 | Subgraph Extraction | Endpoint: POST /cognee/graph/subgraph. Input: Entity ID or Decision ID. Returns: All nodes and edges within N hops |
| FR-3.5 | Graph Metrics | Return: Node count, edge count, average degree, clustering coefficient |

Example Graph Export:

{
  "nodes": [
    {"id": "doc-board-memo-q3-2024", "type": "Document", "label": "Board Memo Q3 2024", "metadata": {"document_type": "board_memo"}},
    {"id": "entity-95-otd", "type": "Strategic_Priority", "label": "95% On-Time Delivery", "metadata": {}},
    {"id": "constraint-pfas-ban", "type": "Regulatory_Constraint", "label": "PFAS Ban 2025", "metadata": {"severity": "high"}}
  ],
  "edges": [
    {"source": "doc-board-memo-q3-2024", "target": "entity-95-otd", "type": "defines"},
    {"source": "constraint-pfas-ban", "target": "entity-95-otd", "type": "threatens"}
  ]
}
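
The graph metrics in FR-3.5 can be derived directly from an export in this shape. The following Python sketch is illustrative only. The function name `graph_metrics` is hypothetical, and it assumes that degree and clustering are measured on the undirected simple graph (edge direction, duplicate edges, and self-loops are ignored) and that the clustering coefficient is the mean of the local coefficients.

```python
from itertools import combinations


def graph_metrics(graph: dict) -> dict:
    """Compute FR-3.5 style metrics from an export with "nodes" and "edges"."""
    neighbours = {node["id"]: set() for node in graph["nodes"]}
    for edge in graph["edges"]:
        source, target = edge["source"], edge["target"]
        if source != target:  # ignore self-loops
            # setdefault tolerates edges that mention a node missing from "nodes"
            neighbours.setdefault(source, set()).add(target)
            neighbours.setdefault(target, set()).add(source)

    def local_clustering(node: str) -> float:
        adjacent = neighbours[node]
        if len(adjacent) < 2:
            return 0.0
        possible = len(adjacent) * (len(adjacent) - 1) / 2
        linked = sum(1 for a, b in combinations(adjacent, 2) if b in neighbours[a])
        return linked / possible

    count = len(neighbours)
    return {
        "node_count": count,
        "edge_count": len(graph["edges"]),
        "average_degree": sum(len(n) for n in neighbours.values()) / count if count else 0.0,
        "clustering_coefficient": sum(local_clustering(n) for n in neighbours) / count if count else 0.0,
    }
```

For the three-node example export, no two neighbours of any node are connected to each other, so the clustering coefficient is 0.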

FR-4: Multi-Hop Reasoning

| ID | Requirement | Details |
| --- | --- | --- |
| FR-4.1 | Traversal Queries | Endpoint: POST /cognee/graph/traverse. Input: Start node, relationship types, max hops. Returns: All reachable nodes and paths |
| FR-4.2 | Dependency Analysis | Query: "Find all products that depend on Supplier X". Traverse: Product → depends_on → Supplier |
| FR-4.3 | Impact Analysis | Query: "What decisions are affected by Constraint Y?". Traverse: Constraint → affects → Decision |
| FR-4.4 | Historical Pattern Matching | Query: "Find past decisions with similar context to current decision". Uses: Graph similarity + semantic similarity |
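
FR-4.4 combines graph similarity with semantic similarity but does not say how. One possible reading is sketched below in Python: Jaccard overlap of each decision's graph neighbours, blended with cosine similarity of embeddings. The weighting parameter `alpha`, the dictionary keys `neighbours` and `embedding`, and the choice of Jaccard and cosine are all assumptions made for illustration.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Share of neighbours two nodes have in common (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def combined_similarity(current: dict, past: dict, alpha: float = 0.5) -> float:
    """Blend graph and semantic similarity; alpha weights the graph part."""
    graph_score = jaccard(current["neighbours"], past["neighbours"])
    semantic_score = cosine(current["embedding"], past["embedding"])
    return alpha * graph_score + (1 - alpha) * semantic_score


def rank_past_decisions(current: dict, history: list[dict], alpha: float = 0.5) -> list[dict]:
    """Return past decisions ordered from most to least similar."""
    return sorted(history, key=lambda past: combined_similarity(current, past, alpha), reverse=True)
```

A past decision that shares the same supplier and constraint nodes ranks above one that is only textually similar when `alpha` is raised, and the reverse when it is lowered.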

Example Traversal Request:

{
  "start_node": "supplier-metco",
  "relationship_types": ["supplies_to", "affects"],
  "max_hops": 2,
  "filter": {"node_type": "Product"}
}

Example Response:

{
  "paths": [
    {
      "path": ["supplier-metco", "supplies_to", "product-line-a", "affects", "decision-cha-123"],
      "length": 2,
      "entities": ["Metco", "Product Line A", "Decision CHA-123"]
    }
  ]
}
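
The traversal request and response above can be illustrated with a small in-memory sketch in Python. It is not Cognee's implementation: the function name `traverse` and the edge-list input are hypothetical, edges are followed in their stored direction only, and every simple path of 1 to `max_hops` hops is returned. The `filter` field is left out because the source does not say whether it applies to intermediate or final nodes.

```python
def traverse(edges: list[dict], start_node: str, relationship_types: list[str], max_hops: int) -> list[dict]:
    """Return all simple paths from start_node that use only the allowed edge types."""
    allowed = set(relationship_types)
    outgoing = {}
    for edge in edges:
        if edge["type"] in allowed:
            outgoing.setdefault(edge["source"], []).append((edge["type"], edge["target"]))

    paths = []
    # Each stack entry holds the path so far (node, relation, node, ...) and its visited nodes.
    stack = [([start_node], {start_node})]
    while stack:
        path, visited = stack.pop()
        hops = len(path) // 2
        if hops:
            paths.append({"path": path, "length": hops})
        if hops == max_hops:
            continue
        for relation, target in outgoing.get(path[-1], []):
            if target not in visited:  # skip cycles
                stack.append((path + [relation, target], visited | {target}))
    return paths
```

Tracking visited nodes per path keeps the search finite on cyclic graphs while still allowing the same node to appear in different paths.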

FR-5: Integration with Socratic Inquiry Engine

| ID | Requirement | Details |
| --- | --- | --- |
| FR-5.1 | Context Retrieval for QGM | SocraticInquiryService.generateQuestions() must call Cognee structured search |
| FR-5.2 | Strategic North Query | Call: POST /cognee/search-structured with filters: document_type=["board_memo"], recency="last_6_months" |
| FR-5.3 | Historical Failures Query | Call: POST /cognee/search-structured with filters: document_type=["incident_report"], decision_type={current_decision_type} |
| FR-5.4 | Constraints Query | Call: POST /cognee/search-structured with filters: document_type=["capacity_report", "budget_snapshot"], recency="last_30_days" |
| FR-5.5 | Context Snippet Formatting | Cognee must return snippets in format: {source, text, metadata, relevance_score} |
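
FR-5.5 names the snippet fields but not how they are derived from a search result. The Python sketch below shows one plausible mapping from a result in the FR-2 response format. The function name `to_snippet`, the `max_chars` truncation limit, and the decision to fold all remaining result fields into `metadata` are assumptions.

```python
def to_snippet(result: dict, max_chars: int = 300) -> dict:
    """Convert one structured-search result into {source, text, metadata, relevance_score}."""
    reserved = {"content", "source", "relevance_score"}
    text = result["content"]
    if len(text) > max_chars:
        # Truncate long documents so prompts stay small.
        text = text[:max_chars].rstrip() + "..."
    return {
        "source": result.get("source", "unknown"),
        "text": text,
        # Everything that is not a top-level snippet field is kept as metadata.
        "metadata": {key: value for key, value in result.items() if key not in reserved},
        "relevance_score": result.get("relevance_score", 0.0),
    }
```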

5. Technical Architecture

5.1. Cognee Service Enhancements

New Endpoints

1. Structured Search (Enhanced)

@app.post("/cognee/search-structured")
async def cognee_search_structured(request: StructuredSearchRequest):
    """
    Performs structured search with filters for document type, recency, tags.
    """
    # Apply filters to GraphRAG query
    # Return results with metadata

2. Graph Export

@app.get("/cognee/graph/export")
async def export_graph():
    """
    Exports the full knowledge graph structure (nodes + edges).
    """
    # Query Cognee's internal graph DB
    # Return JSON representation

3. Subgraph Extraction

@app.post("/cognee/graph/subgraph")
async def get_subgraph(request: SubgraphRequest):
    """
    Extracts a subgraph around a specific entity or decision.
    """
    # BFS/DFS traversal from start_node
    # Return nodes and edges within max_hops
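
The BFS step mentioned in the stub above could look roughly like the following Python sketch, which works on an in-memory graph in the export format (nodes with an id, edges with source and target). The function name `extract_subgraph` is hypothetical, and treating edges as undirected when measuring hop distance is an assumption.

```python
from collections import deque


def extract_subgraph(graph: dict, start_node: str, max_hops: int = 2) -> dict:
    """Return the nodes and edges within max_hops of start_node."""
    adjacency = {}
    for edge in graph["edges"]:
        adjacency.setdefault(edge["source"], set()).add(edge["target"])
        adjacency.setdefault(edge["target"], set()).add(edge["source"])

    # Breadth-first search records the hop distance of every reached node.
    distance = {start_node: 0}
    queue = deque([start_node])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    return {
        "nodes": [n for n in graph["nodes"] if n["id"] in distance],
        "edges": [e for e in graph["edges"] if e["source"] in distance and e["target"] in distance],
    }
```

BFS is used rather than DFS because it reaches each node by a shortest path, so the hop limit is applied correctly on the first visit.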

4. Graph Traversal

@app.post("/cognee/graph/traverse")
async def traverse_graph(request: TraversalRequest):
    """
    Performs multi-hop traversal following specified relationship types.
    """
    # Cypher-style query execution
    # Return paths and reachable nodes

5. Batch Metadata Ingestion

@app.post("/cognee/ingest-batch")
async def ingest_batch(request: BatchIngestionRequest):
    """
    Bulk upload of documents with rich metadata (type, dates, entities, tags).
    """
    # Validate metadata schema
    # Extract entities and relationships
    # Add to knowledge graph

5.2. Backend Service Integration

Enhance CogneeService.js (New Service or Extend RAGService)

// backend/src/services/CogneeService.js

class CogneeService {
  constructor() {
    this.cogneeUrl = process.env.COGNEE_SERVICE_URL || 'http://localhost:8004';
  }

  /**
   * Structured search with filters for document type, recency, tags.
   */
  async searchStructured({ query, filters, limit = 5 }) {
    const response = await fetch(`${this.cogneeUrl}/cognee/search-structured`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, filters, limit })
    });
    if (!response.ok) throw new Error(`Cognee structured search failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Retrieve Strategic North context (board memos, strategic priorities).
   */
  async retrieveStrategicContext() {
    return this.searchStructured({
      query: "What are the current strategic priorities and board mandates?",
      filters: {
        document_type: ["board_memo"],
        recency: "last_6_months"
      },
      limit: 3
    });
  }

  /**
   * Retrieve Historical Failures context (incident reports).
   */
  async retrieveHistoricalFailures(decisionType) {
    return this.searchStructured({
      query: `Past incidents related to ${decisionType}`,
      filters: {
        document_type: ["incident_report"],
        severity: "high"
      },
      limit: 2
    });
  }

  /**
   * Retrieve Constraints context (capacity, budget).
   */
  async retrieveConstraints(decisionScope) {
    return this.searchStructured({
      query: `Current capacity utilization and budget constraints for ${decisionScope}`,
      filters: {
        document_type: ["capacity_report", "budget_snapshot"],
        recency: "last_30_days"
      },
      limit: 2
    });
  }

  /**
   * Export full knowledge graph.
   */
  async exportGraph() {
    const response = await fetch(`${this.cogneeUrl}/cognee/graph/export`);
    if (!response.ok) throw new Error(`Cognee graph export failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Get subgraph around an entity or decision.
   */
  async getSubgraph(startNode, maxHops = 2) {
    const response = await fetch(`${this.cogneeUrl}/cognee/graph/subgraph`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ start_node: startNode, max_hops: maxHops })
    });
    if (!response.ok) throw new Error(`Cognee subgraph extraction failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Multi-hop graph traversal.
   */
  async traverseGraph({ startNode, relationshipTypes, maxHops, filter }) {
    const response = await fetch(`${this.cogneeUrl}/cognee/graph/traverse`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ start_node: startNode, relationship_types: relationshipTypes, max_hops: maxHops, filter })
    });
    if (!response.ok) throw new Error(`Cognee graph traversal failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Batch ingest documents with metadata.
   */
  async ingestBatch(documents) {
    const response = await fetch(`${this.cogneeUrl}/cognee/ingest-batch`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ documents })
    });
    if (!response.ok) throw new Error(`Cognee batch ingestion failed: ${response.statusText}`);
    return await response.json();
  }
}

export default new CogneeService();

5.3. Database Schema (PostgreSQL via Cognee)

Cognee uses PostgreSQL for both graph and vector storage. Cognee manages the schema internally, but we need to confirm that the following tables and indexes are in place:

Graph Tables:

  • nodes - Stores entities (documents, constraints, products, suppliers)
  • edges - Stores relationships (affects, depends_on, mitigates, etc.)

Vector Tables:

  • embeddings - Stores document embeddings for semantic search

Metadata Indexes:

  • Index on document_type, created_date, tags for fast filtering

6. Performance Requirements

| Metric | Target | Rationale |
| --- | --- | --- |
| Structured Search | < 2 seconds | Context retrieval for Socratic questions must be fast |
| Graph Export | < 5 seconds | Full graph export for visualization |
| Subgraph Extraction | < 1 second | Fast entity-centric exploration |
| Multi-Hop Traversal | < 3 seconds | Complex queries with 2-3 hops |
| Batch Ingestion | 100 documents in < 10 seconds | Bulk import of historical data |

7. Security & Compliance

7.1. Tenant Isolation

Problem: The Cognee service is shared across tenants, so cross-tenant data leakage must be prevented.

Solution:

  • Add tenant_id to all document metadata during ingestion
  • Filter all searches by tenant_id at the service layer (backend)
  • Cognee service itself is tenant-agnostic; isolation enforced by backend

Example:

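// In CogneeService.js
async searchStructured({ query, filters = {}, limit = 5, tenantId }) {
  // Refuse to run a search that is not scoped to a tenant
  if (!tenantId) throw new Error('tenantId is required for Cognee searches');
  // Always add the tenant_id filter, without mutating the caller's object
  const scopedFilters = { ...filters, tenant_id: tenantId };
  // Send { query, filters: scopedFilters, limit } to Cognee...
}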

7.2. Data Retention

  • Knowledge graph data retained for 7 years (regulatory compliance)
  • Entities and relationships are never hard-deleted, only soft-deleted (by setting a deleted_at timestamp)

8. Testing Requirements

8.1. Unit Tests (80% coverage)

  • CogneeService.searchStructured() with various filters
  • CogneeService.retrieveStrategicContext() returns correct document types
  • CogneeService.traverseGraph() returns valid paths

8.2. Integration Tests

  • E2E: Ingest document → Cognify → Structured search → Verify results
  • E2E: Create decision → SIE generates questions → Cognee retrieves context → Questions contain context snippets
  • Graph traversal: "Find all products depending on Supplier X"

8.3. Performance Tests

  • Load test: 1,000 documents ingested, structured search returns in < 2s
  • Stress test: 50 concurrent structured searches

9. Migration Strategy

9.1. Backfill Historical Data

Phase 1: Metadata Enrichment (Week 1)

  • Export existing documents and data_elements tables
  • Manually tag with document_type, created_date, tags
  • Batch ingest into Cognee via /cognee/ingest-batch

Phase 2: Entity Extraction (Week 2)

  • Run Cognee's entity extraction on backfilled documents
  • Verify entities extracted correctly (companies, products, constraints)

Phase 3: Relationship Inference (Week 3)

  • Use LLM to infer relationships between entities
  • Example: "Board Memo mentions 95% OTD → defines → Strategic Priority"

9.2. Dual-Mode Operation

  • Legacy Mode: RAGService continues using basic vector search (retrieveRelevantChunks())
  • GraphRAG Mode: New code uses CogneeService.searchStructured()
  • Feature flag: ENABLE_GRAPHRAG (default: false)
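
The dual-mode switch amounts to a single branch on the ENABLE_GRAPHRAG flag. The sketch below shows the logic in Python for brevity, although the backend itself is JavaScript. The function names, the injected search callables, and the set of strings accepted as "true" are assumptions; only the flag name and its default of false come from this document.

```python
import os


def graphrag_enabled(env=None) -> bool:
    """Read ENABLE_GRAPHRAG from the environment; anything unrecognised means off."""
    env = os.environ if env is None else env
    # Assumption: these spellings count as "enabled"; the default is false.
    return env.get("ENABLE_GRAPHRAG", "false").strip().lower() in {"1", "true", "yes", "on"}


def retrieve_context(query, legacy_search, graph_search, env=None):
    """Route a query to GraphRAG when the flag is on, otherwise to legacy vector search."""
    search = graph_search if graphrag_enabled(env) else legacy_search
    return search(query)
```

Keeping the branch in one function means the Week 4 switch-over only has to flip the flag rather than touch each caller.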

9.3. Rollout Plan

  • Week 1: Deploy enhanced Cognee service with new endpoints
  • Week 2: Backfill historical data (100 documents)
  • Week 3: Enable GraphRAG for SIE (Socratic questions only)
  • Week 4: Enable GraphRAG for all RAG use cases (replace legacy mode)

10. Success Criteria

M72 is complete when:

  ✅ Structured search endpoint functional with filters (document_type, recency, tags)
  ✅ Strategic North, Historical Failures, Constraints retrieval methods working
  ✅ Knowledge graph export returns nodes and edges in JSON format
  ✅ Multi-hop traversal (2-3 hops) functional
  ✅ Socratic Inquiry Engine (M70) successfully retrieves context from Cognee
  ✅ At least 100 historical documents ingested and cognified
  ✅ Unit test coverage ≥ 80%
  ✅ Integration tests passing
  ✅ Performance targets met (< 2s structured search)

11. Dependencies

| Dependency | Status | Required For |
| --- | --- | --- |
| Cognee Service (Basic) | ✅ 50% Complete | Foundation for enhancements |
| PostgreSQL with pgvector | ✅ Operational | Graph and vector storage |
| M70 (Socratic Inquiry Engine) | 🔄 In Progress | Context consumer |
| Document Ingestion Pipeline | ✅ Operational | Source of documents to cognify |

12. Future Enhancements (Post-M72)

Phase 2 Features:

  • Graph Visualization UI: Interactive graph explorer in frontend
  • Automated Relationship Inference: LLM-powered edge generation
  • Temporal Graph Queries: "How did our strategic priorities change over time?"
  • Probabilistic Reasoning: "What's the likelihood this decision leads to outcome X based on historical patterns?"
  • Cross-Tenant Graph Insights: Anonymized pattern sharing across tenants (with consent)

Document Status: Complete FSD
Last Updated: 2025-11-14
Milestone: M72 - Cognee GraphRAG Integration (50% → 100%)
Estimated Implementation: 3 weeks (1 backend developer + 1 Python developer)