Cognee GraphRAG Integration (M72)

1. Executive Summary

Field	Description
Feature Name	Cognee GraphRAG Integration (Complete)
Milestone	M72
Current Status	50% → Target: 100%
Architectural Goal	Transform ChainAlign's knowledge retrieval from simple vector search to intelligent Graph-based Retrieval-Augmented Generation (GraphRAG) with entity relationships, multi-hop reasoning, and structured context retrieval.
System Impact	Powers the Socratic Inquiry Engine (M70) with rich organizational context (Strategic North, Historical Failures, Constraints). Enables proactive decision support by surfacing relationships between decisions, constraints, and outcomes.
Target Users	All decision-makers (indirect), AI systems (direct consumer)
MVP Success Metric	Socratic questions generated with context retrieved from GraphRAG (3 context types). Knowledge graph successfully visualized with entities and relationships.

2. Strategic Context

Why GraphRAG vs. Traditional Vector Search?

Feature	Traditional Vector Search (Current 50%)	GraphRAG (Target 100%)
Context Retrieval	Semantic similarity only (embeddings)	Semantic + Structural (relationships)
Multi-Hop Reasoning	❌ Cannot traverse relationships	✅ "Who does this supplier depend on?"
Entity Recognition	❌ No entity extraction	✅ Extracts companies, people, products, constraints
Temporal Reasoning	❌ No time-aware queries	✅ "What decisions did we make in Q3 2024 about PFAS?"
Relationship Mapping	❌ No explicit links	✅ "How is Constraint A related to Decision B?"
Proactive Insights	❌ Reactive only	✅ "Similar decisions made in 2022 led to X outcome"

GraphRAG Use Cases in ChainAlign:

Socratic Inquiry Engine: Retrieve board memos, incident reports, capacity constraints with metadata
Decision History Feed: "Show me decisions that referenced Supplier X"
Constraint Intelligence: "What products are affected by this new regulation?"
Proactive Alerting: "Your current decision context matches a past failure scenario"

3. Current State (50% Complete)

What's Already Built:

✅ Cognee service running at http://localhost:5003 (port 8004 in Docker)
✅ Basic endpoints: /cognee/add, /cognee/cognify, /cognee/search
✅ RAGService integration (retrieveRelevantChunks() calls Cognee)
✅ PostgreSQL backend configured for graph and vector storage

What's Missing (50% Gap):

❌ Structured Context Retrieval: No support for filtering by document type, recency, or metadata
❌ Entity Extraction: Companies, products, and constraints are not extracted as nodes
❌ Relationship Mapping: No explicit edges between entities (e.g., "Product A → Uses → Supplier B")
❌ Knowledge Graph Visualization: No endpoint to export graph structure
❌ Multi-Hop Queries: Cannot traverse relationships ("Find all decisions that depend on Supplier X")
❌ Metadata-Driven Search: No way to query "board memos from last 6 months"
❌ Integration with SIE: No specialized endpoints for Strategic North, Historical Failures, Constraints

4. Functional Requirements

FR-1: Enhanced Document Ingestion with Metadata

ID	Requirement	Details
FR-1.1	Document Type Tagging	All documents must be tagged with type: `board_memo`, `incident_report`, `capacity_report`, `budget_snapshot`, `supplier_contract`, `regulatory_filing`, `decision_record`
FR-1.2	Temporal Metadata	Track document `created_date`, `effective_date`, `expiry_date`
FR-1.3	Entity Extraction	Automatically extract entities: `company`, `product`, `constraint`, `supplier`, `customer`, `regulatory_body`
FR-1.4	Relationship Inference	Infer relationships: `affects`, `depends_on`, `mitigates`, `violates`, `supersedes`
FR-1.5	Batch Ingestion API	Support bulk upload of documents with metadata (e.g., import 100 board memos with dates)

Example Ingestion Payload:

{
  "documents": [
    {
      "content": "Board Memo Q3 2024: Priority #1 is 95% on-time delivery...",
      "metadata": {
        "document_type": "board_memo",
        "created_date": "2024-10-01",
        "effective_date": "2024-10-01",
        "source": "Board Meeting Minutes",
        "entities": ["95% on-time delivery", "40% gross margin floor"],
        "tags": ["strategic_priority", "Q3_2024"]
      }
    }
  ]
}
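The metadata rules in FR-1.1 and FR-1.2 could be checked before a document reaches the graph. The following is a minimal sketch, not the Cognee implementation: `validate_document`, `ALLOWED_DOCUMENT_TYPES`, and `DATE_FIELDS` are hypothetical names, and treating all three date fields as optional ISO 8601 dates is an assumption.

```python
from datetime import date

# Document types listed in FR-1.1.
ALLOWED_DOCUMENT_TYPES = {
    "board_memo", "incident_report", "capacity_report", "budget_snapshot",
    "supplier_contract", "regulatory_filing", "decision_record",
}
# Temporal fields from FR-1.2; treating each as an optional ISO date is an assumption.
DATE_FIELDS = ("created_date", "effective_date", "expiry_date")


def validate_document(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    errors = []
    if not str(doc.get("content", "")).strip():
        errors.append("content is empty")
    metadata = doc.get("metadata") or {}
    doc_type = metadata.get("document_type")
    if doc_type not in ALLOWED_DOCUMENT_TYPES:
        errors.append(f"unknown document_type: {doc_type!r}")
    for field in DATE_FIELDS:
        value = metadata.get(field)
        if value is None:
            continue  # the field is optional in this sketch
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            errors.append(f"{field} is not an ISO date: {value!r}")
    return errors
```

A batch endpoint could run this over every entry in `documents` and reject the request, or skip the offending entries, when any list of errors is non-empty.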

FR-2: Structured Context Retrieval

ID	Requirement	Details
FR-2.1	Filter by Document Type	Endpoint: `POST /cognee/search-structured`. Filters: `document_type` (array), `recency` (e.g., "last_6_months"), `tags` (array)
FR-2.2	Strategic North Retrieval	Specialized query: "Get board memos and strategic priorities from last 6 months". Returns: Top 3 most relevant strategic documents with snippets
FR-2.3	Historical Failures Retrieval	Specialized query: "Get incident reports related to {decision_type}". Returns: Past incidents with severity, lessons learned
FR-2.4	Constraints Retrieval	Specialized query: "Get current capacity and budget constraints for {decision_scope}". Returns: Real-time constraint data with utilization percentages
FR-2.5	Temporal Filtering	Support time ranges: "last_30_days", "last_6_months", "Q3_2024", "2024"
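The recency tokens in FR-2.5 have to be turned into concrete date ranges before they can filter on `created_date`. One possible resolution is sketched below. `resolve_recency` is a hypothetical helper, and approximating a month as 30 days is an assumption the spec does not settle.

```python
import re
from datetime import date, timedelta


def resolve_recency(recency: str, today: date) -> tuple[date, date]:
    """Translate a recency token into an inclusive (start, end) date range."""
    m = re.fullmatch(r"last_(\d+)_days", recency)
    if m:
        return today - timedelta(days=int(m.group(1))), today
    m = re.fullmatch(r"last_(\d+)_months", recency)
    if m:
        # Assumption: one month is approximated as 30 days.
        return today - timedelta(days=30 * int(m.group(1))), today
    m = re.fullmatch(r"Q([1-4])_(\d{4})", recency)
    if m:
        quarter, year = int(m.group(1)), int(m.group(2))
        start = date(year, 3 * quarter - 2, 1)
        # The quarter ends the day before the next quarter starts.
        if quarter == 4:
            next_start = date(year + 1, 1, 1)
        else:
            next_start = date(year, 3 * quarter + 1, 1)
        return start, next_start - timedelta(days=1)
    if re.fullmatch(r"\d{4}", recency):
        year = int(recency)
        return date(year, 1, 1), date(year, 12, 31)
    raise ValueError(f"unsupported recency token: {recency!r}")
```

Passing `today` explicitly rather than reading the clock inside the function keeps the helper deterministic in unit tests.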

Example Structured Search Request:

{
  "query": "PFAS regulation compliance",
  "filters": {
    "document_type": ["incident_report", "regulatory_filing"],
    "recency": "last_6_months",
    "severity": "high"
  },
  "limit": 3
}

Example Response:

{
  "status": "success",
  "results": [
    {
      "content": "Incident Report 2024-IR-045: PFAS supplier qualification delay...",
      "source": "Incident Report 2024-IR-045",
      "document_type": "incident_report",
      "created_date": "2024-08-15",
      "severity": "high",
      "relevance_score": 0.92,
      "entities": ["PFAS", "Metco", "supplier qualification"],
      "relationships": [
        {"type": "affects", "target": "Product Line A"},
        {"type": "related_to", "target": "Decision CHA-123"}
      ]
    }
  ]
}
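To make the filter semantics concrete, here is a small in-memory sketch of how the non-temporal filters might be applied to candidate results before ranking. `matches_filters` and `structured_search` are hypothetical names; matching a document when it carries at least one requested tag is an assumption, and a real implementation would more likely push these predicates into the database query.

```python
def matches_filters(doc: dict, filters: dict) -> bool:
    """Check one result against document_type, tags, and exact-match filters."""
    for key, expected in filters.items():
        if key == "recency":
            continue  # date-range filtering is assumed to happen elsewhere
        actual = doc.get(key)
        if key == "tags":
            # Assumption: any overlap between requested and actual tags matches.
            if not set(expected) & set(actual or []):
                return False
        elif isinstance(expected, list):
            if actual not in expected:
                return False
        elif actual != expected:
            return False
    return True


def structured_search(candidates: list[dict], filters: dict, limit: int = 3) -> list[dict]:
    """Keep candidates that pass every filter, then return the top `limit` by relevance."""
    kept = [d for d in candidates if matches_filters(d, filters)]
    kept.sort(key=lambda d: d.get("relevance_score", 0.0), reverse=True)
    return kept[:limit]
```

With the example request above, a high-severity incident report would pass, while a board memo or a low-severity incident report would be dropped regardless of its relevance score.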

FR-3: Knowledge Graph Visualization

ID	Requirement	Details
FR-3.1	Export Graph Structure	Endpoint: `GET /cognee/graph/export`. Returns: Nodes and edges in JSON format
FR-3.2	Node Types	Nodes represent: `Document`, `Entity`, `Constraint`, `Decision`, `Supplier`, `Product`
FR-3.3	Edge Types	Edges represent: `affects`, `depends_on`, `mitigates`, `violates`, `references`
FR-3.4	Subgraph Extraction	Endpoint: `POST /cognee/graph/subgraph`. Input: Entity ID or Decision ID. Returns: All nodes and edges within N hops
FR-3.5	Graph Metrics	Return: Node count, edge count, average degree, clustering coefficient

Example Graph Export:

{
  "nodes": [
    {"id": "doc-board-memo-q3-2024", "type": "Document", "label": "Board Memo Q3 2024", "metadata": {"document_type": "board_memo"}},
    {"id": "entity-95-otd", "type": "Strategic_Priority", "label": "95% On-Time Delivery", "metadata": {}},
    {"id": "constraint-pfas-ban", "type": "Regulatory_Constraint", "label": "PFAS Ban 2025", "metadata": {"severity": "high"}}
  ],
  "edges": [
    {"source": "doc-board-memo-q3-2024", "target": "entity-95-otd", "type": "defines"},
    {"source": "constraint-pfas-ban", "target": "entity-95-otd", "type": "threatens"}
  ]
}
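The metrics in FR-3.5 can be derived directly from an export in this shape. The sketch below is one way to do it, under two assumptions: edges are treated as undirected, and the clustering coefficient is the average of the local coefficients over all nodes. `graph_metrics` is a hypothetical helper name.

```python
from itertools import combinations


def graph_metrics(graph: dict) -> dict:
    """Compute node count, edge count, average degree, and clustering coefficient."""
    neighbors = {node["id"]: set() for node in graph["nodes"]}
    for edge in graph["edges"]:
        source, target = edge["source"], edge["target"]
        # Skip self-loops and edges that reference unknown nodes.
        if source != target and source in neighbors and target in neighbors:
            neighbors[source].add(target)
            neighbors[target].add(source)

    node_count = len(neighbors)
    if node_count == 0:
        return {"node_count": 0, "edge_count": 0,
                "average_degree": 0.0, "clustering_coefficient": 0.0}

    local = []
    for adjacent in neighbors.values():
        k = len(adjacent)
        if k < 2:
            local.append(0.0)  # fewer than two neighbors: no pairs to connect
            continue
        # Count neighbor pairs that are themselves directly connected.
        linked = sum(1 for a, b in combinations(adjacent, 2) if b in neighbors[a])
        local.append(linked / (k * (k - 1) / 2))

    return {
        "node_count": node_count,
        "edge_count": len(graph["edges"]),
        "average_degree": sum(len(a) for a in neighbors.values()) / node_count,
        "clustering_coefficient": sum(local) / node_count,
    }
```

For the three-node example export above, this yields an average degree of 4/3 and a clustering coefficient of 0, because the two neighbors of `entity-95-otd` are not linked to each other.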

FR-4: Multi-Hop Reasoning

ID	Requirement	Details
FR-4.1	Traversal Queries	Endpoint: `POST /cognee/graph/traverse`. Input: Start node, relationship types, max hops. Returns: All reachable nodes and paths
FR-4.2	Dependency Analysis	Query: "Find all products that depend on Supplier X". Traverse: `Product → depends_on → Supplier`
FR-4.3	Impact Analysis	Query: "What decisions are affected by Constraint Y?" Traverse: `Constraint → affects → Decision`
FR-4.4	Historical Pattern Matching	Query: "Find past decisions with similar context to current decision". Uses: Graph similarity + semantic similarity
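FR-4.4 combines graph similarity with semantic similarity but does not say how. One plausible reading, offered only as a sketch, is a weighted blend of Jaccard overlap between two decisions' graph neighborhoods and cosine similarity between their embeddings. The function names, the `neighbors` and `embedding` keys, and the 0.5 default weight are all assumptions.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of neighboring node IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def combined_similarity(current: dict, past: dict, graph_weight: float = 0.5) -> float:
    """Blend structural and semantic similarity; the default weight is illustrative."""
    structural = jaccard(current["neighbors"], past["neighbors"])
    semantic = cosine(current["embedding"], past["embedding"])
    return graph_weight * structural + (1 - graph_weight) * semantic
```

Ranking past decisions by `combined_similarity` against the current decision would surface candidates that share both wording and connected entities. The weight would need tuning against real decision history.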

Example Traversal Request:

{
  "start_node": "supplier-metco",
  "relationship_types": ["supplies_to", "affects"],
  "max_hops": 2,
  "filter": {"node_type": "Decision"}
}

Example Response:

{
  "paths": [
    {
      "path": ["supplier-metco", "supplies_to", "product-line-a", "affects", "decision-cha-123"],
      "length": 2,
      "entities": ["Metco", "Product Line A", "Decision CHA-123"]
    }
  ]
}
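The traversal behind this request and response could look roughly like the breadth-first sketch below. It is an illustration rather than Cognee's traversal engine: `traverse` and its parameters are hypothetical, edges are followed in their stored direction only, and the optional `node_type` argument is interpreted as a filter on the node where a path ends, which is an assumption about the `filter` field.

```python
from collections import deque


def traverse(edges, node_types, start_node, relationship_types, max_hops, node_type=None):
    """Enumerate directed paths from start_node, up to max_hops edges long.

    Each path alternates node IDs and relationship types, mirroring the
    example response. `node_types` maps node ID to its type.
    """
    outgoing = {}
    for e in edges:
        if e["type"] in relationship_types:
            outgoing.setdefault(e["source"], []).append((e["type"], e["target"]))

    paths = []
    queue = deque([[start_node]])
    while queue:
        path = queue.popleft()
        hops = len(path) // 2  # a path of n edges holds 2n + 1 items
        if hops and (node_type is None or node_types.get(path[-1]) == node_type):
            paths.append({"path": path, "length": hops})
        if hops == max_hops:
            continue
        visited = set(path[::2])  # node IDs already on this path, to avoid cycles
        for rel, target in outgoing.get(path[-1], []):
            if target not in visited:
                queue.append(path + [rel, target])
    return paths
```

Because the queue holds whole paths rather than single nodes, the function returns every distinct route, not just the set of reachable nodes. On dense graphs the number of paths grows quickly, so `max_hops` should stay small.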

FR-5: Integration with Socratic Inquiry Engine

ID	Requirement	Details
FR-5.1	Context Retrieval for QGM	`SocraticInquiryService.generateQuestions()` must call Cognee structured search
FR-5.2	Strategic North Query	Call: `POST /cognee/search-structured` with filters: `document_type=["board_memo"]`, `recency="last_6_months"`
FR-5.3	Historical Failures Query	Call: `POST /cognee/search-structured` with filters: `document_type=["incident_report"]`, `decision_type={current_decision_type}`
FR-5.4	Constraints Query	Call: `POST /cognee/search-structured` with filters: `document_type=["capacity_report", "budget_snapshot"]`, `recency="last_30_days"`
FR-5.5	Context Snippet Formatting	Cognee must return snippets in format: `{source, text, metadata, relevance_score}`
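FR-5.5 fixes the snippet shape but not how a search result maps onto it. A minimal sketch of that mapping follows, assuming results look like the structured-search response in FR-2. `to_snippet`, `SNIPPET_CORE_FIELDS`, and the 280-character cap are hypothetical choices.

```python
# Fields that become top-level snippet keys; everything else is kept as metadata.
SNIPPET_CORE_FIELDS = ("source", "content", "relevance_score")


def to_snippet(result: dict, max_chars: int = 280) -> dict:
    """Reshape one search result into {source, text, metadata, relevance_score}."""
    text = result.get("content", "")
    if len(text) > max_chars:
        # Truncate and mark the cut so the prompt stays within budget.
        text = text[: max_chars - 3].rstrip() + "..."
    metadata = {k: v for k, v in result.items() if k not in SNIPPET_CORE_FIELDS}
    return {
        "source": result.get("source", "unknown"),
        "text": text,
        "metadata": metadata,
        "relevance_score": result.get("relevance_score", 0.0),
    }
```

Fields such as `document_type`, `severity`, `entities`, and `relationships` end up under `metadata`, so the question generator can cite them without parsing the text.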

5. Technical Architecture

5.1. Cognee Service Enhancements

New Endpoints

1. Structured Search (Enhanced)

@app.post("/cognee/search-structured")
async def cognee_search_structured(request: StructuredSearchRequest):
    """
    Performs structured search with filters for document type, recency, tags.
    """
    # Apply filters to GraphRAG query
    # Return results with metadata

2. Graph Export

@app.get("/cognee/graph/export")
async def export_graph():
    """
    Exports the full knowledge graph structure (nodes + edges).
    """
    # Query Cognee's internal graph DB
    # Return JSON representation

3. Subgraph Extraction

@app.post("/cognee/graph/subgraph")
async def get_subgraph(request: SubgraphRequest):
    """
    Extracts a subgraph around a specific entity or decision.
    """
    # BFS/DFS traversal from start_node
    # Return nodes and edges within max_hops
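The BFS step named in the stub above might be implemented along these lines. This is a sketch over the export format from FR-3, not Cognee's graph API: `extract_subgraph` is a hypothetical helper, and ignoring edge direction during expansion is an assumption made so that both upstream and downstream neighbors are included.

```python
from collections import deque


def extract_subgraph(graph: dict, start_node: str, max_hops: int = 2) -> dict:
    """Return the nodes within max_hops of start_node and the edges among them."""
    adjacency = {}
    for e in graph["edges"]:
        adjacency.setdefault(e["source"], set()).add(e["target"])
        adjacency.setdefault(e["target"], set()).add(e["source"])

    # Standard BFS: distance records the hop count at which each node was first reached.
    distance = {start_node: 0}
    queue = deque([start_node])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    return {
        "nodes": [n for n in graph["nodes"] if n["id"] in distance],
        "edges": [e for e in graph["edges"]
                  if e["source"] in distance and e["target"] in distance],
    }
```

Returning the result in the same nodes-and-edges shape as the full export lets a visualization client render either one with the same code.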

4. Graph Traversal

@app.post("/cognee/graph/traverse")
async def traverse_graph(request: TraversalRequest):
    """
    Performs multi-hop traversal following specified relationship types.
    """
    # Cypher-style query execution
    # Return paths and reachable nodes

5. Batch Metadata Ingestion

@app.post("/cognee/ingest-batch")
async def ingest_batch(request: BatchIngestionRequest):
    """
    Bulk upload of documents with rich metadata (type, dates, entities, tags).
    """
    # Validate metadata schema
    # Extract entities and relationships
    # Add to knowledge graph

5.2. Backend Service Integration

Enhance CogneeService.js (New Service or Extend RAGService)

// backend/src/services/CogneeService.js

class CogneeService {
  constructor() {
    this.cogneeUrl = process.env.COGNEE_SERVICE_URL || 'http://localhost:8004';
  }

  /**
   * Structured search with filters for document type, recency, tags.
   */
  async searchStructured({ query, filters, limit = 5 }) {
    const response = await fetch(`${this.cogneeUrl}/cognee/search-structured`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, filters, limit })
    });
    if (!response.ok) throw new Error(`Cognee structured search failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Retrieve Strategic North context (board memos, strategic priorities).
   */
  async retrieveStrategicContext() {
    return this.searchStructured({
      query: "What are the current strategic priorities and board mandates?",
      filters: {
        document_type: ["board_memo"],
        recency: "last_6_months"
      },
      limit: 3
    });
  }

  /**
   * Retrieve Historical Failures context (incident reports).
   */
  async retrieveHistoricalFailures(decisionType) {
    return this.searchStructured({
      query: `Past incidents related to ${decisionType}`,
      filters: {
        document_type: ["incident_report"],
        severity: "high"
      },
      limit: 2
    });
  }

  /**
   * Retrieve Constraints context (capacity, budget).
   */
  async retrieveConstraints(decisionScope) {
    return this.searchStructured({
      query: `Current capacity utilization and budget constraints for ${decisionScope}`,
      filters: {
        document_type: ["capacity_report", "budget_snapshot"],
        recency: "last_30_days"
      },
      limit: 2
    });
  }

  /**
   * Export full knowledge graph.
   */
  async exportGraph() {
    const response = await fetch(`${this.cogneeUrl}/cognee/graph/export`);
    if (!response.ok) throw new Error(`Cognee graph export failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Get subgraph around an entity or decision.
   */
  async getSubgraph(startNode, maxHops = 2) {
    const response = await fetch(`${this.cogneeUrl}/cognee/graph/subgraph`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ start_node: startNode, max_hops: maxHops })
    });
    if (!response.ok) throw new Error(`Cognee subgraph extraction failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Multi-hop graph traversal.
   */
  async traverseGraph({ startNode, relationshipTypes, maxHops, filter }) {
    const response = await fetch(`${this.cogneeUrl}/cognee/graph/traverse`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ start_node: startNode, relationship_types: relationshipTypes, max_hops: maxHops, filter })
    });
    if (!response.ok) throw new Error(`Cognee graph traversal failed: ${response.statusText}`);
    return await response.json();
  }

  /**
   * Batch ingest documents with metadata.
   */
  async ingestBatch(documents) {
    const response = await fetch(`${this.cogneeUrl}/cognee/ingest-batch`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ documents })
    });
    if (!response.ok) throw new Error(`Cognee batch ingestion failed: ${response.statusText}`);
    return await response.json();
  }
}

export default new CogneeService();

5.3. Database Schema (PostgreSQL via Cognee)

Cognee uses PostgreSQL for both graph and vector storage. The schema is managed by Cognee internally, but we need to ensure:

Graph Tables:

nodes - Stores entities (documents, constraints, products, suppliers)
edges - Stores relationships (affects, depends_on, mitigates, etc.)

Vector Tables:

embeddings - Stores document embeddings for semantic search

Metadata Indexes:

Index on document_type, created_date, tags for fast filtering

6. Performance Requirements

Metric	Target	Rationale
Structured Search	< 2 seconds	Context retrieval for Socratic questions must be fast
Graph Export	< 5 seconds	Full graph export for visualization
Subgraph Extraction	< 1 second	Fast entity-centric exploration
Multi-Hop Traversal	< 3 seconds	Complex queries with 2-3 hops
Batch Ingestion	100 documents in < 10 seconds	Bulk import of historical data

7. Security & Compliance

7.1. Tenant Isolation

Problem: Cognee service is shared across tenants. Must prevent data leakage.

Solution:

Add tenant_id to all document metadata during ingestion
Filter all searches by tenant_id at the service layer (backend)
Cognee service itself is tenant-agnostic; isolation enforced by backend

Example:

// In CogneeService.js
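async searchStructured({ query, filters = {}, limit = 5, tenantId }) {
  if (!tenantId) throw new Error('tenantId is required for Cognee searches');
  // Apply tenant_id last so a caller-supplied value cannot override it,
  // and copy the object so the caller's filters are not mutated
  const scopedFilters = { ...filters, tenant_id: tenantId };
  // Send { query, filters: scopedFilters, limit } to Cognee...
}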

7.2. Data Retention

Knowledge graph data retained for 7 years (regulatory compliance)
Entities and relationships are never hard-deleted, only soft-deleted (set a deleted_at timestamp)
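Soft deletion as described here could be as simple as the sketch below. `soft_delete` and `is_visible` are hypothetical helpers, and storing `deleted_at` as an ISO 8601 string is an assumption; in PostgreSQL it would more likely be a nullable timestamp column.

```python
from datetime import datetime, timezone


def soft_delete(record: dict, now: datetime) -> dict:
    """Return a copy of the record marked as deleted; the data itself is kept."""
    return {**record, "deleted_at": now.isoformat()}


def is_visible(record: dict) -> bool:
    """Search and traversal should skip records that carry a deleted_at value."""
    return record.get("deleted_at") is None


node = {"id": "entity-95-otd", "type": "Entity"}
removed = soft_delete(node, datetime(2025, 1, 1, tzinfo=timezone.utc))
```

Every read path (search, export, traversal) would need to apply the `is_visible` check, otherwise soft-deleted entities would still surface in results.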

8. Testing Requirements

8.1. Unit Tests (80% coverage)

CogneeService.searchStructured() with various filters
CogneeService.retrieveStrategicContext() returns correct document types
CogneeService.traverseGraph() returns valid paths

8.2. Integration Tests

E2E: Ingest document → Cognify → Structured search → Verify results
E2E: Create decision → SIE generates questions → Cognee retrieves context → Questions contain context snippets
Graph traversal: "Find all products depending on Supplier X"

8.3. Performance Tests

Load test: 1,000 documents ingested, structured search returns in < 2s
Stress test: 50 concurrent structured searches

9. Migration Strategy

9.1. Backfill Historical Data

Phase 1: Metadata Enrichment (Week 1)

Export existing documents and data_elements tables
Manually tag with document_type, created_date, tags
Batch ingest into Cognee via /cognee/ingest-batch

Phase 2: Entity Extraction (Week 2)

Run Cognee's entity extraction on backfilled documents
Verify entities extracted correctly (companies, products, constraints)

Phase 3: Relationship Inference (Week 3)

Use LLM to infer relationships between entities
Example: "Board Memo mentions 95% OTD → defines → Strategic Priority"

9.2. Dual-Mode Operation

Legacy Mode: RAGService continues using basic vector search (retrieveRelevantChunks())
GraphRAG Mode: New code uses CogneeService.searchStructured()
Feature flag: ENABLE_GRAPHRAG (default: false)
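The routing between the two modes can be sketched as follows. The backend itself is JavaScript; Python is used here only to stay consistent with the other sketches in this document. `graphrag_enabled`, `retrieve_context`, and the set of accepted truthy values are assumptions.

```python
import os

# Values treated as "on"; anything else, including an unset variable, means off.
TRUTHY = {"1", "true", "yes", "on"}


def graphrag_enabled(env=os.environ) -> bool:
    """ENABLE_GRAPHRAG defaults to false when it is not set."""
    return env.get("ENABLE_GRAPHRAG", "false").strip().lower() in TRUTHY


def retrieve_context(query, legacy_search, graphrag_search, env=os.environ):
    """Use the GraphRAG path only when the flag is on; otherwise fall back to legacy search."""
    search = graphrag_search if graphrag_enabled(env) else legacy_search
    return search(query)
```

Reading the flag on each call, rather than once at import time, allows the mode to be switched without redeploying, provided the process environment or its replacement config source can change at runtime.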

9.3. Rollout Plan

Week 1: Deploy enhanced Cognee service with new endpoints
Week 2: Backfill historical data (100 documents)
Week 3: Enable GraphRAG for SIE (Socratic questions only)
Week 4: Enable GraphRAG for all RAG use cases (replace legacy mode)

10. Success Criteria

M72 is complete when:

✅ Structured search endpoint functional with filters (document_type, recency, tags)
✅ Strategic North, Historical Failures, Constraints retrieval methods working
✅ Knowledge graph export returns nodes and edges in JSON format
✅ Multi-hop traversal (2-3 hops) functional
✅ Socratic Inquiry Engine (M70) successfully retrieves context from Cognee
✅ At least 100 historical documents ingested and cognified
✅ Unit test coverage ≥ 80%
✅ Integration tests passing
✅ Performance targets met (< 2s structured search)

11. Dependencies

Dependency	Status	Required For
Cognee Service (Basic)	✅ 50% Complete	Foundation for enhancements
PostgreSQL with pgvector	✅ Operational	Graph and vector storage
M70 (Socratic Inquiry Engine)	🔄 In Progress	Context consumer
Document Ingestion Pipeline	✅ Operational	Source of documents to cognify

12. Future Enhancements (Post-M72)

Phase 2 Features:

Graph Visualization UI: Interactive graph explorer in frontend
Automated Relationship Inference: LLM-powered edge generation
Temporal Graph Queries: "How did our strategic priorities change over time?"
Probabilistic Reasoning: "What's the likelihood this decision leads to outcome X based on historical patterns?"
Cross-Tenant Graph Insights: Anonymized pattern sharing across tenants (with consent)

Document Status: Complete FSD
Last Updated: 2025-11-14
Milestone: M72 - Cognee GraphRAG Integration (50% → 100%)
Estimated Implementation: 3 weeks (1 backend developer + 1 Python developer)
