ChainAlign Data Catalog

Version: 1.0
Date: October 17, 2025
Status: In Progress

1. Introduction

This document serves as the master Data Catalog for the ChainAlign platform. It defines every required data field, its format, its business purpose, and which ChainAlign engine or service depends on it. This catalog is the single source of truth for all customer data onboarding and integration efforts.

2. Data Requirements by Service

2.1 Hybrid Forecasting Service (M26)

The following data entities and fields are required for the full functioning of the Hybrid Forecasting Service, including SKU segmentation, edge-case detection, and location-aware forecasting.

Entity: Sales / Demand History

Purpose: Core input for all statistical models and trend analysis.
Source Table(s): actuals, sales_history
Required Fields:
- sku_id: Product identifier.
- location_id: Geographic or site identifier.
- period: The date/timestamp of the sales record.
- demand_actual (or quantity_sold): The number of units sold.
- last_updated: Timestamp of when the record was last updated.
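
As an illustration of how an onboarding script might check these fields, here is a minimal sketch. It assumes rows arrive as dictionaries with ISO 8601 date strings; the DemandRecord class and parse_demand_record function are hypothetical names, not part of ChainAlign.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DemandRecord:
    """One row of sales/demand history, using the field names listed above."""
    sku_id: str
    location_id: str
    period: date
    demand_actual: float
    last_updated: datetime


def parse_demand_record(row: dict) -> DemandRecord:
    """Build a DemandRecord from a raw row, accepting quantity_sold as an alias."""
    # The catalog allows either demand_actual or quantity_sold for units sold.
    if "demand_actual" in row:
        quantity = row["demand_actual"]
    elif "quantity_sold" in row:
        quantity = row["quantity_sold"]
    else:
        raise ValueError("row needs demand_actual or quantity_sold")
    missing = [k for k in ("sku_id", "location_id", "period", "last_updated") if k not in row]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return DemandRecord(
        sku_id=str(row["sku_id"]),
        location_id=str(row["location_id"]),
        period=date.fromisoformat(row["period"]),  # assumes ISO 8601 input
        demand_actual=float(quantity),
        last_updated=datetime.fromisoformat(row["last_updated"]),
    )
```

Rejecting incomplete rows at load time keeps gaps in the history visible instead of letting them silently skew the statistical models.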

Entity: Location & Capacity

Purpose: Provides location-specific context for capacity constraints and local demand patterns.
Source Table(s): locations
Required Fields:
- location_id: Unique location identifier.
- location_name: Human-readable name.
- location_type: e.g., 'Store', 'Warehouse', 'Plant'.
- size_category: e.g., 'Large', 'Medium', 'Small'.
- geographic_scope: e.g., city, state, country.
- receiving_capacity: Max units that can be received per day.
- storage_capacity: Max units that can be stored.
- current_utilization_pct: Current % of capacity being used.
- annual_budget: Annual budget for the location, used as a financial constraint.
- inventory_policies: JSONB or text describing min/max inventory rules.
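
The capacity fields can be combined into simple headroom checks. The sketch below is illustrative only: it assumes current_utilization_pct is on a 0 to 100 scale and refers to storage capacity, and the min_units and max_units keys inside inventory_policies are hypothetical, since the catalog does not fix a schema for that field.

```python
import json


def remaining_storage_units(storage_capacity: float, current_utilization_pct: float) -> float:
    """Units of storage still free, assuming the percentage is on a 0-100 scale."""
    if not 0 <= current_utilization_pct <= 100:
        raise ValueError("current_utilization_pct must be between 0 and 100")
    return storage_capacity * (1 - current_utilization_pct / 100)


# Hypothetical inventory_policies payload; the key names are illustrative only.
policies = json.loads('{"min_units": 200, "max_units": 1500}')


def within_policy(on_hand_quantity: float, policies: dict) -> bool:
    """True when on-hand inventory sits inside the min/max band."""
    return policies["min_units"] <= on_hand_quantity <= policies["max_units"]
```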

Entity: Promotions

Purpose: Used to model and understand event-driven demand spikes.
Source Table(s): location_promotions
Required Fields:
- location_id: The location where the promotion is active.
- sku_id: The product being promoted.
- promotion_name: Name of the campaign.
- start_date & end_date: The dates on which the promotion begins and ends.
- expected_lift_factor: The anticipated % increase in sales.
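
One way a forecast could use these fields is to scale baseline demand while a promotion is active. This is a sketch under stated assumptions, not the Hybrid Forecasting Service's actual logic: it treats expected_lift_factor as a percentage (25 means +25%) and treats both start_date and end_date as inclusive. If the field is stored as a multiplier instead, the arithmetic changes accordingly. The function name promoted_demand is hypothetical.

```python
from datetime import date


def promoted_demand(
    baseline_units: float,
    period: date,
    start_date: date,
    end_date: date,
    expected_lift_pct: float,
) -> float:
    """Apply the promotional lift only when the period falls inside the promotion."""
    # Assumption: both boundary dates are part of the promotion window.
    if start_date <= period <= end_date:
        return baseline_units * (1 + expected_lift_pct / 100)
    return baseline_units
```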

Entity: Supply & Production

Purpose: Provides data for supply-side constraint analysis.
Source Table(s): production, inventory
Required Fields:
- production_quantity: Number of units produced in a period.
- on_hand_quantity: Current physical inventory.

2.2 RAG & Cognee Services

The following data entities are required for the knowledge graph construction, semantic search, and evaluation capabilities of the RAG and Cognee services.

Entity: Knowledge Base Documents

Purpose: The core unstructured text data that forms the knowledge base for the entire RAG system.
Source Table(s): documents, document_chunks
Required Fields:
- document_id: Unique identifier for the source document.
- tenant_id: Tenant identifier. CRITICAL for data isolation and security.
- content or chunk_text: The raw text content of the document.
- metadata: Flexible JSONB field to store source information (e.g., filename, URL).
- embedding: The vector representation of the text used for semantic search.
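
To show why tenant_id and embedding matter together, here is a minimal in-memory sketch of tenant-scoped semantic search using cosine similarity. It is not the RAG service's implementation; a production system would typically push both the tenant filter and the similarity ranking into the database. The function names and the dictionary shape of a chunk are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_chunks(chunks: list[dict], tenant_id: str, query_embedding: list[float], top_k: int = 3) -> list[dict]:
    """Return the top_k most similar chunks that belong to the given tenant."""
    # Filter by tenant first so another tenant's text can never be ranked or returned.
    scoped = [c for c in chunks if c["tenant_id"] == tenant_id]
    ranked = sorted(
        scoped,
        key=lambda c: cosine_similarity(c["embedding"], query_embedding),
        reverse=True,
    )
    return ranked[:top_k]
```

Filtering before ranking, rather than after, means a missing or wrong tenant_id results in no matches instead of a cross-tenant leak.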

Entity: Structured Financials (for Cognify)

Purpose: Structured financial data that is converted into natural language sentences to be added to the knowledge graph.
Source: Can come from any structured source (e.g., financial tables, ERPs).
Required Fields:
- company: The name of the company.
- year: The fiscal year of the report.
- revenue: Total revenue.
- net_income: Total net income.
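
The conversion from a structured row to a natural language sentence might look like the following. The sentence template and the function name financial_record_to_sentence are assumptions for illustration; the catalog does not specify the wording used in practice, and no currency is assumed because the fields do not carry one.

```python
def financial_record_to_sentence(record: dict) -> str:
    """Render one financial record as a plain English sentence."""
    return (
        f"In fiscal year {record['year']}, {record['company']} reported "
        f"revenue of {record['revenue']:,.0f} and net income of "
        f"{record['net_income']:,.0f}."
    )
```

For example, a record with company "Acme Corp", year 2024, revenue 1250000 and net_income 98000 produces "In fiscal year 2024, Acme Corp reported revenue of 1,250,000 and net income of 98,000."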

Entity: RAG Evaluation Dataset

Purpose: A curated dataset used by the Ragas service to evaluate the performance and accuracy of the RAG pipeline.
Source: docs/testing/rag_eval_dataset.json
Required Fields (per item):
- question: A sample question to test the system.
- ground_truth_answer: The ideal, factually correct answer.
- relevant_contexts: An array of text snippets that contain the necessary information to answer the question.
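
A small structural check can catch malformed items before an evaluation run. The sketch below assumes the dataset file holds a top-level JSON array of items; the sample item is invented for illustration, and validate_eval_items is a hypothetical helper rather than part of Ragas.

```python
import json

REQUIRED_KEYS = ("question", "ground_truth_answer", "relevant_contexts")


def validate_eval_items(items: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all items are well-formed."""
    problems = []
    for i, item in enumerate(items):
        for key in REQUIRED_KEYS:
            if key not in item:
                problems.append(f"item {i}: missing {key}")
        contexts = item.get("relevant_contexts")
        if contexts is not None and not (
            isinstance(contexts, list) and all(isinstance(c, str) for c in contexts)
        ):
            problems.append(f"item {i}: relevant_contexts must be an array of strings")
    return problems


# Invented sample item showing the expected shape.
sample = json.loads("""
[
  {
    "question": "Which field isolates data between customers?",
    "ground_truth_answer": "tenant_id",
    "relevant_contexts": ["tenant_id: CRITICAL for data isolation and security."]
  }
]
""")
```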

Entity: RAG Feedback

Purpose: Captures user feedback on the quality and relevance of RAG search results.
Source Table(s): rag_feedback
Required Fields:
- retrieval_session_id: Links the feedback to a specific search query.
- user_id & tenant_id: Identify the user who submitted the feedback and the tenant they belong to.
- feedback_type: e.g., 'relevance_check', 'missing_knowledge'.
- explanation: Free-text from the user explaining their feedback.

2.3 Constraint Intelligence Engine

The following data entities are required for the Constraint Intelligence Engine to validate S&OP plans and quantify their financial and operational impacts.

Entity: Constraints Master

Purpose: Defines the business rules, physical limits, and financial targets that govern the S&OP plan.
Source Table(s): constraints
Required Fields:
- constraint_id: Unique identifier for the rule.
- tenant_id: Tenant identifier, for multi-tenancy.
- name: A unique name mapping to an evaluator function (e.g., MAX_PRODUCTION_CAPACITY).
- type: The category of constraint ('Hard', 'Soft', 'Dynamic').
- definition: A JSONB object containing the specific parameters for the rule (e.g., {"targetField": "production_actual", "threshold": 10000}).
- is_active: A boolean to enable or disable the constraint.
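
The mapping from a constraint name to an evaluator function can be pictured as a registry lookup. The following is a sketch of that idea, not the engine's actual code: the registry, the decorator, and the "value must not exceed threshold" semantics for MAX_PRODUCTION_CAPACITY are all assumptions. Only the field names and the example definition come from this catalog.

```python
from typing import Callable

# Registry from constraint name to evaluator; each evaluator returns True when the row passes.
EVALUATORS: dict[str, Callable[[dict, dict], bool]] = {}


def evaluator(name: str):
    """Decorator that registers an evaluator function under a constraint name."""
    def register(fn):
        EVALUATORS[name] = fn
        return fn
    return register


@evaluator("MAX_PRODUCTION_CAPACITY")
def max_production_capacity(definition: dict, plan_row: dict) -> bool:
    # Assumed semantics: the target field must not exceed the threshold.
    return plan_row[definition["targetField"]] <= definition["threshold"]


def violated_constraints(constraints: list[dict], plan_row: dict) -> list[str]:
    """Return the constraint_id of every active constraint that the plan row fails."""
    violations = []
    for c in constraints:
        if not c["is_active"]:
            continue  # disabled constraints are skipped entirely
        check = EVALUATORS[c["name"]]
        if not check(c["definition"], plan_row):
            violations.append(c["constraint_id"])
    return violations
```

Keeping parameters in definition and logic in the evaluator means a threshold can change per tenant without a code change.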

Entity: S&OP Plan Data

Purpose: The time-series data representing the plan to be validated. The constraint engine dynamically accesses fields from this entity based on the constraint definitions.
Source Table(s): sop_plan_data
Required Fields (Examples):
- period: The time bucket for the data point.
- production_actual: To check against production capacity.
- inventory_actual: To check against inventory minimums.
- service_level_actual: To check against service level targets.
- revenue_actual: Used to calculate the financial impact of service level misses.
- stockout_probability: A calculated risk metric used to determine revenue at risk.
- demand_units & unit_price: Used for financial calculations.
- Fields related to cost calculation, such as required_hours, standard_capacity_hours, and overtime_rate.
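
To make the financial fields concrete, here is a sketch of two calculations they could support. Both formulas are assumptions chosen for illustration: revenue at risk as stockout probability times demand value, and overtime cost as hours above standard capacity times the overtime rate. The engine's actual formulas are not documented in this catalog.

```python
def revenue_at_risk(stockout_probability: float, demand_units: float, unit_price: float) -> float:
    """Expected revenue lost to stockouts, assuming probability is on a 0-1 scale."""
    if not 0 <= stockout_probability <= 1:
        raise ValueError("stockout_probability must be between 0 and 1")
    return stockout_probability * demand_units * unit_price


def overtime_cost(required_hours: float, standard_capacity_hours: float, overtime_rate: float) -> float:
    """Cost of the hours that exceed standard capacity; zero when there is no overrun."""
    overtime_hours = max(0.0, required_hours - standard_capacity_hours)
    return overtime_hours * overtime_rate
```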
