ChainAlign Data Catalog
Version: 1.0
Date: October 17, 2025
Status: In Progress
1. Introduction
This document serves as the master Data Catalog for the ChainAlign platform. It defines every required data field, its format, its business purpose, and which ChainAlign engine or service depends on it. This catalog is the single source of truth for all customer data onboarding and integration efforts.
2. Data Requirements by Service
2.1 Hybrid Forecasting Service (M26)
The following data entities and fields are required for the full functioning of the Hybrid Forecasting Service, including SKU segmentation, edge-case detection, and location-aware forecasting.
Entity: Sales / Demand History
- Purpose: Core input for all statistical models and trend analysis.
- Source Table(s): actuals, sales_history
- Required Fields:
  - sku_id: Product identifier.
  - location_id: Geographic or site identifier.
  - period: The date/timestamp of the sales record.
  - demand_actual (or quantity_sold): The number of units sold.
  - last_updated: Timestamp of when the record was last updated.
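As an illustration of how an onboarding script might check an incoming row against the Sales / Demand History fields, here is a minimal sketch. The function name normalize_sales_record, the choice to store the quantity under demand_actual, and the assumption that period arrives as an ISO 8601 string are all hypothetical and not taken from the ChainAlign codebase.

```python
from datetime import datetime

REQUIRED_FIELDS = ("sku_id", "location_id", "period", "last_updated")
# The catalog allows either name for the units-sold column.
QUANTITY_ALIASES = ("demand_actual", "quantity_sold")


def normalize_sales_record(row: dict) -> dict:
    """Return a cleaned copy of `row` with the quantity under demand_actual.

    Raises ValueError if a required field or the quantity is missing.
    """
    missing = [f for f in REQUIRED_FIELDS if row.get(f) in (None, "")]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    quantity = next(
        (row[a] for a in QUANTITY_ALIASES if row.get(a) is not None), None
    )
    if quantity is None:
        raise ValueError("row needs demand_actual or quantity_sold")

    record = {f: row[f] for f in REQUIRED_FIELDS}
    record["demand_actual"] = float(quantity)
    # Assumption: period is supplied as an ISO 8601 string.
    record["period"] = datetime.fromisoformat(str(row["period"]))
    return record
```

Rejecting incomplete rows at load time keeps gaps out of the history that the statistical models read.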
Entity: Location & Capacity
- Purpose: Provides location-specific context for capacity constraints and local demand patterns.
- Source Table(s): locations
- Required Fields:
  - location_id: Unique location identifier.
  - location_name: Human-readable name.
  - location_type: e.g., 'Store', 'Warehouse', 'Plant'.
  - size_category: e.g., 'Large', 'Medium', 'Small'.
  - geographic_scope: e.g., city, state, country.
  - receiving_capacity: Max units that can be received per day.
  - storage_capacity: Max units that can be stored.
  - current_utilization_pct: Current % of capacity being used.
  - annual_budget: Financial constraints for the location.
  - inventory_policies: JSONB or text describing min/max inventory rules.
Entity: Promotions
- Purpose: Used to model and understand event-driven demand spikes.
- Source Table(s): location_promotions
- Required Fields:
  - location_id: The location where the promotion is active.
  - sku_id: The product being promoted.
  - promotion_name: Name of the campaign.
  - start_date & end_date: Duration of the promotion.
  - expected_lift_factor: The anticipated % increase in sales.
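To make the role of expected_lift_factor concrete, the sketch below scales a baseline forecast for days inside the promotion window. This is only one possible interpretation: it assumes the factor is stored as a percentage (25 means +25%) and that the window is inclusive on both ends. The function name apply_promotion_lift and the sample values are hypothetical.

```python
from datetime import date


def apply_promotion_lift(baseline: float, day: date, promo: dict) -> float:
    """Scale a baseline forecast if `day` falls inside the promotion window."""
    if promo["start_date"] <= day <= promo["end_date"]:
        # Assumption: expected_lift_factor is a percentage, so 25 means +25%.
        return baseline * (1 + promo["expected_lift_factor"] / 100)
    return baseline


# Illustrative record using the Promotions fields; values are made up.
promo = {
    "location_id": "STORE-001",
    "sku_id": "SKU-42",
    "promotion_name": "Autumn Sale",
    "start_date": date(2025, 10, 20),
    "end_date": date(2025, 10, 26),
    "expected_lift_factor": 25,
}

lifted = apply_promotion_lift(100.0, date(2025, 10, 22), promo)
unchanged = apply_promotion_lift(100.0, date(2025, 11, 1), promo)
```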
Entity: Supply & Production
- Purpose: Provides data for supply-side constraint analysis.
- Source Table(s): production, inventory
- Required Fields:
  - production_quantity: Number of units produced in a period.
  - on_hand_quantity: Current physical inventory.
2.2 RAG & Cognee Services
The following data entities are required for the knowledge graph construction, semantic search, and evaluation capabilities of the RAG and Cognee services.
Entity: Knowledge Base Documents
- Purpose: The core unstructured text data that forms the knowledge base for the entire RAG system.
- Source Table(s): documents, document_chunks
- Required Fields:
  - document_id: Unique identifier for the source document.
  - tenant_id: CRITICAL for data isolation and security.
  - content or chunk_text: The raw text content of the document.
  - metadata: Flexible JSONB field to store source information (e.g., filename, URL).
  - embedding: The vector representation of the text used for semantic search.
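The sketch below shows why tenant_id and embedding are both needed at query time: chunks are filtered by tenant before they are ranked by similarity. It is an in-memory illustration only. The function names, the use of cosine similarity, and the plain-list embeddings are assumptions; the catalog does not specify how ChainAlign stores or searches vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_chunks(chunks, tenant_id, query_embedding, top_k=3):
    """Rank one tenant's chunks by similarity to the query embedding."""
    # Filter by tenant first so another tenant's text is never a candidate.
    own = [c for c in chunks if c["tenant_id"] == tenant_id]
    ranked = sorted(
        own,
        key=lambda c: cosine_similarity(c["embedding"], query_embedding),
        reverse=True,
    )
    return ranked[:top_k]
```

Applying the tenant filter before ranking, rather than after, means a missing tenant_id surfaces as an empty result instead of a cross-tenant leak.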
Entity: Structured Financials (for Cognify)
- Purpose: Structured financial data that is converted into natural language sentences to be added to the knowledge graph.
- Source: Can come from any structured source (e.g., financial tables, ERPs).
- Required Fields:
  - company: The name of the company.
  - year: The fiscal year of the report.
  - revenue: Total revenue.
  - net_income: Total net income.
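The conversion of a structured financial row into a natural language sentence could look roughly like the following. The sentence template and the function name financial_row_to_sentence are assumptions for illustration; the actual wording used by Cognify is not specified here, and no currency is stated because the catalog does not define one.

```python
def financial_row_to_sentence(row: dict) -> str:
    """Render one structured financial record as a plain-English sentence."""
    return (
        f"In fiscal year {row['year']}, {row['company']} reported "
        f"revenue of {row['revenue']:,} and net income of {row['net_income']:,}."
    )


# Illustrative record; the company and figures are made up.
sentence = financial_row_to_sentence(
    {"company": "Acme Corp", "year": 2024, "revenue": 1200000, "net_income": 150000}
)
```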
Entity: RAG Evaluation Dataset
- Purpose: A curated dataset used by the Ragas service to evaluate the performance and accuracy of the RAG pipeline.
- Source: docs/testing/rag_eval_dataset.json
- Required Fields (per item):
  - question: A sample question to test the system.
  - ground_truth_answer: The ideal, factually correct answer.
  - relevant_contexts: An array of text snippets that contain the necessary information to answer the question.
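A quick structural check of the evaluation dataset can catch missing keys before an evaluation run. The sketch below assumes the file is a top-level JSON array of items; that layout, the function name validate_eval_items, and the sample item are assumptions rather than details confirmed by this catalog.

```python
import json

REQUIRED_KEYS = {"question", "ground_truth_answer", "relevant_contexts"}


def validate_eval_items(items: list) -> list:
    """Return human-readable problems; an empty list means the items pass."""
    problems = []
    for i, item in enumerate(items):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            problems.append(f"item {i}: missing {sorted(missing)}")
        elif not isinstance(item["relevant_contexts"], list):
            problems.append(f"item {i}: relevant_contexts must be an array")
    return problems


# Illustrative item; in practice the list would be loaded from the JSON file.
sample = json.loads("""
[
  {
    "question": "Which entity feeds the forecasting models?",
    "ground_truth_answer": "Sales / Demand History.",
    "relevant_contexts": ["Core input for all statistical models."]
  }
]
""")

problems = validate_eval_items(sample)
```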
Entity: RAG Feedback
- Purpose: Captures user feedback on the quality and relevance of RAG search results.
- Source Table(s): rag_feedback
- Required Fields:
  - retrieval_session_id: Links the feedback to a specific search query.
  - user_id & tenant_id: For tracking and context.
  - feedback_type: e.g., 'relevance_check', 'missing_knowledge'.
  - explanation: Free-text from the user explaining their feedback.
2.3 Constraint Intelligence Engine
The following data entities are required for the Constraint Intelligence Engine to validate S&OP plans and quantify their financial and operational impacts.
Entity: Constraints Master
- Purpose: Defines the business rules, physical limits, and financial targets that govern the S&OP plan.
- Source Table(s): constraints
- Required Fields:
  - constraint_id: Unique identifier for the rule.
  - tenant_id: For multi-tenancy.
  - name: A unique name mapping to an evaluator function (e.g., MAX_PRODUCTION_CAPACITY).
  - type: The category of constraint ('Hard', 'Soft', 'Dynamic').
  - definition: A JSONB object containing the specific parameters for the rule (e.g., {"targetField": "production_actual", "threshold": 10000}).
  - is_active: A boolean to enable or disable the constraint.
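The way name maps to an evaluator function, and the way definition supplies its parameters, can be sketched with a small registry. Everything below is a hypothetical illustration: the registry layout, the function names, and the rule that MAX_PRODUCTION_CAPACITY passes when the target field is at or below threshold are assumptions, not the engine's actual implementation.

```python
def max_production_capacity(definition: dict, plan_row: dict) -> bool:
    """Pass when the target field does not exceed the threshold."""
    # The field to read is named in the definition, not hard-coded.
    value = plan_row[definition["targetField"]]
    return value <= definition["threshold"]


# Maps a constraint's `name` to the function that evaluates it.
EVALUATORS = {"MAX_PRODUCTION_CAPACITY": max_production_capacity}


def find_violations(constraints: list, plan_row: dict) -> list:
    """Return the constraint_ids of active constraints that the row violates."""
    violations = []
    for c in constraints:
        if not c["is_active"]:
            continue  # disabled constraints are skipped entirely
        evaluator = EVALUATORS[c["name"]]
        if not evaluator(c["definition"], plan_row):
            violations.append(c["constraint_id"])
    return violations


constraints = [
    {
        "constraint_id": "c-1",
        "tenant_id": "t1",
        "name": "MAX_PRODUCTION_CAPACITY",
        "type": "Hard",
        "definition": {"targetField": "production_actual", "threshold": 10000},
        "is_active": True,
    }
]

violations = find_violations(constraints, {"production_actual": 12000})
```

Because the evaluator reads targetField from the definition, a new rule over a different plan field needs only a new row in constraints, not new code.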
Entity: S&OP Plan Data
- Purpose: The time-series data representing the plan to be validated. The constraint engine dynamically accesses fields from this entity based on the constraint definitions.
- Source Table(s): sop_plan_data
- Required Fields (Examples):
  - period: The time bucket for the data point.
  - production_actual: To check against production capacity.
  - inventory_actual: To check against inventory minimums.
  - service_level_actual: To check against service level targets.
  - revenue_actual: Used to calculate the financial impact of service level misses.
  - stockout_probability: A calculated risk metric used to determine revenue at risk.
  - demand_units & unit_price: Used for financial calculations.
  - Fields related to cost calculation, such as required_hours, standard_capacity_hours, overtime_rate, etc.
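As a rough illustration of how stockout_probability, demand_units, and unit_price might combine into a revenue-at-risk figure, consider the sketch below. The formula (probability times demand times price) is an assumption chosen for simplicity; the catalog does not state the calculation the Constraint Intelligence Engine actually uses, and the function name and sample values are hypothetical.

```python
def revenue_at_risk(plan_row: dict) -> float:
    """Expected revenue lost to stockouts for one plan period."""
    # Assumption: risk = stockout probability x demand units x unit price.
    return (
        plan_row["stockout_probability"]
        * plan_row["demand_units"]
        * plan_row["unit_price"]
    )


# Illustrative plan row; values are made up.
at_risk = revenue_at_risk(
    {"period": "2025-10", "stockout_probability": 0.25, "demand_units": 400, "unit_price": 10.0}
)
```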