ChainAlign - Solution Architecture Document
Version: 4.1 Date: October 15, 2025 Status: Revised
1. Executive Summary
This document outlines the technical architecture for ChainAlign, a decision intelligence platform designed to transform the S&OP cycle into a real-time, continuous intelligence process. The architecture is a modern, cloud-native, multi-tenant microservices system. Its core components are a sophisticated data ingestion pipeline, a hybrid GraphRAG engine for reasoning, a centralized AI Compliance Gateway for security, and a dynamic, real-time client.
2. Architectural Principles
- Cloud-Native: Leverage managed services for scalability, reliability, and reduced operational overhead.
- Microservices: Decompose the system into loosely coupled, independently deployable services.
- Event-Driven: Use asynchronous communication for responsiveness and resilience.
- Security by Design: Implement security controls at every layer, adhering to zero-trust principles.
- Governance by Design: Embed compliance, auditing, and data quality checks directly into the architecture.
- Multi-Tenancy: Ensure strict data isolation and resource management for all tenants.
- Modularity & Extensibility: Design for easy integration of new data sources, AI models, and features.
3. High-Level Architecture Diagram
4. Layered Architecture Breakdown
4.1. Client Layer
- Description: The user-facing application, built with React.js. It is a dynamic interface whose layout and components are often defined by JSON objects sent from the backend, enabling a highly adaptive and context-aware user experience.
- Key Components: Web Client (React.js), Dynamic Page Renderer.
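To illustrate the server-defined layout concept, the sketch below shows what such a backend-supplied layout payload might look like, expressed as a Python dictionary for consistency with the other examples in this document. The component types, props, and endpoint are illustrative assumptions, not the actual client contract.

```python
# Illustrative only: a hypothetical layout payload the backend might send to the
# Dynamic Page Renderer. Component types, props, and data bindings are assumptions.
example_layout = {
    "page": "demand-review",
    "title": "Demand Review - Q4 2025",
    "components": [
        {
            "type": "kpi_card",  # rendered by a registered React component on the client
            "props": {"metric": "forecast_accuracy", "period": "last_90_days"},
        },
        {
            "type": "chart",
            "props": {"chart_type": "line", "series": ["baseline", "consensus"]},
            "data_source": "/api/v1/forecasts/summary",  # hypothetical endpoint
        },
        {
            "type": "insight_feed",
            "props": {"channel": "websocket", "topic": "demand-review-insights"},
        },
    ],
}
```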
4.2. Application Layer
- Description: The primary backend services that handle business logic, API routing, and real-time communication.
- Key Components:
- API Gateway (Node.js/Express): Securely exposes all backend functionalities to the client.
- Firebase Authentication: Manages user identities, multi-tenancy, and access tokens.
- WebSocket Service: For real-time push of insights, chart updates, and notifications.
4.3. AI Compliance Gateway (AI Firewall)
- Description: A mandatory security layer that intercepts all outgoing requests to external LLMs. This is a critical component for enterprise trust and compliance.
- Key Components:
- Redaction Engine: A Python microservice that removes PII, proprietary identifiers, and other sensitive data from prompts before they are sent to an external model.
- Audit Logger: An immutable logging service that records every prompt, its redactions, the LLM response, and cost metrics for full auditability.
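As a minimal sketch of the redaction and audit steps, the following shows how a Python microservice might strip common PII patterns from a prompt and emit an audit record before the prompt is forwarded to an external LLM. The patterns, field names, and record format are illustrative assumptions, not the production rule set.

```python
import json
import hashlib
import re
from datetime import datetime, timezone

# Illustrative redaction rules; a production rule set would be broader (proprietary
# identifiers, customer names, contract numbers, etc.) and tenant-configurable.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_prompt(prompt: str) -> tuple[str, list[str]]:
    """Replace sensitive spans with typed placeholders; return the labels that fired."""
    fired = []
    for label, pattern in REDACTION_PATTERNS.items():
        prompt, count = pattern.subn(f"[{label}_REDACTED]", prompt)
        fired.extend([label] * count)
    return prompt, fired

def audit_record(tenant_id: str, original: str, redacted: str, fired: list[str]) -> dict:
    """Build an append-only audit entry; only a hash of the original prompt is stored."""
    return {
        "tenant_id": tenant_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_sha256": hashlib.sha256(original.encode()).hexdigest(),
        "redacted_prompt": redacted,
        "redactions": fired,
    }

if __name__ == "__main__":
    original = "Contact jane.doe@acme.com about the Q4 shortfall."
    safe_prompt, fired = redact_prompt(original)
    print(json.dumps(audit_record("tenant-123", original, safe_prompt, fired), indent=2))
```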
4.4. AI Processing Layer
- Description: A collection of containerized microservices that deliver ChainAlign's intelligence.
- Key Components:
- Orchestration Services (Node.js): Higher-level services that manage business workflows (e.g., AIInsightEngine, ConsensusEngine, ConstraintIntelligenceEngine). They orchestrate calls to other services.
- Dual-Engine Search: A hybrid search system providing a unified experience. It features a Search Orchestrator that intelligently routes queries to the appropriate engine (PostgreSQL for analytics, Typesense for text) and a Typesense index for high-speed textual search; a routing sketch follows this list.
- Specialized Python Services: Computationally intensive or specialized AI tasks are handled by dedicated Python microservices, including:
- Cognee-Service: Manages the construction of the knowledge graph.
- Ragas-Eval-Service: Runs automated evaluations on the RAG pipeline.
- Montecarlo-Service: Performs probabilistic simulations.
- Langextract-Service: Advanced entity and relationship extraction.
- Core Google AI Services: Managed Google Cloud services providing foundational AI capabilities.
- LLM - Google Gemini: The core large language model for reasoning and generation.
- Speech-to-Text Service: For real-time transcription.
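The routing decision made by the Search Orchestrator can be sketched as below. The keyword heuristic, the helper functions `query_postgres` and `query_typesense`, and the placeholder SQL are assumptions for illustration; the production orchestrator may use a classifier or LLM to make this decision.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical backends: in the real system these would be the PostgreSQL analytical
# store and the Typesense full-text index; stubbed here so the routing logic is runnable.
def query_postgres(sql: str, params: dict[str, Any]) -> list[dict]:
    return []

def query_typesense(collection: str, q: str) -> list[dict]:
    return []

# Crude keyword hints that suggest an analytical (aggregation-style) question.
ANALYTICAL_HINTS = ("sum", "average", "total", "trend", "compare", "by month", "by region")

@dataclass
class SearchResult:
    engine: str
    hits: list[dict]

def route_query(user_query: str, tenant_id: str) -> SearchResult:
    """Send aggregation-style questions to PostgreSQL, everything else to Typesense."""
    q = user_query.lower()
    if any(hint in q for hint in ANALYTICAL_HINTS):
        hits = query_postgres(
            "SELECT region, SUM(quantity) FROM orders WHERE tenant_id = %(tenant)s GROUP BY region",
            {"tenant": tenant_id},
        )
        return SearchResult(engine="postgresql", hits=hits)
    return SearchResult(engine="typesense", hits=query_typesense(f"{tenant_id}_documents", user_query))
```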
4.5. Data Ingestion Pipeline
- Description: A multi-stage pipeline for processing structured and unstructured data, leveraging data virtualization where possible to reduce latency.
- Key Components:
- Data Sources: Connectors for ERPs, CRMs, and file stores like Google Cloud Storage.
- Processing: Google Document AI is used for OCR and structured data extraction from PDFs and images.
- Chunking & Embedding Services: Python microservices that break documents into semantically relevant chunks and generate vector embeddings for retrieval.
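A minimal sketch of the chunk-and-embed step is shown below. It assumes a fixed-size, overlapping character chunker and uses the sentence-transformers library purely as a stand-in for whichever embedding model the platform actually uses; chunk sizes and model choice are assumptions.

```python
from sentence_transformers import SentenceTransformer  # stand-in embedding model

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split a document into overlapping character windows.
    A production chunker would split on semantic boundaries (sections, sentences)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Generate one dense vector per chunk for retrieval."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, not the production choice
    return model.encode(chunks).tolist()

if __name__ == "__main__":
    doc = "Supplier Alpha confirmed a 3-week lead time increase for component X. " * 20
    vectors = embed_chunks(chunk_text(doc))
    print(len(vectors), "chunks embedded")
```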
4.6. Data Layer
- Description: The persistent storage for all multi-tenant S&OP data, AI-generated insights, and application logs.
- Key Components:
- PostgreSQL on Supabase: The primary source of truth for all structured S&OP data. It handles complex analytical queries and ensures transactional integrity. Textual data from this database is synchronized to the Typesense index to power high-speed search.
- Zep/Graffiti (Knowledge Graph): Stores the relationships between entities (products, suppliers, etc.), forming the structural backbone of the Hybrid GraphRAG engine and the Compliance Knowledge Graph.
- Cloud Firestore: Used for storing semi-structured data like conversation transcripts and certain application logs.
- Google Document AI Integration: While not a storage component, it's crucial for the data layer as it processes unstructured documents into JSON for RAG, feeding into PostgreSQL and Zep/Graffiti.
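To illustrate the PostgreSQL-to-Typesense synchronization mentioned above, the sketch below upserts textual rows into a per-tenant Typesense collection. The connection settings, collection naming, and document shape are placeholders; the real pipeline would more likely be driven by change-data-capture or a scheduled job.

```python
import typesense

# Placeholder connection details; real values come from service configuration.
client = typesense.Client({
    "nodes": [{"host": "typesense.internal", "port": "8108", "protocol": "http"}],
    "api_key": "SERVICE_API_KEY",
    "connection_timeout_seconds": 5,
})

def sync_rows_to_typesense(tenant_id: str, rows: list[dict]) -> None:
    """Upsert textual rows (already filtered by tenant) into that tenant's collection.
    Assumed document shape: id, title, body, updated_at."""
    collection = f"{tenant_id}_documents"
    for row in rows:
        client.collections[collection].documents.upsert({
            "id": str(row["id"]),
            "title": row["title"],
            "body": row["body"],
            "updated_at": int(row["updated_at"].timestamp()),
        })
```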
4.7. Monitoring & Evaluation Layer
- Description: Services dedicated to monitoring platform health and evaluating the quality of AI outputs.
- Key Components:
- Google Cloud Operations: Centralized logging, monitoring, and alerting for all services.
- AI Evals (Ragas): The ragas-eval-service periodically runs evaluations against a "golden dataset" to measure RAG performance metrics such as Faithfulness and Context Precision, ensuring the reliability of the AI.
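As an illustration of such an evaluation run, the sketch below uses the open-source ragas library to score a tiny golden dataset on faithfulness and context precision. Exact import paths and dataset column names vary between ragas versions, so treat this as an approximate, assumed usage rather than the service's actual code.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, faithfulness

# A tiny stand-in for the "golden dataset": questions with ground-truth answers,
# the contexts the RAG pipeline retrieved, and the answers it generated.
golden = Dataset.from_dict({
    "question": ["What is the current lead time for component X?"],
    "contexts": [["Supplier Alpha confirmed a 3-week lead time increase for component X."]],
    "answer": ["Lead time for component X has increased by three weeks."],
    "ground_truth": ["Component X lead time increased by 3 weeks per Supplier Alpha."],
})

# Scores per metric (0-1); in the real service these would be logged and alerted on
# when they fall below an agreed threshold.
result = evaluate(golden, metrics=[faithfulness, context_precision])
print(result)
```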
4.8. Service-by-Service Breakdown
This section provides a detailed overview of the key microservices within the ChainAlign platform, outlining their responsibilities, core technologies, and interactions.
4.8.1. Forecasting Service
- Responsibility: Generates high-quality, unconstrained forecasts by integrating historical data and external context (weather, news, policy).
- Key Technologies: Python, FastAPI, Machine Learning Libraries (e.g., Prophet, XGBoost), Google Pub/Sub for data ingestion.
- Inputs & Outputs: Consumes historical sales data, external data feeds; produces forecast data.
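A minimal sketch of the baseline statistical step, using Prophet as named above; the data frame shape, horizon, and placeholder demand series are assumptions, and the real service additionally blends in external context (weather, news, policy).

```python
import pandas as pd
from prophet import Prophet

def baseline_forecast(history: pd.DataFrame, horizon_days: int = 90) -> pd.DataFrame:
    """Fit a Prophet model on historical sales and project horizon_days ahead.
    `history` must have Prophet's expected columns: ds (date) and y (quantity)."""
    model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
    model.fit(history)
    future = model.make_future_dataframe(periods=horizon_days)
    forecast = model.predict(future)
    # yhat is the point forecast; yhat_lower / yhat_upper give the uncertainty interval.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]

if __name__ == "__main__":
    history = pd.DataFrame({
        "ds": pd.date_range("2024-01-01", periods=365, freq="D"),
        "y": range(365),  # placeholder demand series
    })
    print(baseline_forecast(history).tail())
```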
4.8.2. Constraint Intelligence Engine
- Responsibility: Evaluates the operational and financial feasibility of a plan against defined business constraints (e.g., capacity, budget).
- Key Technologies: Python, FastAPI, Optimization Libraries, Google Pub/Sub.
- Inputs & Outputs: Consumes forecast data, business rules, and operational constraints; produces feasibility assessments and trade-off analyses.
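To make the feasibility check concrete, the sketch below frames a toy version of the problem as a linear program with scipy: given forecast demand for two products, how much of it can be fulfilled within capacity and budget limits. The numbers and constraint set are illustrative assumptions.

```python
from scipy.optimize import linprog

# Toy example: two products, maximize fulfilled demand (linprog minimizes, so negate).
forecast_demand = [1200, 800]   # units the Forecasting Service projected
unit_hours = [0.5, 1.2]         # production hours per unit
unit_cost = [20.0, 35.0]        # cost per unit
capacity_hours = 1000           # plant capacity constraint
budget = 40000                  # budget constraint

result = linprog(
    c=[-1, -1],                                   # maximize x1 + x2
    A_ub=[unit_hours, unit_cost],                 # capacity and budget rows
    b_ub=[capacity_hours, budget],
    bounds=[(0, d) for d in forecast_demand],     # cannot ship more than demand
    method="highs",
)

fulfilled = result.x
shortfall = [d - x for d, x in zip(forecast_demand, fulfilled)]
print(f"Fulfillable units: {fulfilled}, shortfall vs. forecast: {shortfall}")
```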
4.8.3. Strategic Objectives Engine
- Responsibility: Assesses how well a proposed plan aligns with high-level strategic company goals and KPIs.
- Key Technologies: Python, FastAPI, Rules Engines, Google Pub/Sub.
- Inputs & Outputs: Consumes plan data, strategic objectives, and KPIs; produces strategic alignment scores.
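A simplified view of how a strategic alignment score might be computed is sketched below: each KPI gets a weight and a target, and the plan's projected value is scored against it. The weighting scheme and KPI names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Objective:
    kpi: str
    target: float
    weight: float              # relative importance; weights sum to 1.0
    higher_is_better: bool = True

def alignment_score(plan_kpis: dict[str, float], objectives: list[Objective]) -> float:
    """Weighted average of per-KPI attainment, clipped to [0, 1]."""
    score = 0.0
    for obj in objectives:
        value = plan_kpis.get(obj.kpi, 0.0)
        attainment = value / obj.target if obj.higher_is_better else obj.target / max(value, 1e-9)
        score += obj.weight * min(max(attainment, 0.0), 1.0)
    return score

objectives = [
    Objective(kpi="revenue_growth_pct", target=8.0, weight=0.5),
    Objective(kpi="inventory_turns", target=6.0, weight=0.3),
    Objective(kpi="service_level_pct", target=95.0, weight=0.2),
]
plan = {"revenue_growth_pct": 6.5, "inventory_turns": 7.1, "service_level_pct": 93.0}
print(round(alignment_score(plan, objectives), 3))
```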
4.8.4. Decision Support Engine
- Responsibility: Synthesizes outputs from the constraint and strategic engines to provide a unified decision-support view for the user.
- Key Technologies: Node.js, Express.js, Real-time data processing.
- Inputs & Outputs: Consumes outputs from CIE and SOE; produces aggregated decision views for the UI.
4.8.5. Scenario Orchestration Service
- Responsibility: Allows users to create, manage, and compare multiple what-if scenarios based on different assumptions and forecast adjustments.
- Key Technologies: Node.js, Express.js, PostgreSQL.
- Inputs & Outputs: Consumes user-defined scenario parameters; orchestrates calls to Forecasting, CIE, and SOE; stores scenario results.
Scenario Orchestration Data Flow
The Scenario Orchestration Service initiates a new scenario by taking user-defined parameters. It then interacts with the Forecasting Service to generate a baseline forecast for the scenario. This scenario-specific forecast is subsequently passed to the Constraint Intelligence Engine and the Strategic Objectives Engine for evaluation against feasibility and strategic alignment criteria. The results from these engines are then aggregated and stored by the Scenario Orchestration Service for comparison and presentation in the UI.
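The sketch below mirrors that flow as sequential HTTP calls. The service URLs, payload shapes, and httpx client usage are assumptions; the real service is implemented in Node.js and may call these engines asynchronously or via Pub/Sub.

```python
import httpx

# Hypothetical internal service URLs; real routing would go through service discovery.
FORECASTING_URL = "http://forecasting-service/forecast"
CIE_URL = "http://constraint-intelligence-engine/evaluate"
SOE_URL = "http://strategic-objectives-engine/evaluate"

def run_scenario(tenant_id: str, scenario_params: dict) -> dict:
    """Generate a scenario-specific forecast, evaluate it for feasibility and
    strategic alignment, then return the aggregated result for storage."""
    with httpx.Client(timeout=30.0) as client:
        forecast = client.post(
            FORECASTING_URL, json={"tenant_id": tenant_id, **scenario_params}
        ).json()
        feasibility = client.post(
            CIE_URL, json={"tenant_id": tenant_id, "forecast": forecast}
        ).json()
        alignment = client.post(
            SOE_URL, json={"tenant_id": tenant_id, "forecast": forecast}
        ).json()
    return {
        "scenario": scenario_params,
        "forecast": forecast,
        "feasibility": feasibility,
        "strategic_alignment": alignment,  # persisted for comparison in the UI
    }
```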
4.8.6. Data Ingestion & Validation Service
- Responsibility: Provides a robust pipeline for ingesting, validating, and transforming data from various internal and external sources.
- Key Technologies: Python, FastAPI, Google Document AI, Data Validation Libraries, Google Pub/Sub.
- Inputs & Outputs: Consumes raw data from various sources (ERP, CRM, GCS); produces validated and transformed data for PostgreSQL and Zep/Graffiti.
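As an example of the validation stage, the sketch below uses pydantic to enforce a schema on incoming sales records before they are written to PostgreSQL; the field names and rules are assumptions about what a validated record might look like.

```python
from datetime import date
from pydantic import BaseModel, Field, ValidationError

class SalesRecord(BaseModel):
    """Schema an ingested sales row must satisfy before it is loaded."""
    tenant_id: str
    sku: str = Field(min_length=1)
    order_date: date
    quantity: int = Field(ge=0)
    unit_price: float = Field(gt=0)

def validate_rows(rows: list[dict]) -> tuple[list[SalesRecord], list[dict]]:
    """Split raw rows into validated records and rejects (kept for review)."""
    valid, rejected = [], []
    for row in rows:
        try:
            valid.append(SalesRecord(**row))
        except ValidationError as exc:
            rejected.append({"row": row, "errors": exc.errors()})
    return valid, rejected

if __name__ == "__main__":
    ok, bad = validate_rows([
        {"tenant_id": "t1", "sku": "SKU-1", "order_date": "2025-09-01", "quantity": 10, "unit_price": 4.5},
        {"tenant_id": "t1", "sku": "", "order_date": "not-a-date", "quantity": -2, "unit_price": 0},
    ])
    print(len(ok), "valid,", len(bad), "rejected")
```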
4.8.7. Observability Service
- Responsibility: Monitors the health, performance, and data flow of all microservices within the ChainAlign platform.
- Key Technologies: Google Cloud Operations (Logging, Monitoring, Trace), Prometheus, Grafana.
- Inputs & Outputs: Collects logs, metrics, and traces from all services; provides dashboards and alerts.
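As a small illustration of the service-side instrumentation feeding this layer, the sketch below uses the prometheus_client library to expose a request counter and a latency histogram; the metric names, labels, and port are assumptions.

```python
import time
from prometheus_client import Counter, Histogram, start_http_server

# Assumed metric names; real services would standardize these across the platform.
REQUESTS = Counter("chainalign_requests_total", "Requests handled", ["service", "endpoint"])
LATENCY = Histogram("chainalign_request_seconds", "Request latency", ["service", "endpoint"])

def handle_request(service: str, endpoint: str) -> None:
    """Record one request and how long it took; the actual work is stubbed out."""
    REQUESTS.labels(service=service, endpoint=endpoint).inc()
    with LATENCY.labels(service=service, endpoint=endpoint).time():
        time.sleep(0.01)  # placeholder for real request handling

if __name__ == "__main__":
    start_http_server(9100)  # exposes /metrics for Prometheus to scrape
    handle_request("forecasting-service", "/forecast")
```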
5. Deployment Strategy
- Frontend: The React.js application is deployed as a static site to Cloudflare Pages for global CDN distribution and performance.
- Backend Services: All backend microservices (Node.js and Python) are containerized using Podman and deployed to Google Cloud Run, enabling serverless auto-scaling.
- Databases: PostgreSQL is hosted on Supabase, while Firestore is part of the Google Cloud ecosystem.
- Authentication: Firebase Authentication provides a managed, scalable identity solution.
6. Implementation Roadmap (High-Level)
Phase 1: Foundation & Core AI (MVP)
- Set up core data layer (Supabase, Zep/Graffiti).
- Implement basic STT and LLM integration for conversational queries.
- Develop initial Intelligent Charting Engine for key S&OP metrics.
- Build core UI components and dashboards.
Phase 2: Advanced Intelligence & Features
- Enhance AI Insight Engine with predictive and prescriptive capabilities.
- Implement Constraint Intelligence Engine for Monte Carlo simulations.
- Develop full Consensus Lock-In Protocol and versioning.
- Build Admin Backend for customer management and platform monitoring.
Phase 3: Optimization & Enterprise Readiness
- Performance optimization and scalability enhancements.
- Advanced security and compliance features (SOX, SOC2, GDPR).
- Integration with enterprise ERP/CRM systems.
- Continuous learning and model refinement.