Functional Specification Document: GraphQL Integration for ChainAlign

1. Introduction

1.1 Purpose

This document details the functional requirements and technical design considerations for integrating a GraphQL API layer into the existing ChainAlign backend. The primary goal is to enhance the flexibility and efficiency of data retrieval for frontend applications (React UI, CISO Dashboard) by providing a unified and strongly-typed API.

1.2 Goals

To establish a robust GraphQL server using Apollo Server.
To define a comprehensive GraphQL schema that accurately represents ChainAlign's core entities and their relationships.
To implement resolvers that efficiently connect the GraphQL layer to existing data repositories (Knex/PostgreSQL) and business logic.
To ensure secure multi-tenancy by integrating JWT-based context into GraphQL operations.
To enable efficient querying of interconnected intelligence layers, such as Adaptive Forecasting and the Judgment Graph.
To provide a mechanism for performing write operations (mutations) through GraphQL.
To facilitate incremental adoption of GraphQL with minimal immediate refactoring of existing backend services.

1.3 Scope

This FSD covers the integration of GraphQL as an API layer on top of the existing Node.js/Express backend. It focuses on defining the schema, resolvers, and context integration. It does not cover:

A complete rewrite of existing REST endpoints (GraphQL will coexist initially).
Frontend consumption of the GraphQL API (this will be handled by frontend teams).
Detailed performance tuning beyond initial implementation.

2. Functional Requirements

2.1 Phase 1: Establish the GraphQL Server and Schema (MVP)

2.1.1 Integrate Apollo Server

FR-GraphQL-1.1.1: The system SHALL install @apollo/server, graphql, and express npm packages.
FR-GraphQL-1.1.2: The server.js file SHALL be modified to initialize ApolloServer with the defined schema and resolvers.
FR-GraphQL-1.1.3: The ApolloServer SHALL be applied as middleware to the existing Express application, making the GraphQL endpoint accessible (e.g., /graphql).

2.1.2 Define the Root Schema

FR-GraphQL-1.2.1: The system SHALL define a GraphQL schema using Schema Definition Language (SDL) in a dedicated file (e.g., schema.graphql or a modular structure).
FR-GraphQL-1.2.2: The schema SHALL include a Query type to define entry points for read operations.
FR-GraphQL-1.2.3: The schema SHALL define the MLConfig type with fields: id (ID!), primaryModel (String!), validationMetric (String!), validationValue (Float!), hindcastDate (String), and version (Int!).
FR-GraphQL-1.2.4: The Query type SHALL include an activeMLConfig field that accepts entityType (String!) and locationId (ID) arguments and returns an MLConfig object.
FR-GraphQL-1.2.5: The Query type SHALL include a placeholder decisionProblem field that accepts an id (ID!) and returns a DecisionProblem (to be defined in later phases).
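
The Phase 1 schema requirements above can be sketched as SDL, held here in a JavaScript string so it could be passed to ApolloServer as typeDefs. The single-field body of the DecisionProblem placeholder is an assumption, included only so the schema stays valid until later phases define the type.

```javascript
// Sketch of the Phase 1 root schema (FR-GraphQL-1.2.2 to FR-GraphQL-1.2.5).
const typeDefs = `
  type MLConfig {
    id: ID!
    primaryModel: String!
    validationMetric: String!
    validationValue: Float!
    hindcastDate: String
    version: Int!
  }

  # Placeholder (assumption): the real fields are defined in Phase 3.
  type DecisionProblem {
    id: ID!
  }

  type Query {
    activeMLConfig(entityType: String!, locationId: ID): MLConfig
    decisionProblem(id: ID!): DecisionProblem
  }
`;
```

Both query fields are left nullable here, so a lookup with no match can return null rather than an error; whether that is the desired behavior is a design choice this document does not settle.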

2.2 Phase 2: Implement Resolvers and Connect Data Layers

2.2.1 Implement Resolvers

FR-GraphQL-2.1.1: The system SHALL create a resolvers.js file to house the resolver functions.
FR-GraphQL-2.1.2: The activeMLConfig resolver SHALL import and utilize the existing MLConfigRepository.js to fetch active ML configurations.
FR-GraphQL-2.1.3: The activeMLConfig resolver SHALL accept parent, args, and context parameters as per GraphQL resolver signature.
FR-GraphQL-2.1.4: The activeMLConfig resolver SHALL pass tenantId and other relevant arguments to the MLConfigRepository.findActive method.
FR-GraphQL-2.1.5: DataLoader SHALL be implemented for resolvers that fetch related objects to prevent N+1 query problems.
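
A minimal sketch of the activeMLConfig resolver described above. The parameter order of findActive(tenantId, entityType, locationId) is an assumption, since this document names the method but not its signature. The repository is read from the context (see FR-GraphQL-2.2.4) so that it can be stubbed, and stubContext is purely illustrative.

```javascript
const resolvers = {
  Query: {
    // Standard resolver signature: (parent, args, context).
    activeMLConfig: async (parent, args, context) => {
      const { tenant_id: tenantId } = context.user;
      const { entityType, locationId = null } = args;
      // Assumed signature: findActive(tenantId, entityType, locationId).
      return context.dataSources.mlConfigRepository.findActive(
        tenantId,
        entityType,
        locationId
      );
    },
  },
};

// Illustrative context with a stub standing in for MLConfigRepository.js.
const stubContext = {
  user: { tenant_id: 'tenant-a', user_id: 'user-1' },
  dataSources: {
    mlConfigRepository: {
      findActive: async () => ({
        id: 'cfg-1',
        primaryModel: 'example-model',
        validationMetric: 'example-metric',
        validationValue: 0.5,
        hindcastDate: null,
        version: 1,
      }),
    },
  },
};
```

Calling resolvers.Query.activeMLConfig(null, { entityType: 'sku' }, stubContext) resolves to the stubbed configuration, which is the same shape a unit test would assert against.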

2.2.2 Integrate Context for Multi-Tenancy

FR-GraphQL-2.2.1: The existing JWT authentication middleware SHALL be extended or adapted to extract tenant_id and user_id from authenticated requests.
FR-GraphQL-2.2.2: The extracted tenant_id and user_id SHALL be attached to the GraphQL execution context object.
FR-GraphQL-2.2.3: All resolvers SHALL have access to the context.user object containing tenant_id and user_id to enforce data isolation and authorization.
FR-GraphQL-2.2.4: The context object SHALL also expose data sources (e.g., mlConfigRepository, hindcastJobRepository) to resolvers for consistent data access.
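
The context requirements above could be met by a small builder like the one below. It is a sketch under two assumptions: the JWT middleware has already placed the decoded claims on req.user, and requests without both tenant_id and user_id are rejected before any resolver runs. The name buildContext is hypothetical.

```javascript
// Builds the per-request GraphQL context from the authenticated request.
function buildContext(req, dataSources) {
  const claims = req.user; // assumed to be set by the JWT middleware
  if (!claims || !claims.tenant_id || !claims.user_id) {
    // Fail closed: no tenant means no data access.
    throw new Error('UNAUTHENTICATED: missing tenant_id or user_id');
  }
  return {
    // Copy only the claims resolvers rely on for isolation.
    user: { tenant_id: claims.tenant_id, user_id: claims.user_id },
    // e.g. { mlConfigRepository, hindcastJobRepository }
    dataSources,
  };
}
```

Rejecting at context-build time keeps the tenant check in one place; resolvers can then read context.user.tenant_id without repeating null checks.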

2.3 Phase 3: Connect the Intelligence Layers

2.3.1 Stitch Forecast Data

FR-GraphQL-3.1.1: The schema SHALL define an AdaptiveForecast type with fields such as id (ID!) and value (Float!).
FR-GraphQL-3.1.2: The schema SHALL define a Constraint type with relevant fields.
FR-GraphQL-3.1.3: The AdaptiveForecast type SHALL include a violatingConstraints field that returns a list of Constraint objects, representing a linked relationship.
FR-GraphQL-3.1.4: A resolver SHALL be implemented for the violatingConstraints field within the AdaptiveForecast type to fetch associated constraint data.

```graphql
type AdaptiveForecast {
  id: ID!
  value: Float!
  violatingConstraints: [Constraint]
}

type Tenant {
  id: ID!
  name: String!
  # Add other relevant tenant fields
}

type User {
  id: ID!
  email: String!
  role: String!
  tenant: Tenant!
  # Add other relevant user fields
}

type SOPCycle {
  id: ID!
  name: String!
  startDate: String!
  endDate: String!
  status: String!
  # Add other relevant S&OP Cycle fields
}

type Scenario {
  id: ID!
  name: String!
  description: String
  status: String!
  # Add other relevant Scenario fields
}
```
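
For the violatingConstraints resolver in FR-GraphQL-3.1.4, the sketch below shows how per-request batching avoids one query per forecast (FR-GraphQL-2.1.5). To stay dependency-free it uses a small hand-rolled loader as a stand-in for the dataloader package, which offers the same load(key) pattern. The repository method findByForecastIds and the forecastId column are assumptions.

```javascript
// Stand-in for DataLoader-style batching: load() calls made in the same
// tick are collected and served by a single batchFn call.
function createBatchLoader(batchFn) {
  let queue = [];
  return {
    load(key) {
      return new Promise((resolve, reject) => {
        queue.push({ key, resolve, reject });
        if (queue.length === 1) {
          // Dispatch after the current synchronous run of load() calls.
          queueMicrotask(async () => {
            const batch = queue;
            queue = [];
            try {
              const results = await batchFn(batch.map((item) => item.key));
              batch.forEach((item, i) => item.resolve(results[i]));
            } catch (err) {
              batch.forEach((item) => item.reject(err));
            }
          });
        }
      });
    },
  };
}

// Loaders are created once per request so batches never mix tenants.
function createLoaders(tenantId, constraintRepository) {
  return {
    constraintsByForecastId: createBatchLoader(async (forecastIds) => {
      // Hypothetical method: one query returning rows tagged with forecastId.
      const rows = await constraintRepository.findByForecastIds(tenantId, forecastIds);
      // Results must line up with the order of the requested keys.
      return forecastIds.map((id) => rows.filter((row) => row.forecastId === id));
    }),
  };
}

const resolvers = {
  AdaptiveForecast: {
    violatingConstraints: (parent, args, context) =>
      context.loaders.constraintsByForecastId.load(parent.id),
  },
};
```

With this shape, resolving violatingConstraints for a list of forecasts in one GraphQL request results in a single repository call rather than one call per forecast.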

2.3.2 Model the Judgment Graph

FR-GraphQL-3.2.1: The schema SHALL define types for Note, DecisionProblem, and Decision to represent entities within the Judgment Graph.
FR-GraphQL-3.2.2: The schema SHALL model relationships between these types, such as a Note having a promotedToProblem field returning a DecisionProblem.
FR-GraphQL-3.2.3: Resolvers SHALL be implemented to traverse these relationships, allowing the frontend to query interconnected entities (e.g., a Decision and its linked Scenario, DecisionProblem, and source Note in a single request).

Concrete Schema Structure for Judgment Graph:

```graphql
type Note {
  id: ID!
  content: String!
  author: User!
  createdAt: String!
  promotedToProblem: DecisionProblem # Nullable if not promoted
}

type DecisionProblem {
  id: ID!
  title: String!
  sourceNote: Note!
  decisions: [Decision!]!
  linkedScenarios: [Scenario!]!
}

type Decision {
  id: ID!
  problem: DecisionProblem!
  chosenScenario: Scenario!
  rationale: String!
  recordedBy: User!
  recordedAt: String!
}
```

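**Example Resolver (`DecisionProblem.constraintConflicts`):**

The `constraintConflicts` field is not yet part of the `DecisionProblem` type listed above; it would need to be added to the schema before this resolver can be used.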

```javascript
const resolvers = {
  DecisionProblem: {
    // This resolver pulls data from the Constraint Intelligence Engine service
    constraintConflicts: async (parent, args, context) => {
      // parent is the DecisionProblem object fetched in the Query root
      const { user, dataSources } = context;
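      // Fail closed: falling back to a default tenant would break tenant isolation.
      if (!user) throw new Error('Unauthenticated: no user in GraphQL context');
      const tenantId = user.tenant_id;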

      // Call the service responsible for graph/constraint validation
      return dataSources.constraintService.getViolationsByProblem(
        tenantId,
        parent.id
      );
    },
  },
  // ... other resolvers
};
```

### 2.4 Phase 4: Implement Mutations (Write Operations)

#### 2.4.1 Implement Mutations for critical actions

*   **FR-GraphQL-4.1.1:** The schema SHALL define a `Mutation` type to encapsulate write operations.
*   **FR-GraphQL-4.1.2:** The `Mutation` type SHALL include a `recordFinalDecision` field.
*   **FR-GraphQL-4.1.3:** The `recordFinalDecision` mutation SHALL accept `problemId (ID!)`, `chosenScenarioId (ID!)`, and `rationale (String!)` as arguments.
*   **FR-GraphQL-4.1.4:** The `recordFinalDecision` mutation SHALL return a `Decision!` object representing the newly created immutable decision record.
*   **FR-GraphQL-4.1.5:** A resolver for `recordFinalDecision` SHALL be implemented, which will encapsulate the logic currently handled by the `Decision Service` (e.g., `POST /api/internal/v1/decisions`).
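
The mutation requirements above can be sketched as follows. The SDL mirrors FR-GraphQL-4.1.2 through 4.1.4; the `decisionService.recordDecision` method is a hypothetical wrapper around the existing Decision Service logic, and its argument shape is an assumption.

```javascript
const mutationTypeDefs = `
  type Mutation {
    recordFinalDecision(
      problemId: ID!
      chosenScenarioId: ID!
      rationale: String!
    ): Decision!
  }
`;

const mutationResolvers = {
  Mutation: {
    recordFinalDecision: async (parent, args, context) => {
      const { problemId, chosenScenarioId, rationale } = args;
      const { tenant_id: tenantId, user_id: userId } = context.user;
      // Hypothetical service method; the tenant and author come from the
      // JWT-derived context, never from client-supplied arguments.
      return context.dataSources.decisionService.recordDecision({
        tenantId,
        recordedBy: userId,
        problemId,
        chosenScenarioId,
        rationale,
      });
    },
  },
};
```

Taking `tenantId` and `recordedBy` from the context rather than from arguments means a client cannot record a decision on behalf of another tenant or user.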

## 3. Technical Design Considerations

### 3.1 Apollo Server Integration

*   **Middleware:** Apollo Server will be integrated as Express middleware, allowing it to coexist with existing REST endpoints.
*   **Error Handling:** Apollo Server's built-in error handling will be utilized, with custom formatters if needed to mask sensitive information or provide standardized error codes.

### 3.2 Schema Definition Language (SDL)

*   **Modular Schema:** The GraphQL schema will be organized into multiple files (e.g., `mlConfig.graphql`, `forecast.graphql`, `judgment.graphql`) and merged using tools like `graphql-tools` to maintain modularity and readability.
*   **Type Naming:** Standard GraphQL naming conventions (PascalCase for types, camelCase for fields) will be followed.

#### 3.2.1 TypeScript Integration

*   **Code Generation:** GraphQL Code Generator (the `@graphql-codegen/cli` package with its TypeScript plugins) SHALL be used to automatically generate TypeScript types from the GraphQL schema. This ensures type safety across the backend (resolvers, data sources) and frontend (client-side queries).
*   **Benefits:** Early detection of schema/resolver mismatches, improved developer experience, and reduced runtime errors.

### 3.3 Resolver Implementation

*   **Repository Pattern:** Resolvers will primarily interact with existing backend repositories (e.g., `MLConfigRepository`, `HindcastJobRepository`) to fetch and manipulate data, minimizing duplication of business logic.
*   **Asynchronous Operations:** Resolvers will be asynchronous to handle database calls and other I/O operations efficiently.
*   **N+1 Problem:** Per FR-GraphQL-2.1.5, DataLoader will batch lookups in resolvers that fetch related objects. Further tuning for complex relationships is deferred to later optimization phases.

### 3.4 Context Management (Authentication/Authorization)

*   **JWT Integration:** The Express middleware will parse JWT tokens, validate them, and extract user information (`tenant_id`, `user_id`, roles) into the `context` object.
*   **Authorization:** Resolvers will perform authorization checks based on the `context.user` information.
*   **Dependency Injection:** Data sources (repositories) will be injected into the GraphQL context to ensure they are available to all resolvers and can be easily mocked for testing.

#### 3.4.1 Context Factory

A dedicated context factory function SHALL be used to instantiate and provide data sources to the GraphQL context.

```javascript
// backend/src/graphql/contextFactory.js
import MLConfigRepository from '../dal/MLConfigRepository.js';
import HindcastJobRepository from '../dal/HindcastJobRepository.js';
// ... import other repositories

export function createContext({ req }) {
  return {
    user: req.user, // from JWT middleware
    dataSources: {
      mlConfigRepository: MLConfigRepository,
      hindcastJobRepository: HindcastJobRepository,
      // ... instantiate other repositories here
    },
  };
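}
```

Usage in Apollo Server config:

```javascript
// backend/server.js
import { ApolloServer } from '@apollo/server';
import { expressMiddleware } from '@apollo/server/express4';
import { createContext } from './src/graphql/contextFactory.js';

const apolloServer = new ApolloServer({ typeDefs, resolvers });
await apolloServer.start();

// In @apollo/server v4 the context function is passed to expressMiddleware,
// not to the ApolloServer constructor. `app` is the existing Express app.
app.use(
  '/graphql',
  express.json(),
  expressMiddleware(apolloServer, { context: async ({ req }) => createContext({ req }) }),
);
```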

3.5 Data Source Integration (Knex/PostgreSQL, Repositories)

Direct Repository Calls: Resolvers will directly call methods on instantiated repository objects, which in turn use Knex to interact with PostgreSQL for simple entity lookups.
Service Layer Integration for Relationships: For complex relationship fields (e.g., Note.promotedToProblem, DecisionProblem.constraintConflicts) that involve multi-hop reasoning or integration with other intelligence layers (like the Judgment Graph or Constraint Intelligence Engine), resolvers SHALL call dedicated service layers (e.g., GraphRAGService, ConstraintService). This approach allows for sophisticated graph traversal logic and avoids direct Knex queries for such relationships.
Dependency Injection: Data sources (repositories and service clients) will be injected into the GraphQL context to ensure they are available to all resolvers and can be easily mocked for testing.

3.6 Error Handling

Standard GraphQL Errors: Errors will be returned in the standard GraphQL errors array format.
Custom Error Types: Custom GraphQL error types SHALL be introduced for specific business logic errors to provide more granular feedback to clients. These will include:
- AuthenticationError (401): For invalid or missing JWT.
- AuthorizationError (403): For access denied due to incorrect permissions or tenant mismatch.
- ValidationError (400): For invalid input arguments.
- NotFoundError (404): For resources that do not exist.
- ServerError (500): For unexpected backend or database errors.

Standardized Error Format: Errors SHALL adhere to a standardized format:

```json
{
  "errors": [{
    "message": "string",
    "extensions": {
      "code": "ERROR_CODE",
      "httpStatus": "number",
      "stacktrace": ["string"] // Only in development/debug
    }
  }]
}
```
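
One way to realize the error types and the standardized format above is sketched below in plain JavaScript. The code strings (UNAUTHENTICATED, FORBIDDEN, BAD_USER_INPUT, NOT_FOUND, INTERNAL_SERVER_ERROR) and the helper name toErrorResponse are assumptions; in an Apollo Server setup the same extensions would more likely be attached to GraphQLError instances.

```javascript
// Base class carrying the fields that end up under "extensions".
class AppError extends Error {
  constructor(message, code, httpStatus) {
    super(message);
    this.name = new.target.name;
    this.extensions = { code, httpStatus };
  }
}

class AuthenticationError extends AppError {
  constructor(message = 'Invalid or missing JWT') {
    super(message, 'UNAUTHENTICATED', 401);
  }
}
class AuthorizationError extends AppError {
  constructor(message = 'Access denied') {
    super(message, 'FORBIDDEN', 403);
  }
}
class ValidationError extends AppError {
  constructor(message = 'Invalid input') {
    super(message, 'BAD_USER_INPUT', 400);
  }
}
class NotFoundError extends AppError {
  constructor(message = 'Resource not found') {
    super(message, 'NOT_FOUND', 404);
  }
}
class ServerError extends AppError {
  constructor(message = 'Unexpected server error') {
    super(message, 'INTERNAL_SERVER_ERROR', 500);
  }
}

// Converts any thrown error into the standardized response body.
function toErrorResponse(error, { includeStacktrace = false } = {}) {
  // Unknown errors are masked so internal details never reach clients.
  const known = error instanceof AppError ? error : new ServerError();
  const extensions = { ...known.extensions };
  if (includeStacktrace && error.stack) {
    extensions.stacktrace = error.stack.split('\n'); // development/debug only
  }
  return { errors: [{ message: known.message, extensions }] };
}
```

Masking unknown errors as ServerError keeps database messages and similar internals out of client responses, while the stack trace remains available when includeStacktrace is enabled in development.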

3.7 Testability

Unit Tests for Resolvers: Resolvers SHALL be unit tested by mocking their dependencies (e.g., MLConfigRepository). This ensures individual resolver logic is correct.
Integration Tests for Queries: End-to-end integration tests SHALL be implemented to verify full GraphQL queries, including context setup, data fetching through repositories, and correct response formatting. These tests will use a real (test) database.
Context Mocking Strategy: A clear strategy for mocking the GraphQL context object SHALL be defined for testing purposes, allowing control over user and dataSources during resolver unit tests.
Example: Tenant Isolation Test: Integration tests SHALL include scenarios to verify that context.user.tenant_id properly filters queries and prevents cross-tenant data access.
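
The context mocking strategy and the tenant isolation scenario above might look like the following at the unit level. This is a sketch only: the in-memory repository stands in for MLConfigRepository, the helper names are hypothetical, and a real integration test would run the same check through the GraphQL endpoint against the test database.

```javascript
// In-memory stand-in for MLConfigRepository; every row is tagged with a tenant.
function createFakeMLConfigRepository(rows) {
  return {
    async findActive(tenantId, entityType) {
      const match = rows.find(
        (row) => row.tenantId === tenantId && row.entityType === entityType
      );
      return match ?? null;
    },
  };
}

// Builds a mock context so a test controls both user and dataSources.
function createMockContext({ tenantId, userId = 'test-user', dataSources }) {
  return { user: { tenant_id: tenantId, user_id: userId }, dataSources };
}

// Resolver under test, in the shape required by FR-GraphQL-2.1.3 and 2.1.4.
const activeMLConfig = (parent, args, context) =>
  context.dataSources.mlConfigRepository.findActive(
    context.user.tenant_id,
    args.entityType,
    args.locationId ?? null
  );

// Runs the same query as two different tenants against the same data.
async function runTenantIsolationCheck() {
  const mlConfigRepository = createFakeMLConfigRepository([
    { id: 'cfg-1', tenantId: 'tenant-a', entityType: 'sku' },
  ]);
  const dataSources = { mlConfigRepository };
  const asOwner = createMockContext({ tenantId: 'tenant-a', dataSources });
  const asOther = createMockContext({ tenantId: 'tenant-b', dataSources });
  return {
    own: await activeMLConfig(null, { entityType: 'sku' }, asOwner),
    other: await activeMLConfig(null, { entityType: 'sku' }, asOther),
  };
}
```

The check passes when the owning tenant receives its configuration and the other tenant receives null for the identical query.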

4. Non-Functional Requirements

4.1 Performance

Query Optimization: Resolvers will be designed for efficient data retrieval, leveraging database indexing and optimized repository methods.
Caching: Caching strategies (e.g., Redis, in-memory) may be explored in future iterations for frequently accessed data.
Batching/Deduplication: DataLoader (see FR-GraphQL-2.1.5) will be used to prevent redundant data fetches (N+1 problem).

4.1.1 Caching Strategy

Cache Key Strategy: Cache keys SHALL be granular and include relevant identifiers such as tenantId, entityType, locationId, and version (for versioned data like MLConfig). Example: mlConfig:{tenantId}:{entityType}:{locationId}.
Invalidation on Mutations: Caches SHALL be invalidated automatically when mutations modify the underlying data. For example, a recordFinalDecision mutation should invalidate relevant Decision and DecisionProblem caches.
TTL Strategy: Time-to-Live (TTL) values SHALL be configured based on data volatility. Highly dynamic data (e.g., real-time forecasts) will have short TTLs, while static reference data (e.g., MLConfig) may have longer TTLs or be invalidated only on change.
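
The key, TTL, and invalidation rules above can be illustrated with a small in-memory cache. This is a sketch, not the intended production store (which may be Redis): the 'all' segment used when locationId is absent, the prefix-based invalidation, and the helper names are assumptions.

```javascript
// Builds the key from the example pattern mlConfig:{tenantId}:{entityType}:{locationId}.
function mlConfigCacheKey({ tenantId, entityType, locationId }) {
  // Assumption: 'all' marks a lookup without a specific location.
  return `mlConfig:${tenantId}:${entityType}:${locationId ?? 'all'}`;
}

// Minimal TTL cache; the clock is injectable so expiry can be tested.
function createTtlCache(now = () => Date.now()) {
  const entries = new Map();
  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now()) {
        entries.delete(key); // expired entries are dropped lazily on read
        return undefined;
      }
      return entry.value;
    },
    set(key, value, ttlMs) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    // Called after a mutation to drop every key under a prefix,
    // e.g. invalidatePrefix(`decision:${tenantId}:`).
    invalidatePrefix(prefix) {
      for (const key of entries.keys()) {
        if (key.startsWith(prefix)) entries.delete(key);
      }
    },
  };
}
```

Because the tenant is the first variable segment of every key, a prefix invalidation can never touch another tenant's entries.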

4.2 Security

Authentication: All GraphQL endpoints will be protected by the existing JWT authentication mechanism.
Authorization: Fine-grained authorization checks will be implemented within resolvers based on user roles and tenant_id.
Input Validation: All GraphQL arguments will be validated to prevent injection attacks and ensure data integrity.
Rate Limiting: API Gateway or Express middleware SHALL implement complexity-based rate limiting to prevent abuse and manage resource consumption. This will involve:
- Assigning a cost to each field and nested relationship in the schema.
- Calculating the total cost of a query before execution.
- Rejecting queries that exceed a predefined cost threshold per user/tenant within a given time window.
- Evaluating libraries such as graphql-cost-analysis, or implementing custom middleware.
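
The cost-based limiting steps above can be sketched as follows. To stay self-contained, the query is represented as a simplified selection tree of { name, selections } objects rather than a parsed GraphQL document; the cost values, the per-field-name cost table, and the function names are all assumptions.

```javascript
// Assumed costs: relationship fields are more expensive than scalars.
const fieldCosts = { decisions: 5, linkedScenarios: 5, violatingConstraints: 10 };
const DEFAULT_FIELD_COST = 1;

// Total cost = each field's own cost plus the cost of its nested selections.
function queryCost(selections) {
  return selections.reduce((total, field) => {
    const own = fieldCosts[field.name] ?? DEFAULT_FIELD_COST;
    return total + own + queryCost(field.selections ?? []);
  }, 0);
}

// Tracks spent cost per tenant within a fixed time window.
function createCostLimiter({ maxCostPerWindow, windowMs, now = () => Date.now() }) {
  const usage = new Map(); // tenantId -> { windowStart, spent }
  return function check(tenantId, selections) {
    const cost = queryCost(selections);
    const time = now();
    const previous = usage.get(tenantId);
    const state =
      previous && time - previous.windowStart < windowMs
        ? previous
        : { windowStart: time, spent: 0 };
    usage.set(tenantId, state);
    if (state.spent + cost > maxCostPerWindow) {
      return { allowed: false, cost }; // reject before execution
    }
    state.spent += cost;
    return { allowed: true, cost };
  };
}
```

The limiter would be called before execution, for example from Express middleware, with the tenant taken from the JWT-derived context; a rejected query costs nothing against the tenant's budget.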

4.3 Scalability

Stateless Server: The Apollo Server will be configured to be stateless to facilitate horizontal scaling.
Database Load: Resolver design will consider minimizing database load, especially for complex queries.

4.4 Maintainability

Modular Codebase: Schema, resolvers, and data sources will be organized into a modular structure for ease of understanding and maintenance.
Code Standards: Adherence to existing JavaScript/TypeScript coding standards and linting rules.
Documentation: Comprehensive JSDoc comments for resolvers and schema definitions.

5. Assumptions and Constraints

5.1 Assumptions

The existing Node.js/Express backend and Knex/PostgreSQL database are stable and functional.
The existing repository pattern provides a clean abstraction for data access.
JWT-based authentication is already in place and can provide tenant_id and user_id.
Frontend teams are prepared to consume the new GraphQL API.
Initial GraphQL integration will coexist with existing REST APIs.
The initial monolithic GraphQL schema will be refactored to Apollo Federation if the schema grows beyond a maintainable size (estimated Phase 5+).

5.2 Constraints

The initial implementation will prioritize read operations (queries) over write operations (mutations).
Complex optimizations (e.g., advanced caching, distributed tracing) will be deferred to later phases.
The project timeline requires an incremental approach to deliver value quickly for demos.

6. Success Criteria

A functional GraphQL endpoint is accessible and responds to basic MLConfig queries.
tenant_id and user_id are correctly propagated through the GraphQL context to resolvers.
Resolvers successfully fetch data using existing repositories.
The GraphQL schema accurately reflects core entities and their relationships.
Frontend applications can successfully query data via the new GraphQL API.
All unit and integration tests for the GraphQL layer pass.
