Functional Specification Document: LLM Bias Monitoring
Version: 1.0 Date: October 30, 2025 Status: Draft
1.0 Executive Summary
1.1 Purpose
This document specifies the functional and technical requirements for integrating a comprehensive LLM Bias Monitoring system into ChainAlign's existing AI Compliance & Trust Layer and Judgment Graph. The primary goal is to combine technical instrumentation with human oversight to detect, quantify, and manage bias in LLM responses, ensuring fair, transparent, and aligned decision-making within the enterprise. By leveraging GraphQL, this feature will seamlessly integrate into ChainAlign's unified API ecosystem.
1.2 Problem Statement
LLMs, while powerful, can perpetuate and amplify biases present in their training data or introduced through prompt engineering. For ChainAlign's enterprise decision platform, undetected bias can lead to systematically skewed trade-offs, unfair recommendations, and decisions that do not align with strategic objectives or ethical guidelines. This poses significant risks to trust, compliance, and strategic outcomes.
1.3 Solution Overview
The solution extends the existing LLMClient and the llm_interaction_audit table to include robust bias auditing capabilities. A new Bias Analyzer Service will perform automated linguistic, semantic, and counterfactual bias checks. The results will be stored in the extended audit log and integrated into the Judgment Graph. The entire workflow will be exposed and managed via GraphQL, providing a type-safe and consistent interface for all related services and UIs.
2.0 Core Design Principles
- Transparency & Explainability: Make detected bias explicit and explainable within the Judgment Graph and dashboards.
- Continuous Learning: Incorporate human feedback loops to continuously improve bias detection and mitigation strategies.
- Model-Agnostic Detection: Design the bias analysis to be applicable regardless of the underlying LLM provider.
- Granularity & Flexibility: Capture various types and scores of bias, allowing for nuanced analysis and evolution of detection methods.
- GraphQL-First Integration: Expose all bias-related data and operations via GraphQL to maintain consistency with ChainAlign's evolving API architecture.
- Non-Blocking by Default: Bias analysis should ideally be asynchronous to avoid impacting the latency of core LLM interactions, providing a graceful degradation path.
3.0 Architectural Placement
This feature extends the AI Compliance & Trust Layer and the Judgment Graph & Decision Layer within ChainAlign's architecture. The new Bias Analyzer Service will operate as a microservice, orchestrated by the LLMClient and integrating with the primary data store via GraphQL.
4.0 Functional Requirements
4.1 llm_interaction_audit Database Schema Extension
FR-BIAS-4.1.1: The llm_interaction_audit table SHALL be extended with the following columns to store bias-related metadata:
- bias_detected (BOOLEAN, default: FALSE): Indicates if any bias was identified.
- bias_type (JSONB): An array of strings categorizing the detected bias (e.g., ["representation", "language"]). This structure aligns with the bias_taxonomy for controlled vocabulary. (Note: While JSONB offers flexibility, an alternative for stricter data integrity would be an array of foreign keys to bias_taxonomy.id.)
- bias_score (JSONB): A JSON object storing quantitative scores for different bias metrics (e.g., {"representation_score": 0.7, "sentiment_divergence": 0.3}).
- bias_confidence_score (DECIMAL(5,2)): A score (0-1) indicating the confidence level of the bias detection.
- mitigation_action (JSONB): An object describing any automated or suggested mitigation actions (e.g., {"type": "REPHRASE_PROMPT", "details": "Adjusted sentiment around supplier description"}).
- bias_report_summary (JSONB): A detailed JSON object containing the full output from the Bias Analyzer Service.
- human_reviewed (BOOLEAN, default: FALSE): Flag to indicate if the entry has been reviewed by a human.
- human_bias_tag (JSONB): Human-assigned bias tags or classifications.
- review_notes (TEXT): Notes from the human reviewer.
FR-BIAS-4.1.2: Appropriate database indexes SHALL be created on these new columns to optimize querying (e.g., idx_audit_bias_detected for flagging, idx_audit_bias_type for JSONB content, idx_audit_bias_confidence for triage).
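For illustration, a minimal Knex.js migration sketch (TypeScript) covering FR-BIAS-4.1.1 and FR-BIAS-4.1.2 might look like the following. Column types follow 4.1.1; the GIN index for bias_type and the exact index names are assumptions derived from 4.1.2, not mandated here.
```typescript
import type { Knex } from 'knex';

export async function up(knex: Knex): Promise<void> {
  await knex.schema.alterTable('llm_interaction_audit', (table) => {
    table.boolean('bias_detected').notNullable().defaultTo(false);
    table.jsonb('bias_type');                     // array of bias_taxonomy keys
    table.jsonb('bias_score');                    // per-metric quantitative scores
    table.decimal('bias_confidence_score', 5, 2); // 0-1 confidence of the detection
    table.jsonb('mitigation_action');
    table.jsonb('bias_report_summary');
    table.boolean('human_reviewed').notNullable().defaultTo(false);
    table.jsonb('human_bias_tag');
    table.text('review_notes');
    table.index(['bias_detected'], 'idx_audit_bias_detected');
    table.index(['bias_confidence_score'], 'idx_audit_bias_confidence');
  });
  // Knex's schema builder has no GIN helper, so the JSONB containment index is raw SQL.
  await knex.raw(
    'CREATE INDEX idx_audit_bias_type ON llm_interaction_audit USING GIN (bias_type)'
  );
}

export async function down(knex: Knex): Promise<void> {
  await knex.raw('DROP INDEX IF EXISTS idx_audit_bias_type');
  await knex.schema.alterTable('llm_interaction_audit', (table) => {
    table.dropColumns(
      'bias_detected',
      'bias_type',
      'bias_score',
      'bias_confidence_score',
      'mitigation_action',
      'bias_report_summary',
      'human_reviewed',
      'human_bias_tag',
      'review_notes'
    );
  });
}
```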
FR-BIAS-4.1.3: A bias_taxonomy reference table SHALL be created to standardize bias categories:
CREATE TABLE bias_taxonomy (
id SERIAL PRIMARY KEY,
bias_key TEXT NOT NULL UNIQUE,
description TEXT,
severity_level INTEGER NOT NULL CHECK (severity_level BETWEEN 1 AND 5)
);
This table provides a controlled vocabulary for the bias_type and human_bias_tag fields and supports future expansion of mitigation policies; severity_level can inform prioritization for human review.
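An illustrative Knex seed for bias_taxonomy is sketched below; the specific keys, descriptions, and severity levels are placeholders, not a canonical taxonomy.
```typescript
import type { Knex } from 'knex';

// Illustrative, idempotent seed for bias_taxonomy; adjust entries to the agreed taxonomy.
export async function seed(knex: Knex): Promise<void> {
  await knex('bias_taxonomy')
    .insert([
      { bias_key: 'representation', description: 'Under- or over-representation of groups or entities', severity_level: 3 },
      { bias_key: 'language', description: 'Gendered, loaded, or toxic phrasing', severity_level: 2 },
      { bias_key: 'framing', description: 'Systematic skew in trade-offs (e.g., profit over sustainability)', severity_level: 4 },
    ])
    .onConflict('bias_key')
    .ignore(); // keep re-runs from duplicating rows
}
```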
4.2 GraphQL Schema Extensions
FR-BIAS-4.2.1: A new GraphQL type LlmInteractionAudit SHALL be added to @backend/src/graphql/schema.graphql to represent the LLM interaction audit record, including all its existing and newly added bias-related fields.
extend type Query {
llmInteractionAudit(id: ID!): LlmInteractionAudit
llmInteractionAudits(filter: LlmInteractionAuditFilterInput, pagination: PaginationInput): [LlmInteractionAudit!]
}
extend type Mutation {
updateLlmInteractionAuditBias(id: ID!, input: UpdateLlmInteractionAuditBiasInput!): LlmInteractionAudit!
recordBiasReview(id: ID!, input: RecordBiasReviewInput!): LlmInteractionAudit!
}
type LlmInteractionAudit {
id: ID!
tenantId: ID!
userId: ID
userEmail: String
userRole: String
queryContext: String
originalQuery: String!
llmProvider: String!
llmModel: String!
sanitizedPrompt: String
llmResponse: JSON
promptTokens: Int
responseTokens: Int
estimatedCostUsd: Float
redactionSummary: JSON
sensitivityScore: String
containedPii: Boolean
containedProprietary: Boolean
containedCustomerData: Boolean
status: String!
errorMessage: String
logTimestamp: DateTime!
# New Bias-related fields
biasDetected: Boolean!
biasType: [String!]
biasScore: JSON
biasConfidenceScore: Float
mitigationAction: JSON
biasReportSummary: JSON
humanReviewed: Boolean!
humanBiasTag: JSON
reviewNotes: String
}
input UpdateLlmInteractionAuditBiasInput {
biasDetected: Boolean
biasType: JSON
biasScore: JSON
biasConfidenceScore: Float
mitigationAction: JSON
biasReportSummary: JSON
}
input RecordBiasReviewInput {
humanReviewed: Boolean!
humanBiasTag: JSON
reviewNotes: String
}
input LlmInteractionAuditFilterInput {
tenantId: ID
userId: ID
llmProvider: String
llmModel: String
sensitivityScore: String
biasDetected: Boolean
humanReviewed: Boolean
logTimestampStart: DateTime
logTimestampEnd: DateTime
# Add other filterable fields as needed
}
# Placeholder for common PaginationInput and JSON types if not already defined
scalar JSON
scalar DateTime
input PaginationInput {
limit: Int
offset: Int
}
FR-BIAS-4.2.2: Resolvers SHALL be implemented for these new Query and Mutation fields, handling data retrieval from and updates to the llm_interaction_audit table via Knex.js.
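A minimal resolver sketch (TypeScript) is shown below. The context shape (db, tenantId) and the partial-update handling are assumptions for illustration; table and column names follow section 4.1, and tenant scoping follows section 5.1.
```typescript
import type { Knex } from 'knex';

// Assumed resolver context: a Knex instance plus the tenant scope resolved from the JWT.
interface GraphQLContext {
  db: Knex;
  tenantId: string;
}

export const llmInteractionAuditResolvers = {
  Query: {
    llmInteractionAudit: (_: unknown, args: { id: string }, ctx: GraphQLContext) =>
      ctx.db('llm_interaction_audit')
        .where({ id: args.id, tenant_id: ctx.tenantId }) // always enforce tenant scope
        .first(),
  },
  Mutation: {
    updateLlmInteractionAuditBias: async (
      _: unknown,
      args: { id: string; input: Record<string, unknown> },
      ctx: GraphQLContext
    ) => {
      const patch: Record<string, unknown> = {
        bias_detected: args.input.biasDetected,
        bias_type: args.input.biasType && JSON.stringify(args.input.biasType),
        bias_score: args.input.biasScore && JSON.stringify(args.input.biasScore),
        bias_confidence_score: args.input.biasConfidenceScore,
        mitigation_action: args.input.mitigationAction && JSON.stringify(args.input.mitigationAction),
        bias_report_summary: args.input.biasReportSummary && JSON.stringify(args.input.biasReportSummary),
      };
      // Drop fields that were not supplied so a partial input does not overwrite existing values.
      Object.keys(patch).forEach((k) => { if (patch[k] === undefined) delete patch[k]; });

      const [updated] = await ctx.db('llm_interaction_audit')
        .where({ id: args.id, tenant_id: ctx.tenantId })
        .update(patch, '*'); // return the updated row so the mutation resolves the full object
      return updated;
    },
  },
};
```
The recordBiasReview resolver would follow the same pattern, writing human_reviewed, human_bias_tag, and review_notes.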
4.3 Bias Analyzer Service (New Microservice)
FR-BIAS-4.3.1: A new microservice named BiasAnalyzerService SHALL be developed, preferably in Python/FastAPI, and deployed to Google Cloud Run.
FR-BIAS-4.3.2: The BiasAnalyzerService SHALL expose a POST /analyze-bias endpoint with the following contract:
- **Request Body:**
```json
{
  "llm_response": "string",
  "original_query": "string",
  "query_context": "json",   // JSON object
  "user_id": "ID",
  "tenant_id": "ID",
  "llm_model": "string"
}
```
- **Response Body (BiasReport JSON):**
```json
{
  "bias_detected": "boolean",
  "bias_type": "array",          // e.g., ["representation", "language"]
  "bias_score": "json",          // e.g., { "representation_score": 0.7, "sentiment_divergence": 0.3 }
  "confidence": "float",         // e.g., 0.86
  "mitigation_action": "json",   // e.g., { "type": "REPHRASE_PROMPT", "details": "Adjusted sentiment around supplier description" }
  "bias_report_summary": "json"  // detailed analysis
}
```
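For the Node-side consumer (LLMClient), the same contract can be captured as TypeScript types. The field names mirror the bodies above; the interface names themselves are illustrative only.
```typescript
// Request/response shapes for POST /analyze-bias, as consumed by LLMClient.
export interface AnalyzeBiasRequest {
  llm_response: string;
  original_query: string;
  query_context: Record<string, unknown>;
  user_id: string;
  tenant_id: string;
  llm_model: string;
}

export interface BiasReport {
  bias_detected: boolean;
  bias_type: string[];                        // e.g., ["representation", "language"]
  bias_score: Record<string, number>;         // e.g., { "representation_score": 0.7 }
  confidence: number;                         // e.g., 0.86
  mitigation_action: Record<string, unknown> | null;
  bias_report_summary: Record<string, unknown>;
}
```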
FR-BIAS-4.3.3: The BiasAnalyzerService SHALL implement the following core functionalities:
- Linguistic Bias Check: Analyze llm_response for gendered language, sentiment inconsistencies, and toxicity using NLP libraries (e.g., Hugging Face Transformers).
- Counterfactual Replay Engine (Future): Generate counterfactual prompts by swapping sensitive entities and rerunning them through the LLM to compare responses for bias divergence.
- Semantic Fairness Scorer (Future): Quantify fairness metrics by analyzing the semantic space of LLM responses against sensitive attributes.
- Rule-Based Bias Detection: Implement domain-specific rules to flag biases relevant to ChainAlign's industry (e.g., profit over sustainability).
FR-BIAS-4.3.4: The BiasAnalyzerService SHALL be designed for scalability and fault tolerance, with appropriate logging and monitoring.
4.4 LLMClient Modifications
FR-BIAS-4.4.1: The LLMClient SHALL be modified to integrate with the BiasAnalyzerService.
FR-BIAS-4.4.2: After receiving a response from the external LLM and before finalizing the llm_interaction_audit record, LLMClient SHALL make an asynchronous HTTP POST request to the BiasAnalyzerService's /analyze-bias endpoint, passing the llm_response, original_query, and relevant context.
FR-BIAS-4.4.3: Upon receiving the BiasReport from the BiasAnalyzerService, LLMClient SHALL construct and execute a GraphQL updateLlmInteractionAuditBias mutation to update the corresponding llm_interaction_audit record with the bias metadata.
FR-BIAS-4.4.4: LLMClient SHALL implement robust error handling for the BiasAnalyzerService call and the GraphQL mutation, ensuring that core LLM functionality is not blocked and logging any failures. If the BiasAnalyzerService returns an error (e.g., 500), LLMClient SHALL log the error and potentially mark the bias analysis as "failed" in the audit record. A fallback mechanism will ensure the audit log is created even if bias analysis fails.
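A sketch of this non-blocking post-response hook is shown below (TypeScript), covering FR-BIAS-4.4.2 through 4.4.4. The helper name runBiasAnalysis, the BIAS_ANALYZER_URL environment variable, the X-Trace-Id header, and the use of graphql-request are assumptions for illustration.
```typescript
import { GraphQLClient, gql } from 'graphql-request';

const UPDATE_BIAS = gql`
  mutation UpdateBias($id: ID!, $input: UpdateLlmInteractionAuditBiasInput!) {
    updateLlmInteractionAuditBias(id: $id, input: $input) {
      id
      biasDetected
      biasConfidenceScore
    }
  }
`;

// Shape of the BiasAnalyzerService response, per the contract in section 4.3.2.
interface BiasReport {
  bias_detected: boolean;
  bias_type: string[];
  bias_score: Record<string, number>;
  confidence: number;
  mitigation_action: Record<string, unknown> | null;
  bias_report_summary: Record<string, unknown>;
}

// Fire-and-forget hook invoked after the llm_interaction_audit record is written.
// Any failure is logged and swallowed so the core LLM flow is never blocked (FR-BIAS-4.4.4).
export async function runBiasAnalysis(
  auditId: string,
  requestBody: Record<string, unknown>, // payload per the 4.3.2 request contract
  traceId: string,
  gqlClient: GraphQLClient
): Promise<void> {
  try {
    const res = await fetch(`${process.env.BIAS_ANALYZER_URL}/analyze-bias`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'X-Trace-Id': traceId }, // trace propagation (FR-BIAS-4.4.6)
      body: JSON.stringify(requestBody),
    });
    if (!res.ok) throw new Error(`BiasAnalyzerService returned ${res.status}`);
    const report = (await res.json()) as BiasReport;

    await gqlClient.request(UPDATE_BIAS, {
      id: auditId,
      input: {
        biasDetected: report.bias_detected,
        biasType: report.bias_type,
        biasScore: report.bias_score,
        biasConfidenceScore: report.confidence,
        mitigationAction: report.mitigation_action,
        biasReportSummary: report.bias_report_summary,
      },
    });
  } catch (err) {
    console.error('BIAS_ANALYSIS_FAILED', { auditId, traceId, error: (err as Error).message });
  }
}
```
LLMClient would invoke this without awaiting it (e.g., `void runBiasAnalysis(...)`) so the primary response path is unaffected.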
FR-BIAS-4.4.5: After a successful updateLlmInteractionAuditBias GraphQL mutation, LLMClient SHALL publish a lightweight event to a bias.audit.stream topic (e.g., via Google Pub/Sub) with the following payload:
{
"audit_id": "string",
"tenant_id": "string",
"bias_detected": "boolean",
"bias_score": "object",
"bias_type": "array",
"bias_confidence_score": "float",
"timestamp": "datetime",
"trace_id": "string" // For end-to-end observability
}
This enables real-time dashboards and alerting systems to react without polling GraphQL.
FR-BIAS-4.4.6: LLMClient SHALL generate a unique trace_id (or correlation_id) for each LLM interaction and propagate it through the BiasAnalyzerService call, the GraphQL updateLlmInteractionAuditBias mutation, and include it in the bias.audit.stream payload. This ensures end-to-end traceability for debugging and observability.
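A sketch of the event publish step for FR-BIAS-4.4.5/4.4.6 is shown below, assuming the @google-cloud/pubsub client. Payload field names mirror the spec; treating publish failures as log-only (no retry) is an assumption consistent with the non-blocking principle.
```typescript
import { PubSub } from '@google-cloud/pubsub';

const pubsub = new PubSub();

interface BiasAuditEvent {
  audit_id: string;
  tenant_id: string;
  bias_detected: boolean;
  bias_score: Record<string, number> | null;
  bias_type: string[];
  bias_confidence_score: number | null;
  timestamp: string; // ISO 8601
  trace_id: string;  // propagated from the originating LLM interaction
}

export async function publishBiasAuditEvent(event: BiasAuditEvent): Promise<void> {
  try {
    // publishMessage({ json }) serializes the payload as the message body.
    await pubsub.topic('bias.audit.stream').publishMessage({ json: event });
  } catch (err) {
    // Best-effort: publishing must never block the core interaction, so failures are only logged.
    console.error('BIAS_EVENT_PUBLISH_FAILED', {
      auditId: event.audit_id,
      traceId: event.trace_id,
      error: (err as Error).message,
    });
  }
}
```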
4.5 Judgment Graph Integration
FR-BIAS-4.5.1: The Judgment Graph schema SHALL be extended to store new node types or properties linking bias information to decisions and reasoning paths.
- Bias Event Node (Conceptual): A conceptual node type representing a detected bias event, linked to the llm_interaction_audit entry.
- Attributes on Decision/Reasoning Nodes: Bias flags (e.g., biasDetected, biasType) and summaries (biasReportSummary) SHALL be associable with Decision or DecisionProblem nodes derived from LLM interactions.
FR-BIAS-4.5.2: GraphRAG tracing mechanisms SHALL be updated to highlight or visualize detected bias events when exploring a decision's reasoning chain.
4.6 Human-in-the-Loop Validation
FR-BIAS-4.6.1: A Bias Dashboard (or an extension of an existing analytics dashboard) SHALL display llm_interaction_audit records, filtered and sorted by bias metrics.
FR-BIAS-4.6.2: A review UI SHALL allow human reviewers to:
- View flagged LLM interactions, their BiasReport, originalQuery, and llmResponse.
- Confirm or dismiss bias_detected (which updates humanReviewed via the GraphQL recordBiasReview mutation).
- Assign human_bias_tag and add review_notes via the same mutation.
FR-BIAS-4.6.3: Human feedback SHALL be captured via the GraphQL recordBiasReview mutation and used to refine BiasAnalyzerService models and rules over time.
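For illustration, the call the review UI might issue when a reviewer confirms a flag is sketched below (TypeScript, graphql-request). The structure inside humanBiasTag is a placeholder, since the spec leaves that JSON shape open.
```typescript
import { GraphQLClient, gql } from 'graphql-request';

const RECORD_REVIEW = gql`
  mutation RecordReview($id: ID!, $input: RecordBiasReviewInput!) {
    recordBiasReview(id: $id, input: $input) {
      id
      humanReviewed
      humanBiasTag
      reviewNotes
    }
  }
`;

// Submits a reviewer's decision for one flagged audit entry.
export async function submitBiasReview(client: GraphQLClient, auditId: string) {
  return client.request(RECORD_REVIEW, {
    id: auditId,
    input: {
      humanReviewed: true,
      humanBiasTag: { confirmed: ['representation'], dismissed: [] }, // placeholder tag shape
      reviewNotes: 'Confirmed representation bias in supplier comparison; suggest prompt adjustment.',
    },
  });
}
```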
FR-BIAS-4.6.4: When bias mitigation actions become frequent in a specific domain (e.g., "representation" bias in financial forecasts), the system SHALL auto-generate a task suggestion in Linear via the LinearJudgmentEngineService.js (or its equivalent TaskService in M49) to address the root cause of the bias. This creates a closed-loop learning system for ethical performance.
4.7 Data Retention and Privacy Policy
FR-BIAS-4.7.1: Define clear data retention rules for the llm_interaction_audit table, including anonymization after a specified period (e.g., 180 days) and aggregation after a longer period (e.g., 1 year).
FR-BIAS-4.7.2: Implement mechanisms to enforce these retention policies, such as automated data purging or anonymization scripts.
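A sketch of a scheduled enforcement job for FR-BIAS-4.7.1/4.7.2 (e.g., run daily via Cloud Scheduler) is shown below. The specific fields anonymized and the aggregation target are assumptions to be confirmed against ChainAlign's privacy policy.
```typescript
import type { Knex } from 'knex';

// Intended to run on a schedule (e.g., daily). Retention windows follow FR-BIAS-4.7.1.
export async function enforceBiasAuditRetention(db: Knex): Promise<void> {
  // Anonymize user-identifying fields on records older than 180 days.
  await db('llm_interaction_audit')
    .where('log_timestamp', '<', db.raw(`now() - interval '180 days'`))
    .whereNotNull('user_id')
    .update({
      user_id: null,
      user_email: null,
      original_query: '[ANONYMIZED]',
      sanitized_prompt: '[ANONYMIZED]',
    });

  // After 1 year, bias metrics would be rolled up into an aggregate table (not defined in this
  // spec) before the raw prompt/response payloads are purged.
}
```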
4.8 Bias Sandbox Mode (Future)
FR-BIAS-4.8.1: A sandboxed environment SHALL be created for pre-deployment bias testing of new LLM models or prompt patterns.
FR-BIAS-4.8.2: This sandbox SHALL allow replaying historical decision queries through the new model/prompts and comparing resulting BiasReports against baselines.
5.0 Technical Design Considerations
5.1 GraphQL Resolver Implementation
- Resolvers for LlmInteractionAudit queries and mutations SHALL encapsulate database interaction for the llm_interaction_audit table. They will use Knex.js (or an underlying data access layer) and enforce tenantId scope.
- createLlmInteractionAudit (if applicable) and updateLlmInteractionAuditBias mutations SHALL return the updated LlmInteractionAudit object.
- Authentication and Authorization: All GraphQL operations will leverage ChainAlign's existing JWT-based authentication and tenant_id context for multi-tenancy.
5.2 Microservice Communication
- LLMClient to BiasAnalyzerService: Direct HTTP/S POST request.
- LLMClient to GraphQL: Standard GraphQL client (e.g., Apollo Client, graphql-request) to execute mutations.
- Bias Event Stream: A lightweight pub/sub system (e.g., Google Pub/Sub) will be used for the bias.audit.stream topic to enable real-time analytics and alerting without polling.
5.3 Data Serialization
- The JSON scalar type in GraphQL will be used for the bias_score, bias_report_summary, and human_bias_tag fields to allow flexible JSON structures.
- Bias Confidence Score Precision: The bias_confidence_score is stored as DECIMAL(5,2) in the database for precision. While exposed as Float in GraphQL, clients should be aware of potential floating-point inaccuracies if exact decimal representation is critical for downstream calculations.
5.4 Error Handling
- Standard GraphQL error handling will be used for API errors, with clear error codes for BiasAnalyzerService failures (BIAS_ANALYSIS_FAILED).
6.0 Implementation Plan
Phase 1: Database & GraphQL Foundation (1 week)
- Task 1.1: Create a Knex.js migration to add the bias-related columns to the llm_interaction_audit table (bias_detected, bias_type, bias_score, bias_confidence_score, mitigation_action, bias_report_summary, human_reviewed, human_bias_tag, review_notes).
- Task 1.2: Implement database indexes for the new columns.
- Task 1.3: Extend schema.graphql to add the LlmInteractionAudit type, queries (llmInteractionAudit, llmInteractionAudits), and mutations (updateLlmInteractionAuditBias, recordBiasReview).
- Task 1.4: Implement basic resolvers for the new LlmInteractionAudit queries and mutations, ensuring they interact correctly with the database.
Phase 2: Bias Analyzer Service & LLMClient Integration (2 weeks)
- Task 2.1: Develop BiasAnalyzerService (Python/FastAPI) with the POST /analyze-bias endpoint and initial Linguistic Bias Check functionality (MVP).
- Task 2.2: Deploy BiasAnalyzerService to Google Cloud Run.
- Task 2.3: Modify LLMClient to call BiasAnalyzerService after the LLM response and then execute the updateLlmInteractionAuditBias GraphQL mutation.
- Task 2.4: Implement comprehensive error handling and fallback mechanisms in LLMClient for bias analysis failures.
Phase 3: Judgment Graph & UI Integration (2 weeks)
- Task 3.1: Update Judgment Graph schema (or data models for graph traversal) to allow linking bias information to decision nodes.
- Task 3.2: Develop initial Bias Dashboard components to display flagged LLM interactions.
- Task 3.3: Implement the Human-in-the-Loop UI for reviewing and updating bias flags via the recordBiasReview GraphQL mutation.
Phase 4: Advanced Bias Features & Sandbox (Ongoing)
- Task 4.1: Enhance BiasAnalyzerService with the Counterfactual Replay Engine and Semantic Fairness Scorer.
- Task 4.2: Develop Bias Sandbox Mode for pre-deployment testing and comparison.
- Task 4.3: Continuously refine models and rules within BiasAnalyzerService based on human feedback.
7.0 Success Metrics
- Bias Detection Rate: The BiasAnalyzerService accurately identifies >80% of known biased responses in test datasets.
- Human Review Efficiency: Human reviewers can process flagged interactions in the Bias Dashboard in under 30 seconds per item (after initial ramp-up).
- Auditability: All LLM interactions, including bias analysis results and human reviews, are immutably logged in llm_interaction_audit and retrievable via GraphQL.
- Performance Overhead: Bias analysis adds <200ms of latency to the overall LLM interaction (excluding counterfactual replay, which may be asynchronous).
- GraphQL Consistency: All bias-related data is consistently queryable and mutable via the GraphQL API, adhering to type safety.
- Impact on Decision-Making: Reduction in instances of identified biased recommendations influencing final decisions (measured qualitatively through user feedback and quantitatively by tracking mitigation actions).
8.0 Future Enhancements / Appendices (Optional for next version)
- Bias Example Catalog: Create a catalog of bias examples (e.g., "Bias Type", "Example Prompt", "Example Response", "Detection", "Mitigation") to serve as developer guidance and training data seed for the Bias Analyzer's rule set.