
Functional Specification Document: LLM-Agnostic Sanitization Layer with Provider Gateway

Version: 1.0
Date: October 30, 2025
Milestone: M53 (Post-Demo Enhancement)
Status: 📋 FSD - Ready for Implementation Planning


Executive Summary

The LLM-Agnostic Sanitization Layer transforms ChainAlign's AI Compliance & Trust system from a tightly coupled, single-provider integration (Gemini) into a flexible, modular architecture that supports any provider (Claude, LLaMA, Groq, future models) with zero changes to core business logic.

This is achieved through the LLM Provider Gateway (Adapter Pattern) and Standardized Tokenization, ensuring ChainAlign can:

  • 🔄 Switch providers without code changes
  • 💰 Choose the most cost-effective model per task
  • ⚡ Adapt to rapid LLM model evolution
  • 🛡️ Maintain consistent audit trails and compliance

The Problem: Current Tightly-Coupled Architecture

Current State

AIGateway.js
↓ (hardcoded Gemini logic)
RedactionEngine (Python)
↓ (Gemini-specific API calls)
Gemini API

Issues:

  • ❌ Switching to Claude requires changes in AIGateway.js
  • ❌ Model upgrades (Gemini 1.0 → 2.0) require code changes
  • ❌ Adding a new provider requires deep knowledge of core logic
  • ❌ Tokenization tied to Gemini's tokenizer
  • ❌ Cost calculations only work for Gemini pricing

The Solution: LLM Provider Gateway Architecture

Proposed Architecture

AIGateway.js (Business Logic)
↓ (PromptObject: {prompt, tokens, model})
LLMProviderGateway (Router)
├── GeminiAdapter
├── AnthropicAdapter
├── LlamaAdapter
├── GroqAdapter
└── [Future Providers]
↓ (StandardizedResponse: {text, tokens_used, cost})
RedactionEngine (Business Logic)

Key Principle: Core business logic only knows about standardized interfaces, never provider-specific details.
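
For illustration, a minimal usage sketch (assuming the gateway and types specified in the component sections below; the literal IDs and values are placeholders): the caller builds a provider-neutral PromptObject and receives a standardized LLMResponse, so switching providers is a configuration change only.

// Minimal sketch - the caller never touches a provider SDK
import LLMProviderGateway from './services/llm/LLMProviderGateway.js';

const promptObject = {
  prompt: 'Summarize the attached contract.',
  provider: 'gemini', // swap to 'anthropic' and nothing else changes
  modelId: 'gemini-default', // logical id, mapped to a concrete model inside the adapter
  promptTokens: 42,
  estimatedTotalTokens: 42,
  userId: 'user-123',
  tenantId: 'tenant-456',
  requestId: 'req-789',
  timestamp: new Date(),
};

const response = await LLMProviderGateway.executePrompt(promptObject);
console.log(response.text, response.totalTokens, response.estimatedCostUSD);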


Component Specifications

1. Standard Prompt Object

File: backend/src/types/PromptObject.ts

interface PromptObject {
  // Content
  prompt: string;
  context?: string;
  systemMessage?: string;

  // Metadata
  modelId: string; // e.g., "gpt-4", "claude-3-sonnet", "llama-2-70b"
  provider: string; // e.g., "openai", "anthropic", "llama"

  // Token Management
  promptTokens: number; // Pre-calculated token count
  estimatedTotalTokens: number;

  // Configuration
  temperature?: number;
  maxTokens?: number;
  topP?: number;

  // Audit Trail
  userId: string;
  tenantId: string;
  requestId: string;
  timestamp: Date;
}

2. Standardized LLM Response

File: backend/src/types/LLMResponse.ts

interface LLMResponse {
  // Output
  text: string;
  rawResponse: Record<string, any>; // For debugging

  // Token Accounting
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;

  // Cost Tracking
  estimatedCostUSD: number;
  costBreakdown: {
    inputCost: number;
    outputCost: number;
  };

  // Provider Info
  provider: string;
  modelUsed: string;

  // Metadata
  latencyMs: number;
  success: boolean;
  error?: string;
  retryCount: number;

  // Audit
  requestId: string;
  timestamp: Date;
}

3. LLM Provider Gateway (Router)

File: backend/src/services/llm/LLMProviderGateway.js

import GeminiAdapter from './adapters/GeminiAdapter.js';
import AnthropicAdapter from './adapters/AnthropicAdapter.js';
import LlamaAdapter from './adapters/LlamaAdapter.js';
import GroqAdapter from './adapters/GroqAdapter.js';

class LLMProviderGateway {
  constructor() {
    this.adapters = {
      gemini: new GeminiAdapter(),
      anthropic: new AnthropicAdapter(),
      llama: new LlamaAdapter(),
      groq: new GroqAdapter(),
    };
  }

  /**
   * Main entry point: Convert PromptObject to LLMResponse
   * @param {PromptObject} promptObject
   * @returns {Promise<LLMResponse>}
   */
  async executePrompt(promptObject) {
    const adapter = this.getAdapter(promptObject.provider);

    if (!adapter) {
      throw new Error(`Unknown provider: ${promptObject.provider}`);
    }

    try {
      const startTime = Date.now();

      // Route to the appropriate adapter
      const rawResponse = await adapter.call(promptObject);

      // Standardize the response
      const standardizedResponse = adapter.standardizeResponse(rawResponse, promptObject);

      // Add latency tracking
      standardizedResponse.latencyMs = Date.now() - startTime;

      return standardizedResponse;
    } catch (error) {
      return this.handleError(error, promptObject);
    }
  }

  getAdapter(provider) {
    return this.adapters[provider.toLowerCase()];
  }

  handleError(error, promptObject) {
    // Standardized error handling across all providers
    return {
      text: null,
      success: false,
      error: error.message,
      provider: promptObject.provider,
      modelUsed: promptObject.modelId,
      requestId: promptObject.requestId,
      timestamp: new Date(),
    };
  }
}

export default new LLMProviderGateway();

4. Base Adapter Interface

File: backend/src/services/llm/adapters/BaseAdapter.js

/**
 * All provider adapters must extend this interface.
 * Ensures consistent behavior across all providers.
 */
class BaseAdapter {
  /**
   * Send prompt to the provider's API
   * @param {PromptObject} promptObject
   * @returns {Promise<Object>} Provider's native response
   */
  async call(promptObject) {
    throw new Error('Subclass must implement call()');
  }

  /**
   * Convert provider-specific response to StandardizedResponse
   * @param {Object} rawResponse Provider's native response
   * @param {PromptObject} promptObject Original request
   * @returns {LLMResponse}
   */
  standardizeResponse(rawResponse, promptObject) {
    throw new Error('Subclass must implement standardizeResponse()');
  }

  /**
   * Count tokens using the provider's tokenizer
   * @param {string} text
   * @returns {Promise<number>} Token count
   */
  async countTokens(text) {
    throw new Error('Subclass must implement countTokens()');
  }

  /**
   * Get a model's limits (context window, max output, etc.)
   * @param {string} modelId
   * @returns {Object}
   */
  getModelLimits(modelId) {
    throw new Error('Subclass must implement getModelLimits()');
  }

  /**
   * Calculate cost for a request
   * @param {number} inputTokens
   * @param {number} outputTokens
   * @param {string} modelId
   * @returns {number} Cost in USD
   */
  calculateCost(inputTokens, outputTokens, modelId) {
    throw new Error('Subclass must implement calculateCost()');
  }
}

export default BaseAdapter;

5. Gemini Adapter (Example Implementation)

File: backend/src/services/llm/adapters/GeminiAdapter.js

import BaseAdapter from './BaseAdapter.js';
import { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } from '@google/generative-ai';

class GeminiAdapter extends BaseAdapter {
  constructor() {
    super();
    this.client = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
    this.modelMap = {
      'gemini-default': 'gemini-2.0-pro', // ← Model upgrade isolated here
      'gemini-fast': 'gemini-2.0-flash',
    };
  }

  async call(promptObject) {
    const modelId = this.mapModelId(promptObject.modelId);
    const model = this.client.getGenerativeModel({ model: modelId });

    const response = await model.generateContent({
      contents: [
        {
          role: 'user',
          parts: [{ text: promptObject.prompt }],
        },
      ],
      safetySettings: [
        {
          category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
          threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        },
      ],
      generationConfig: {
        temperature: promptObject.temperature || 0.7,
        topP: promptObject.topP || 0.9,
        maxOutputTokens: promptObject.maxTokens || 2048,
      },
    });

    return {
      text: response.response.text(),
      usage: {
        promptTokens: response.response.usageMetadata.promptTokenCount,
        completionTokens: response.response.usageMetadata.candidatesTokenCount,
      },
      rawResponse: response,
    };
  }

  standardizeResponse(rawResponse, promptObject) {
    const usage = rawResponse.usage;
    const totalTokens = usage.promptTokens + usage.completionTokens;
    const limits = this.getModelLimits(promptObject.modelId);
    const cost = this.calculateCost(
      usage.promptTokens,
      usage.completionTokens,
      promptObject.modelId
    );

    return {
      text: rawResponse.text,
      rawResponse: rawResponse.rawResponse,
      promptTokens: usage.promptTokens,
      completionTokens: usage.completionTokens,
      totalTokens: totalTokens,
      estimatedCostUSD: cost,
      costBreakdown: {
        // Derived from getModelLimits() so per-model pricing lives in one place
        inputCost: (usage.promptTokens / 1000000) * limits.costPerMillion.input,
        outputCost: (usage.completionTokens / 1000000) * limits.costPerMillion.output,
      },
      provider: 'gemini',
      modelUsed: this.mapModelId(promptObject.modelId),
      success: true,
      requestId: promptObject.requestId,
      timestamp: new Date(),
    };
  }

  async countTokens(text) {
    const model = this.client.getGenerativeModel({ model: 'gemini-2.0-pro' });
    const response = await model.countTokens(text);
    return response.totalTokens;
  }

  getModelLimits(modelId) {
    const limits = {
      'gemini-2.0-pro': {
        contextWindow: 1000000,
        maxOutputTokens: 16384,
        costPerMillion: { input: 0.075, output: 0.3 }, // USD per 1M tokens
      },
      'gemini-2.0-flash': {
        contextWindow: 1000000,
        maxOutputTokens: 16384,
        costPerMillion: { input: 0.0075, output: 0.03 },
      },
    };
    return limits[this.mapModelId(modelId)];
  }

  calculateCost(inputTokens, outputTokens, modelId) {
    const mapped = this.mapModelId(modelId);
    const limits = this.getModelLimits(mapped);
    const inputCost = (inputTokens / 1000000) * limits.costPerMillion.input;
    const outputCost = (outputTokens / 1000000) * limits.costPerMillion.output;
    return inputCost + outputCost;
  }

  mapModelId(logicalId) {
    return this.modelMap[logicalId] || logicalId;
  }
}

export default GeminiAdapter;

6. Anthropic Adapter (New Provider Example)

File: backend/src/services/llm/adapters/AnthropicAdapter.js

import BaseAdapter from './BaseAdapter.js';
import Anthropic from '@anthropic-ai/sdk';

class AnthropicAdapter extends BaseAdapter {
  constructor() {
    super();
    this.client = new Anthropic({
      apiKey: process.env.ANTHROPIC_API_KEY,
    });
  }

  async call(promptObject) {
    const message = await this.client.messages.create({
      model: promptObject.modelId,
      max_tokens: promptObject.maxTokens || 1024,
      messages: [
        {
          role: 'user',
          content: promptObject.prompt,
        },
      ],
    });

    return {
      text: message.content[0].text,
      usage: {
        promptTokens: message.usage.input_tokens,
        completionTokens: message.usage.output_tokens,
      },
      rawResponse: message,
    };
  }

  standardizeResponse(rawResponse, promptObject) {
    const usage = rawResponse.usage;
    const totalTokens = usage.promptTokens + usage.completionTokens;
    const limits = this.getModelLimits(promptObject.modelId);
    const cost = this.calculateCost(
      usage.promptTokens,
      usage.completionTokens,
      promptObject.modelId
    );

    return {
      text: rawResponse.text,
      rawResponse: rawResponse.rawResponse,
      promptTokens: usage.promptTokens,
      completionTokens: usage.completionTokens,
      totalTokens: totalTokens,
      estimatedCostUSD: cost,
      costBreakdown: {
        // Derived from getModelLimits() so per-model pricing lives in one place
        inputCost: (usage.promptTokens / 1000000) * limits.costPerMillion.input,
        outputCost: (usage.completionTokens / 1000000) * limits.costPerMillion.output,
      },
      provider: 'anthropic',
      modelUsed: promptObject.modelId,
      success: true,
      requestId: promptObject.requestId,
      timestamp: new Date(),
    };
  }

  async countTokens(text) {
    const response = await this.client.messages.countTokens({
      model: 'claude-3-5-sonnet-20241022',
      messages: [
        {
          role: 'user',
          content: text,
        },
      ],
    });
    return response.input_tokens;
  }

  getModelLimits(modelId) {
    const limits = {
      'claude-3-5-sonnet-20241022': {
        contextWindow: 200000,
        maxOutputTokens: 4096,
        costPerMillion: { input: 3, output: 15 },
      },
      'claude-3-opus-20240229': {
        contextWindow: 200000,
        maxOutputTokens: 4096,
        costPerMillion: { input: 15, output: 75 },
      },
    };
    return limits[modelId];
  }

  calculateCost(inputTokens, outputTokens, modelId) {
    const limits = this.getModelLimits(modelId);
    const inputCost = (inputTokens / 1000000) * limits.costPerMillion.input;
    const outputCost = (outputTokens / 1000000) * limits.costPerMillion.output;
    return inputCost + outputCost;
  }
}

export default AnthropicAdapter;

7. Tokenization Service (LLM-Agnostic)

File: backend/src/services/llm/TokenizationService.js

import LLMProviderGateway from './LLMProviderGateway.js';

class TokenizationService {
  /**
   * Count tokens for ANY provider
   * @param {string} text
   * @param {string} provider
   * @param {string} modelId
   * @returns {Promise<number>}
   */
  async countTokens(text, provider, modelId) {
    const adapter = this.getAdapter(provider);
    return adapter.countTokens(text);
  }

  /**
   * Pre-flight check before sending to the LLM
   * @param {PromptObject} promptObject
   * @returns {Promise<{valid: boolean, reason?: string}>}
   */
  async validatePromptSize(promptObject) {
    const adapter = this.getAdapter(promptObject.provider);
    const limits = adapter.getModelLimits(promptObject.modelId);

    if (promptObject.estimatedTotalTokens > limits.contextWindow) {
      return {
        valid: false,
        reason: `Prompt (${promptObject.estimatedTotalTokens} tokens) exceeds ${promptObject.modelId} context window (${limits.contextWindow} tokens)`,
      };
    }

    return { valid: true };
  }

  getAdapter(provider) {
    // Delegates to LLMProviderGateway
    return LLMProviderGateway.getAdapter(provider);
  }
}

export default new TokenizationService();

8. AIGateway Integration (Business Logic - NO Provider Knowledge)

File: backend/src/services/AIGateway.js (Updated)

import LLMProviderGateway from './llm/LLMProviderGateway.js';
import TokenizationService from './llm/TokenizationService.js';
import { v4 as uuidv4 } from 'uuid';

class AIGateway {
  /**
   * Execute with cost and token checks - PROVIDER AGNOSTIC
   * Note: this.costThreshold and this.auditLog are assumed to be configured on this service (not shown here).
   */
  async executeWithSanitization(userPrompt, context, options = {}) {
    const {
      modelId = process.env.DEFAULT_LLM_MODEL,
      provider = this.detectProvider(modelId),
      userId,
      tenantId,
    } = options;

    // Step 1: Tokenize
    const promptTokens = await TokenizationService.countTokens(
      userPrompt,
      provider,
      modelId
    );
    const contextTokens = await TokenizationService.countTokens(
      context,
      provider,
      modelId
    );
    const totalTokens = promptTokens + contextTokens;

    // Step 2: Validate
    const validation = await TokenizationService.validatePromptSize({
      provider,
      modelId,
      estimatedTotalTokens: totalTokens,
    });

    if (!validation.valid) {
      throw new Error(validation.reason);
    }

    // Step 3: Check cost
    const adapter = LLMProviderGateway.getAdapter(provider);
    const estimatedCost = adapter.calculateCost(totalTokens, 1000, modelId); // Rough estimate

    if (estimatedCost > this.costThreshold) {
      throw new Error(
        `Estimated cost ($${estimatedCost}) exceeds threshold ($${this.costThreshold})`
      );
    }

    // Step 4: Build prompt object (no provider-specific logic)
    const promptObject = {
      prompt: userPrompt,
      context,
      modelId,
      provider,
      promptTokens,
      estimatedTotalTokens: totalTokens,
      userId,
      tenantId,
      requestId: uuidv4(),
      timestamp: new Date(),
    };

    // Step 5: Execute (delegates to gateway)
    const response = await LLMProviderGateway.executePrompt(promptObject);

    // Step 6: Log audit trail
    await this.auditLog.record({
      eventType: 'LLM_CALL',
      userId,
      tenantId,
      provider: response.provider,
      modelUsed: response.modelUsed,
      promptTokens: response.promptTokens,
      completionTokens: response.completionTokens,
      estimatedCost: response.estimatedCostUSD,
      success: response.success,
      latencyMs: response.latencyMs,
    });

    return response;
  }

  detectProvider(modelId) {
    if (modelId.startsWith('gpt-')) return 'openai';
    if (modelId.startsWith('claude-')) return 'anthropic';
    if (modelId.includes('llama')) return 'llama';
    if (modelId.includes('groq')) return 'groq';
    return process.env.DEFAULT_LLM_PROVIDER || 'gemini';
  }
}

export default new AIGateway();

Handling Change & Evolution

Scenario 1: Google Releases Gemini 3.0

Before (Tightly Coupled): Modify AIGateway.js and update all hardcoded model names.

After (Gateway Pattern):

// In GeminiAdapter.js - only ONE place to change
this.modelMap = {
  'gemini-default': 'gemini-3.0-pro', // ← Changed
  'gemini-fast': 'gemini-3.0-flash', // ← Changed
};

// Update pricing in getModelLimits()
'gemini-3.0-pro': {
  contextWindow: 2000000,
  costPerMillion: { input: 0.05, output: 0.2 }, // ← Changed (USD per 1M tokens)
},

// ✅ ZERO changes needed elsewhere!

Scenario 2: Switch to Claude for Finance Tasks

Before (Tightly Coupled): Rewrite all LLM calling logic in AIGateway.js.

After (Gateway Pattern):

// Configuration change only
const response = await AIGateway.executeWithSanitization(
  prompt,
  context,
  {
    modelId: 'claude-3-opus-20240229', // ← Just change this
    provider: 'anthropic', // ← Just change this
  }
);

// ✅ Business logic remains unchanged!

Scenario 3: Add LLaMA for On-Premise Deployment

Before (Tightly Coupled): Rewrite AIGateway.js to support the new provider.

After (Gateway Pattern):

// Create LlamaAdapter.js extending BaseAdapter
// Register in LLMProviderGateway
this.adapters.llama = new LlamaAdapter();

// ✅ Immediately available to entire platform!
// ✅ AIGateway knows nothing about it!
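
A skeleton of such an adapter might look like the following. This is a sketch only: the on-premise inference endpoint, its request/response field names, and the LLAMA_API_URL variable are illustrative assumptions, not part of this specification.

import BaseAdapter from './BaseAdapter.js';

class LlamaAdapter extends BaseAdapter {
  async call(promptObject) {
    // Hypothetical on-premise inference endpoint; adjust to the actual server's API
    const res = await fetch(`${process.env.LLAMA_API_URL}/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: promptObject.modelId,
        prompt: promptObject.prompt,
        max_tokens: promptObject.maxTokens || 1024,
        temperature: promptObject.temperature || 0.7,
      }),
    });
    return res.json();
  }

  standardizeResponse(rawResponse, promptObject) {
    // Map the server's native fields (assumed names) to the LLMResponse shape
    return {
      text: rawResponse.output,
      rawResponse,
      promptTokens: rawResponse.prompt_tokens,
      completionTokens: rawResponse.completion_tokens,
      totalTokens: rawResponse.prompt_tokens + rawResponse.completion_tokens,
      estimatedCostUSD: 0, // on-premise: no per-token provider charge
      costBreakdown: { inputCost: 0, outputCost: 0 },
      provider: 'llama',
      modelUsed: promptObject.modelId,
      success: true,
      requestId: promptObject.requestId,
      timestamp: new Date(),
    };
  }

  // countTokens(), getModelLimits(), and calculateCost() follow the same BaseAdapter contract
}

export default LlamaAdapter;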

Python Redaction Engine (Mirrored Pattern)

File: python-services/redaction-engine-service/llm_gateway.py

from abc import ABC, abstractmethod
from enum import Enum

class LLMProvider(str, Enum):
    GEMINI = "gemini"
    ANTHROPIC = "anthropic"
    LLAMA = "llama"

class BaseLLMAdapter(ABC):
    @abstractmethod
    async def call(self, prompt_object: dict) -> dict:
        pass

    @abstractmethod
    def standardize_response(self, raw_response: dict, prompt_object: dict) -> dict:
        pass

class GeminiRedactionAdapter(BaseLLMAdapter):
    async def call(self, prompt_object: dict) -> dict:
        # Gemini-specific implementation
        pass

    def standardize_response(self, raw_response: dict, prompt_object: dict) -> dict:
        # Map Gemini's native response to the standardized shape
        pass

class AnthropicRedactionAdapter(BaseLLMAdapter):
    async def call(self, prompt_object: dict) -> dict:
        # Anthropic-specific implementation
        pass

    def standardize_response(self, raw_response: dict, prompt_object: dict) -> dict:
        # Map Anthropic's native response to the standardized shape
        pass

class LLMGateway:
    def __init__(self):
        self.adapters = {
            LLMProvider.GEMINI: GeminiRedactionAdapter(),
            LLMProvider.ANTHROPIC: AnthropicRedactionAdapter(),
        }

    async def execute_redaction(self, prompt_object: dict) -> dict:
        # Convert the plain string from the prompt object into the enum key
        adapter = self.adapters[LLMProvider(prompt_object['provider'])]
        raw_response = await adapter.call(prompt_object)
        return adapter.standardize_response(raw_response, prompt_object)

Compliance & Audit Trail

Audit Log Schema

CREATE TABLE llm_audit_logs (
    id UUID PRIMARY KEY,
    event_type VARCHAR(50),
    user_id UUID NOT NULL REFERENCES users(user_id),
    tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),

    -- LLM Details
    provider VARCHAR(50),    -- gemini, anthropic, llama, etc.
    model_used VARCHAR(100), -- gpt-4, claude-3-opus, etc.

    -- Token Accounting
    prompt_tokens INTEGER,
    completion_tokens INTEGER,
    total_tokens INTEGER,

    -- Cost Tracking
    estimated_cost_usd DECIMAL(10, 6),
    cost_breakdown JSONB, -- { input_cost, output_cost }

    -- Performance
    latency_ms INTEGER,

    -- Status
    success BOOLEAN,
    error_message TEXT,
    retry_count INTEGER DEFAULT 0,

    -- Timestamps
    created_at TIMESTAMP DEFAULT NOW(),

    -- Redaction Details (if applicable)
    redacted_field_count INTEGER,
    redaction_confidence DECIMAL(3, 2)
);
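
To make the mapping concrete, here is a minimal sketch of how an audit logger might persist a standardized LLMResponse into this table. It assumes node-postgres (pg); the pool wiring and function name are illustrative, not part of this specification.

import pg from 'pg';
import { randomUUID } from 'crypto';

const pool = new pg.Pool(); // connection settings come from the environment

// Persist one standardized LLMResponse as an audit row
export async function recordLLMCall({ userId, tenantId }, response) {
  await pool.query(
    `INSERT INTO llm_audit_logs
       (id, event_type, user_id, tenant_id, provider, model_used,
        prompt_tokens, completion_tokens, total_tokens,
        estimated_cost_usd, cost_breakdown, latency_ms, success, error_message, retry_count)
     VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15)`,
    [
      randomUUID(),
      'LLM_CALL',
      userId,
      tenantId,
      response.provider,
      response.modelUsed,
      response.promptTokens,
      response.completionTokens,
      response.totalTokens,
      response.estimatedCostUSD,
      JSON.stringify(response.costBreakdown),
      response.latencyMs,
      response.success,
      response.error || null,
      response.retryCount || 0,
    ]
  );
}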

Testing Strategy

Unit Tests

// tests/unit/GeminiAdapter.test.js
describe('GeminiAdapter', () => {
  it('standardizes Gemini response to LLMResponse format', () => {
    // Mock Gemini response
    // Call standardizeResponse()
    // Assert LLMResponse schema
  });

  it('calculates cost correctly for Gemini models', () => {
    const cost = adapter.calculateCost(1000, 500, 'gemini-2.0-pro');
    expect(cost).toBeCloseTo(0.000225, 6); // (1000 * 0.075 + 500 * 0.3) / 1,000,000
  });
});

// tests/unit/AnthropicAdapter.test.js
describe('AnthropicAdapter', () => {
  it('standardizes Anthropic response to LLMResponse format', () => {
    // Same test structure - BOTH adapters tested identically
  });
});

// tests/unit/LLMProviderGateway.test.js
describe('LLMProviderGateway', () => {
  it('routes to correct adapter based on provider', async () => {
    const response = await gateway.executePrompt({
      provider: 'gemini',
      modelId: 'gemini-2.0-pro',
      prompt: 'test',
    });
    expect(response.provider).toBe('gemini');
  });

  it('produces standardized response for ANY provider', async () => {
    const geminiResponse = await gateway.executePrompt({ provider: 'gemini' });
    const anthropicResponse = await gateway.executePrompt({ provider: 'anthropic' });

    // Both should have the same schema
    expect(geminiResponse).toHaveProperty('totalTokens');
    expect(anthropicResponse).toHaveProperty('totalTokens');
  });
});

Integration Tests

describe('AIGateway (Provider Agnostic)', () => {
  it('works with ANY provider transparently', async () => {
    // Test with Gemini
    const geminiResult = await gateway.executeWithSanitization(
      'Redact: SSN-123-45-6789',
      '',
      { provider: 'gemini', modelId: 'gemini-2.0-pro' }
    );

    // Test with Anthropic - SAME LOGIC
    const anthropicResult = await gateway.executeWithSanitization(
      'Redact: SSN-123-45-6789',
      '',
      { provider: 'anthropic', modelId: 'claude-3-opus-20240229' }
    );

    // Both should work identically
    expect(geminiResult.success).toBe(true);
    expect(anthropicResult.success).toBe(true);

    // Both should have redacted output
    expect(geminiResult.text).not.toContain('123-45-6789');
    expect(anthropicResult.text).not.toContain('123-45-6789');
  });
});

Implementation Phases

Phase 1: Foundation (2-3 weeks)

  • Define TypeScript types (PromptObject, LLMResponse)
  • Implement BaseAdapter interface
  • Implement LLMProviderGateway router
  • Migrate existing Gemini logic to GeminiAdapter

Phase 2: Provider Expansion (2-3 weeks)

  • Implement AnthropicAdapter
  • Implement LlamaAdapter
  • Implement GroqAdapter
  • Add provider detection logic

Phase 3: Tokenization & Cost (1-2 weeks)

  • Implement TokenizationService
  • Integrate with each adapter's tokenizer
  • Add cost calculation to adapters
  • Update audit logging

Phase 4: Testing & Hardening (2 weeks)

  • Comprehensive unit tests for each adapter
  • Integration tests (provider-agnostic)
  • Performance benchmarks
  • Load testing

Success Metrics

  • ✅ Zero changes to AIGateway.js when adding a new provider
  • ✅ Model upgrades handled in adapter only
  • ✅ Cost calculations accurate within 2% for all providers
  • ✅ Token counts validated against provider APIs
  • ✅ Audit logs capture 100% of LLM interactions
  • ✅ Response time <200ms for token counting
  • ✅ Support for ≥4 major LLM providers

Risks & Mitigation

  • Risk: Provider API changes break an adapter. Mitigation: Version adapters and maintain a historical compatibility layer.
  • Risk: Token count discrepancies. Mitigation: Validate against provider APIs quarterly and add reconciliation logs.
  • Risk: Cost calculation errors. Mitigation: Audit against actual billing monthly; alert on variance >5%.
  • Risk: Adapter complexity grows. Mitigation: Strict interface contracts and code reviews for new adapters.
  • Risk: Performance degradation. Mitigation: Benchmark each adapter and cache tokenization results.

Dependencies

  • @google/generative-ai (Gemini)
  • @anthropic-ai/sdk (Anthropic)
  • transformers (for LLaMA tokenization)
  • js-tiktoken (OpenAI tokenization fallback)
  • uuid (request IDs)
  • PostgreSQL (audit logs)

Future Enhancements

  1. Provider Load Balancing - Route to cheapest provider for each task type
  2. Fallback Chain - If Gemini fails, auto-retry with Anthropic (see the sketch below)
  3. Model Selection AI - ML model to choose best provider per request
  4. Cost Optimization Engine - Suggest provider swaps based on usage patterns
  5. Real-time Pricing Updates - Auto-sync pricing from provider APIs
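
As an illustration of the Fallback Chain idea (item 2), a sketch of how it could sit on top of the existing gateway; the provider order and retry policy here are illustrative assumptions, not a committed design.

import LLMProviderGateway from './llm/LLMProviderGateway.js';

// Try providers in order until one succeeds (sketch only)
// A real implementation would also map modelId to each provider's equivalent model
async function executeWithFallback(promptObject, providers = ['gemini', 'anthropic']) {
  let lastError;
  for (const provider of providers) {
    const response = await LLMProviderGateway.executePrompt({ ...promptObject, provider });
    if (response.success) {
      return response;
    }
    lastError = response.error;
  }
  throw new Error(`All providers failed: ${lastError}`);
}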

Conclusion

The LLM-Agnostic Sanitization Layer with Provider Gateway transforms ChainAlign from a Gemini-dependent system into a flexible, modular platform that embraces the rapid evolution of LLM technology.

By centralizing provider-specific logic into adapters and maintaining strict interface contracts, ChainAlign ensures:

  • 🔄 Easy provider switching
  • 💰 Cost optimization
  • ⚡ Rapid model adoption
  • 🛡️ Consistent compliance

Status: Ready for post-demo implementation
Estimated Timeline: 6-8 weeks (Phases 1-4)
Team: 2-3 backend engineers


Document Version: 1.0
Last Updated: October 30, 2025
Owner: Architecture Team
Status: 📋 FSD - Ready for Implementation Planning