Privacy-Aware LLM Integration for M5.2 Headline Generation
Purpose: Secure, deterministic LLM prompting that respects data privacy while maintaining reasoning effectiveness
Integration: HeadlineGenerator (M5.2) ↔ Redaction Engine Service (Python)
Architecture
HeadlineGenerator.generateByLLM()
↓
1. Redaction Service: /abstractize
- Convert entities to tokens
- Generate metadata
↓
2. Redaction Service: /metadata-for-llm
- Create system prompt injection
↓
3. AIManager.generate()
- Build prompt with metadata injection
- Call Gemini with redacted context
↓
4. Response
- Assertion headline (using token names)
- Replace tokens back with originals if needed
Step-by-Step Implementation
Step 1: Identify Entities to Redact
In the HeadlineGenerator's buildAssertionPrompt() method, identify which entities need privacy protection:
const context = {
metric: 'Inventory',
location: 'Frankfurt', // REDACT: Location
product: 'Component X', // REDACT: Product
persona: 'Supply Chain Director', // OK: Generic role
decision_required: 'Approve action', // OK: Generic
};
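One way to make this decision systematic is a field-to-entity-type map. The sketch below is illustrative only: the map and the helper name are assumptions, not part of the service API.

```javascript
// Hypothetical mapping of context fields to redaction entity types.
// Fields absent from the map (e.g. persona, decision_required) pass through as-is.
const REDACTION_FIELDS = {
  location: 'LOCATION',
  product: 'PRODUCT',
  customer: 'CUSTOMER',
};

// Build the entity list for /abstractize from a headline context object.
function identifyEntitiesToRedact(context) {
  return Object.entries(REDACTION_FIELDS)
    .filter(([field]) => context[field] != null)
    .map(([field, type]) => ({ name: context[field], type, metadata: {} }));
}
```

Entity-specific metadata (category, margin, drivers) would then be filled in per type, as shown in Step 2.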
Step 2: Call Redaction Service to Abstractize
async buildAssertionPrompt(anomaly, impact, persona, context) {
// 1. Identify entities that need redaction
const entitiesToRedact = [
{
name: context.location, // e.g., "Frankfurt"
type: 'LOCATION',
metadata: {
region: 'Europe',
climate: this.getClimateType(context.location),
sensitivity: 'low'
}
},
{
name: context.product, // e.g., "Component X"
type: 'PRODUCT',
metadata: {
category: this.getProductCategory(context.product),
margin_percent: this.getMargin(context.product),
key_drivers: await this.getProductDrivers(context.product),
sensitivity: 'medium'
}
}
];
// 2. Call redaction service to get tokens + metadata
const redactionResponse = await fetch(
'http://localhost:5000/abstractize',
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
tenant_id: context.tenantId,
entities: entitiesToRedact,
mode: 'full' // Get tokens + metadata
})
}
);
const { tokens, metadata_injection } = await redactionResponse.json();
// 3. Use tokens in the query (not original names)
const redactedLocation = tokens['Frankfurt']; // [LOCATION_EUROPE]
const redactedProduct = tokens['Component X']; // [PRODUCT_INDUSTRIAL]
// 4. Build prompt with redacted context
const systemPrompt = `You are an executive communication specialist.
${metadata_injection}
Your task is to generate assertion headlines for business decisions.`;
const userPrompt = `
Generate assertion headlines for:
- Metric: ${context.metric} for ${redactedProduct}
- Location: ${redactedLocation}
- Baseline: ${anomaly.baseline}
- Observed: ${anomaly.value}
- Impact: $${impact.impact_dollars}
- Role: ${persona.name}
Generate 3 distinct assertions:
`;
return {
systemPrompt,
userPrompt,
tokens, // Keep for token replacement if needed
originalContext: context // For internal use only
};
}
Step 3: Call LLM with Protected Context
async generateByLLM(anomaly, impact, persona, context) {
// 1. Build prompt with redacted entities
const { systemPrompt, userPrompt, tokens } = await this.buildAssertionPrompt(
anomaly, impact, persona, context
);
// 2. Call Gemini with metadata injection
const response = await AIManager.generate({
prompt: userPrompt,
systemInstructions: systemPrompt,
model: 'gemini-pro',
temperature: 0.4,
max_tokens: 150
});
// 3. Parse assertions from response
const assertions = this.parseAssertions(response.text);
// 4. (OPTIONAL) Replace tokens back with originals if needed for display
// Only do this if headlines will be shown to authorized users
const expandedAssertions = assertions.map(assertion => {
let expanded = assertion;
for (const [original, token] of Object.entries(tokens)) {
// Use split/join rather than new RegExp(token): tokens like
// [LOCATION_EUROPE] contain regex metacharacters ('[', ']')
expanded = expanded.split(token).join(original);
}
return expanded;
});
return {
assertion: expandedAssertions[0], // Use assertions[0] instead to keep tokens
alternatives: expandedAssertions.slice(1),
confidence: 0.78,
source: 'llm',
reasoning: `Generated for ${persona.name}`
};
}
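Because tokens such as [LOCATION_EUROPE] contain regex metacharacters, building `new RegExp(token)` directly treats the brackets as a character class. A small standalone helper (illustrative, not part of the service API) keeps expansion safe and reusable:

```javascript
// Expand redaction tokens back to original names for authorized display.
// `tokens` maps original name -> token, as returned by /abstractize.
function expandTokens(text, tokens) {
  let out = text;
  for (const [original, token] of Object.entries(tokens)) {
    // split/join performs literal replacement, avoiding RegExp escaping
    // issues with '[' and ']' in token names.
    out = out.split(token).join(original);
  }
  return out;
}
```

Call this only on the display path for authorized users; cached or logged headlines should keep their tokens.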
Redaction Service API Reference
Endpoint 1: /abstractize - Entity-to-Token Conversion
POST /abstractize
Request:
{
"tenant_id": "chainalign-tenant-1",
"entities": [
{
"name": "Frankfurt",
"type": "LOCATION",
"metadata": {
"region": "Europe",
"climate": "Temperate Continental",
"industry_growth": "2.5% YoY",
"sensitivity": "low"
}
},
{
"name": "Component X",
"type": "PRODUCT",
"metadata": {
"category": "INDUSTRIAL_PART",
"price_tier": "Standard",
"margin_percent": 28,
"key_drivers": ["Raw Material Cost", "Labor", "Supplier Capacity"],
"seasonality": "Stable",
"sensitivity": "medium"
}
}
],
"mode": "full"
}
Response:
{
"tokens": {
"Frankfurt": "[LOCATION_EUROPE]",
"Component X": "[PRODUCT_INDUSTRIAL_PART]"
},
"metadata_injection": "CONTEXT DEFINITIONS:\n[LOCATION_EUROPE]: {\"region\": \"Europe\", \"climate\": \"Temperate Continental\", ...}\n[PRODUCT_INDUSTRIAL_PART]: {\"category\": \"INDUSTRIAL_PART\", \"margin_percent\": 28, ...}",
"entity_metadata": {
"[LOCATION_EUROPE]": {...},
"[PRODUCT_INDUSTRIAL_PART]": {...}
}
}
Modes:
full: Return tokens + metadata_injection + entity_metadata (for reference)
prompt: Return only metadata_injection + entity_tokens (for LLM prompt building)
Endpoint 2: /metadata-for-llm - System Prompt Injection
POST /metadata-for-llm
Request:
{
"tenant_id": "chainalign-tenant-1",
"entity_tokens": {
"[LOCATION_EUROPE]": {
"region": "Europe",
"climate": "Temperate Continental",
"industry_growth": "2.5% YoY",
"sensitivity": "low"
},
"[PRODUCT_INDUSTRIAL_PART]": {
"category": "INDUSTRIAL_PART",
"margin_percent": 28,
"key_drivers": ["Raw Material Cost", "Labor", "Supplier Capacity"],
"sensitivity": "medium"
}
}
}
Response:
{
"metadata_injection": "CONTEXT DEFINITIONS:\n[LOCATION_EUROPE]: {\"region\": \"Europe\", \"climate\": \"...\"}...",
"system_prompt_template": "You are an executive communication specialist focused on decision intelligence.\n\nCONTEXT DEFINITIONS:\n...\n\nYour task is to generate assertion headlines for business decisions."
}
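The metadata_injection string follows a simple, deterministic layout: a header line followed by one token-to-JSON line per entity. A client-side equivalent, sketched under the assumption that the format matches the examples above, would be:

```javascript
// Reproduce the CONTEXT DEFINITIONS block from an entity_tokens map.
// Deterministic: the same input always yields the same injection string,
// which keeps prompts auditable and cache-friendly.
function buildMetadataInjection(entityTokens) {
  const lines = Object.entries(entityTokens).map(
    ([token, metadata]) => `${token}: ${JSON.stringify(metadata)}`
  );
  return ['CONTEXT DEFINITIONS:', ...lines].join('\n');
}
```

In practice the service builds this string for you; the sketch only shows what the prompt injection contains.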
Endpoint 3: /health - Service Health Check
GET /health
Response:
{
"status": "healthy",
"service": "redaction-engine",
"version": "2.0"
}
Data Privacy & Security Benefits
✅ What's Protected
- Product names: Converted to [PRODUCT_CATEGORY_TIER]
- Location names: Converted to [LOCATION_REGION]
- Company names: Converted to [CLIENT_INDUSTRY_TIER]
- Customer names: Converted to [CUSTOMER_INDUSTRY_SEGMENT]
- Financial amounts: Optionally converted to [AMOUNT_RANGE]
✅ What's Preserved for LLM Reasoning
- Product attributes: Category, tier, key drivers, margin, seasonality
- Location context: Region, climate, industry metrics
- Company context: Industry, size tier, regulatory environment
- Financial context: Amount ranges, percentage impacts, cost factors
✅ What's Never Shared
- Original product/location/company names
- Specific customer identities
- Exact financial figures (if using ranges)
- Proprietary trade secrets
- Sensitive business metrics
Integration Checklist for M5.2
When implementing HeadlineGenerator:
- Import/require redaction service client
- Identify which context entities need redaction (location, product, customer, etc.)
- Create entity metadata mapping (category, margin, drivers, etc)
- Call /abstractize endpoint to get tokens
- Build system prompt with metadata injection
- Build user prompt with redacted tokens (not original names)
- Call AIManager.generate() with protected context
- Parse assertions from response
- (Optional) Replace tokens back with original names for authorized display
- Cache headline (with or without tokens based on policy)
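The service calls in this checklist can be wrapped in a thin client so HeadlineGenerator never builds raw requests. This sketch assumes the endpoints documented in this guide and Node 18+ for the global fetch; the class and method names are illustrative:

```javascript
// Minimal client for the redaction engine service (Node 18+, global fetch).
class RedactionClient {
  constructor(baseUrl = 'http://localhost:5000') {
    this.baseUrl = baseUrl;
  }

  // POST /abstractize: convert entities to tokens + metadata injection.
  async abstractize(tenantId, entities, mode = 'full') {
    const res = await fetch(`${this.baseUrl}/abstractize`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ tenant_id: tenantId, entities, mode }),
    });
    if (!res.ok) throw new Error(`abstractize failed: ${res.status}`);
    return res.json(); // { tokens, metadata_injection, entity_metadata }
  }

  // GET /health: liveness check before the first abstractize call.
  async health() {
    const res = await fetch(`${this.baseUrl}/health`);
    return res.json(); // { status, service, version }
  }
}
```

Centralizing the HTTP details here also gives one place to add retries, timeouts, and the audit logging described below.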
Example: Full Flow for CFO Assertion
Scenario: Generate assertion headline for CFO about CapEx decision
Step 1: Input Data
const context = {
tenantId: 'acme-corp',
product: 'NextGen Manufacturing Line', // PRIVATE
location: 'Singapore Production Facility', // PRIVATE
capex_amount: 15000000, // AMOUNT
roi: 0.18, // Percentage is OK
hurdle_rate: 0.15,
persona: 'CFO'
};
Step 2: Abstractize (Call Redaction Service)
const entities = [
{
name: 'NextGen Manufacturing Line',
type: 'PRODUCT',
metadata: {
category: 'MANUFACTURING_EQUIPMENT',
price_tier: 'Premium',
key_drivers: ['Labor Savings', 'Production Speed', 'Quality'],
sensitivity: 'high'
}
},
{
name: 'Singapore Production Facility',
type: 'LOCATION',
metadata: {
region: 'Southeast Asia',
cost_tier: 'Medium',
labor_availability: 'High',
sensitivity: 'medium'
}
}
];
const redacted = await redactionService.abstractize(entities);
// Returns:
// tokens: {
// 'NextGen Manufacturing Line': '[PRODUCT_MANUFACTURING_EQUIPMENT_PREMIUM]',
// 'Singapore Production Facility': '[LOCATION_SOUTHEAST_ASIA]'
// }
Step 3: Build System Prompt
CONTEXT DEFINITIONS:
[PRODUCT_MANUFACTURING_EQUIPMENT_PREMIUM]: {"category": "MANUFACTURING_EQUIPMENT", "price_tier": "Premium", "key_drivers": ["Labor Savings", "Production Speed", "Quality"], "sensitivity": "high"}
[LOCATION_SOUTHEAST_ASIA]: {"region": "Southeast Asia", "cost_tier": "Medium", "labor_availability": "High"}
You are an executive communication specialist. Generate 3 assertion headlines for a CFO evaluating capital expenditure decisions.
Step 4: Build User Prompt
CapEx Decision: [PRODUCT_MANUFACTURING_EQUIPMENT_PREMIUM] in [LOCATION_SOUTHEAST_ASIA]
Investment: $15M
Expected ROI: 18% (vs 15% hurdle rate)
Key Benefits: Labor savings, production speed, quality improvement
Generate 3 distinct assertion headlines:
Step 5: Call Gemini
The LLM sees only tokens plus metadata and generates assertions using the category information.
Step 6: Result
Assertion 1: "CapEx of $15M is justified by an 18% ROI, exceeding the 15% hurdle rate by 300 basis points"
Assertion 2: "Approve the manufacturing equipment investment in Southeast Asia to capture labor savings and speed improvements"
Assertion 3: "The $15M investment in equipment is a clear go, returning 300 basis points above cost of capital"
For Data Controllers & Compliance
GDPR Compliance
- ✅ Entity names never sent to external LLM
- ✅ Metadata is categorical (no PII)
- ✅ Mapping kept in secure service, not in logs
- ✅ Metadata injection is deterministic and auditable
Data Minimization
- ✅ Only necessary attributes in metadata
- ✅ Redaction happens before LLM call
- ✅ Original data never leaves the redaction service
Audit Trail
- Each abstractization call is logged
- Entity tokens are deterministic (same entity = same token)
- Metadata injection is versioned per tenant
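Determinism falls out naturally if tokens are derived purely from entity type plus a categorical attribute. The scheme below is an assumption inferred from the examples in this document ([LOCATION_EUROPE], [PRODUCT_INDUSTRIAL_PART]), not confirmed service behavior:

```javascript
// Derive a redaction token from entity type plus a categorical attribute.
// A pure function of its inputs, so the same entity always maps to the
// same token (assumed priority: category, then region, then a fallback).
function deriveToken(type, metadata) {
  const qualifier = metadata.category || metadata.region || 'GENERIC';
  // Uppercase and collapse non-alphanumerics into underscores.
  const slug = qualifier.toUpperCase().replace(/[^A-Z0-9]+/g, '_');
  return `[${type}_${slug}]`;
}
```

Because nothing random or time-based enters the derivation, repeated calls are trivially auditable: the log can record inputs and the token is reproducible on demand.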
Deployment
Prerequisites
- Python 3.8+
- Flask
- Running at http://localhost:5000 (or configure the URL in HeadlineGenerator)
Docker
docker build -t redaction-engine:2.0 \
python-services/redaction-engine-service/
docker run -p 5000:5000 redaction-engine:2.0
Health Check
curl http://localhost:5000/health
# {"status": "healthy", "service": "redaction-engine", "version": "2.0"}
Testing
Test Abstractization
curl -X POST http://localhost:5000/abstractize \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "test-tenant",
"entities": [
{"name": "Miami", "type": "LOCATION", "metadata": {"region": "Florida", "climate": "Subtropical"}}
],
"mode": "full"
}'
Test Metadata Injection
curl -X POST http://localhost:5000/metadata-for-llm \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "test-tenant",
"entity_tokens": {
"[LOCATION_FLORIDA]": {"region": "Florida", "climate": "Subtropical"}
}
}'
Next Steps for M5.2
- Implement HeadlineGenerator.buildAssertionPrompt()
  - Call /abstractize for context entities
  - Build system prompt with metadata injection
  - Build user prompt with redacted tokens
- Implement HeadlineGenerator.generateByLLM()
  - Use protected system/user prompts
  - Parse assertions from response
  - Return with token names (or expand if needed)
- Testing
  - Unit test: redaction service endpoints
  - Integration test: full headline generation flow
  - Security test: verify no private data in logs
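The security test can be as simple as asserting that no original entity name appears in any LLM-bound prompt. The helper below is a sketch (the function name and shape are assumptions, not existing code):

```javascript
// Security check: no original entity names may appear in LLM-bound prompts.
// `originalNames` comes from the pre-redaction context object.
function assertNoLeaks(promptText, originalNames) {
  const leaked = originalNames.filter((name) => promptText.includes(name));
  if (leaked.length > 0) {
    throw new Error(`Private entities leaked into prompt: ${leaked.join(', ')}`);
  }
  return true;
}
```

Running this over both the system and user prompts in CI catches regressions where a context field bypasses redaction.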
Status: Ready for M5.2 Implementation
Service Version: 2.0
Endpoints: 3 (abstractize, metadata-for-llm, health)
Privacy Level: High (no private data to external LLM)