Privacy-Aware LLM Integration for M5.2 Headline Generation

Purpose: Secure, deterministic LLM prompting that respects data privacy while maintaining reasoning effectiveness

Integration: HeadlineGenerator (M5.2) ↔ Redaction Engine Service (Python)


Architecture

HeadlineGenerator.generateByLLM()

1. Redaction Service: /abstractize
- Convert entities to tokens
- Generate metadata

2. Redaction Service: /metadata-for-llm
- Create system prompt injection

3. AIManager.generate()
- Build prompt with metadata injection
- Call Gemini with redacted context

4. Response
- Assertion headline (using token names)
- Replace tokens back with originals if needed

Step-by-Step Implementation

Step 1: Identify Entities to Redact

In the HeadlineGenerator's buildAssertionPrompt() method, identify which entities need privacy protection:

const context = {
  metric: 'Inventory',
  location: 'Frankfurt',               // REDACT: Location
  product: 'Component X',              // REDACT: Product
  persona: 'Supply Chain Director',    // OK: Generic role
  decision_required: 'Approve action', // OK: Generic
};
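The which-fields-to-redact decision can be driven by a small policy map instead of ad hoc checks. A minimal sketch; `REDACTION_POLICY` and `entitiesNeedingRedaction` are hypothetical names, not part of the existing codebase:

```javascript
// Hypothetical policy map: which context keys carry private entity names,
// and the redaction type each maps to.
const REDACTION_POLICY = {
  location: 'LOCATION',
  product: 'PRODUCT',
  customer: 'CUSTOMER',
};

// Collect the entities in a context object that the policy says to redact.
function entitiesNeedingRedaction(context) {
  return Object.entries(REDACTION_POLICY)
    .filter(([key]) => context[key] != null)
    .map(([key, type]) => ({ name: context[key], type }));
}
```

Generic fields like `persona` and `decision_required` are deliberately absent from the map, so they pass through to the LLM unchanged.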

Step 2: Call Redaction Service to Abstractize

async buildAssertionPrompt(anomaly, impact, persona, context) {
  // 1. Identify entities that need redaction
  const entitiesToRedact = [
    {
      name: context.location, // e.g., "Frankfurt"
      type: 'LOCATION',
      metadata: {
        region: 'Europe',
        climate: this.getClimateType(context.location),
        sensitivity: 'low'
      }
    },
    {
      name: context.product, // e.g., "Component X"
      type: 'PRODUCT',
      metadata: {
        category: this.getProductCategory(context.product),
        margin_percent: this.getMargin(context.product),
        key_drivers: await this.getProductDrivers(context.product),
        sensitivity: 'medium'
      }
    }
  ];

  // 2. Call redaction service to get tokens + metadata
  const redactionResponse = await fetch(
    'http://localhost:5000/abstractize',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        tenant_id: context.tenantId,
        entities: entitiesToRedact,
        mode: 'full' // Get tokens + metadata
      })
    }
  );

  const { tokens, metadata_injection } = await redactionResponse.json();

  // 3. Use tokens in the query (not original names)
  const redactedLocation = tokens[context.location]; // e.g., [LOCATION_EUROPE]
  const redactedProduct = tokens[context.product];   // e.g., [PRODUCT_INDUSTRIAL]

  // 4. Build prompt with redacted context
  const systemPrompt = `You are an executive communication specialist.
${metadata_injection}

Your task is to generate assertion headlines for business decisions.`;

  const userPrompt = `
Generate assertion headlines for:
- Metric: ${context.metric} for ${redactedProduct}
- Location: ${redactedLocation}
- Baseline: ${anomaly.baseline}
- Observed: ${anomaly.value}
- Impact: $${impact.impact_dollars}
- Role: ${persona.name}

Generate 3 distinct assertions:
`;

  return {
    systemPrompt,
    userPrompt,
    tokens,                  // Keep for token replacement if needed
    originalContext: context // For internal use only
  };
}

Step 3: Call LLM with Protected Context

async generateByLLM(anomaly, impact, persona, context) {
  // 1. Build prompt with redacted entities
  const { systemPrompt, userPrompt, tokens } = await this.buildAssertionPrompt(
    anomaly, impact, persona, context
  );

  // 2. Call Gemini with metadata injection
  const response = await AIManager.generate({
    prompt: userPrompt,
    systemInstructions: systemPrompt,
    model: 'gemini-pro',
    temperature: 0.4,
    max_tokens: 150
  });

  // 3. Parse assertions from response
  const assertions = this.parseAssertions(response.text);

  // 4. (OPTIONAL) Replace tokens back with originals if needed for display.
  // Only do this if headlines will be shown to authorized users.
  // Tokens contain regex metacharacters ([ and ]), so escape them before
  // building the RegExp.
  const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const expandedAssertions = assertions.map(assertion => {
    let expanded = assertion;
    for (const [original, token] of Object.entries(tokens)) {
      expanded = expanded.replace(new RegExp(escapeRegExp(token), 'g'), original);
    }
    return expanded;
  });

  return {
    assertion: assertions[0], // Tokenized by default; use expandedAssertions[0] for authorized display
    alternatives: assertions.slice(1),
    confidence: 0.78,
    source: 'llm',
    reasoning: `Generated for ${persona.name}`
  };
}
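parseAssertions() is called above but not shown. A minimal sketch of what it might look like, assuming the LLM returns one assertion per line, optionally numbered; the exact output format depends on the prompt and model:

```javascript
// Split an LLM response into individual assertion strings.
// Assumes one assertion per line, optionally prefixed with "1.", "2)", or "-".
function parseAssertions(text) {
  return text
    .split('\n')
    .map(line => line.replace(/^\s*(?:\d+[.)]|-)\s*/, '').trim())
    .filter(line => line.length > 0);
}
```

A stricter parser (e.g. requiring exactly three numbered lines) could reject malformed responses and trigger a retry instead of silently returning fewer assertions.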

Redaction Service API Reference

Endpoint 1: /abstractize - Entity-to-Token Conversion

POST /abstractize

Request:

{
  "tenant_id": "chainalign-tenant-1",
  "entities": [
    {
      "name": "Frankfurt",
      "type": "LOCATION",
      "metadata": {
        "region": "Europe",
        "climate": "Temperate Continental",
        "industry_growth": "2.5% YoY",
        "sensitivity": "low"
      }
    },
    {
      "name": "Component X",
      "type": "PRODUCT",
      "metadata": {
        "category": "INDUSTRIAL_PART",
        "price_tier": "Standard",
        "margin_percent": 28,
        "key_drivers": ["Raw Material Cost", "Labor", "Supplier Capacity"],
        "seasonality": "Stable",
        "sensitivity": "medium"
      }
    }
  ],
  "mode": "full"
}

Response:

{
  "tokens": {
    "Frankfurt": "[LOCATION_EUROPE]",
    "Component X": "[PRODUCT_INDUSTRIAL_PART]"
  },
  "metadata_injection": "CONTEXT DEFINITIONS:\n[LOCATION_EUROPE]: {\"region\": \"Europe\", \"climate\": \"Temperate Continental\", ...}\n[PRODUCT_INDUSTRIAL_PART]: {\"category\": \"INDUSTRIAL_PART\", \"margin_percent\": 28, ...}",
  "entity_metadata": {
    "[LOCATION_EUROPE]": {...},
    "[PRODUCT_INDUSTRIAL_PART]": {...}
  }
}

Modes:

  • full: Return tokens + metadata_injection + entity_metadata (for reference)
  • prompt: Return only metadata_injection + entity_tokens (for LLM prompt building)

Endpoint 2: /metadata-for-llm - System Prompt Injection

POST /metadata-for-llm

Request:

{
  "tenant_id": "chainalign-tenant-1",
  "entity_tokens": {
    "[LOCATION_EUROPE]": {
      "region": "Europe",
      "climate": "Temperate Continental",
      "industry_growth": "2.5% YoY",
      "sensitivity": "low"
    },
    "[PRODUCT_INDUSTRIAL_PART]": {
      "category": "INDUSTRIAL_PART",
      "margin_percent": 28,
      "key_drivers": ["Raw Material Cost", "Labor", "Supplier Capacity"],
      "sensitivity": "medium"
    }
  }
}

Response:

{
  "metadata_injection": "CONTEXT DEFINITIONS:\n[LOCATION_EUROPE]: {\"region\": \"Europe\", \"climate\": \"...\"}...",
  "system_prompt_template": "You are an executive communication specialist focused on decision intelligence.\n\nCONTEXT DEFINITIONS:\n...\n\nYour task is to generate assertion headlines for business decisions."
}
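For reference, the metadata_injection string follows a simple, deterministic line format, so the same block can be reconstructed from an entity_tokens map. A sketch; `buildMetadataInjection` is a hypothetical helper for illustration, not a documented service API:

```javascript
// Build the "CONTEXT DEFINITIONS" block from a token -> metadata map,
// matching the line format shown in the responses above.
function buildMetadataInjection(entityTokens) {
  const lines = Object.entries(entityTokens).map(
    ([token, metadata]) => `${token}: ${JSON.stringify(metadata)}`
  );
  return `CONTEXT DEFINITIONS:\n${lines.join('\n')}`;
}
```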

Endpoint 3: /health - Service Health Check

GET /health

Response:

{
  "status": "healthy",
  "service": "redaction-engine",
  "version": "2.0"
}

Data Privacy & Security Benefits

✅ What's Protected

  • Product names: Converted to [PRODUCT_CATEGORY_TIER]
  • Location names: Converted to [LOCATION_REGION]
  • Company names: Converted to [CLIENT_INDUSTRY_TIER]
  • Customer names: Converted to [CUSTOMER_INDUSTRY_SEGMENT]
  • Financial amounts: Optionally converted to [AMOUNT_RANGE]

✅ What's Preserved for LLM Reasoning

  • Product attributes: Category, tier, key drivers, margin, seasonality
  • Location context: Region, climate, industry metrics
  • Company context: Industry, size tier, regulatory environment
  • Financial context: Amount ranges, percentage impacts, cost factors

✅ What's Never Shared

  • Original product/location/company names
  • Specific customer identities
  • Exact financial figures (if using ranges)
  • Proprietary trade secrets
  • Sensitive business metrics

Integration Checklist for M5.2

When implementing HeadlineGenerator:

  • Import/require redaction service client
  • Identify which context entities need redaction (location, product, customer, etc)
  • Create entity metadata mapping (category, margin, drivers, etc)
  • Call /abstractize endpoint to get tokens
  • Build system prompt with metadata injection
  • Build user prompt with redacted tokens (not original names)
  • Call AIManager.generate() with protected context
  • Parse assertions from response
  • (Optional) Replace tokens back with original names for authorized display
  • Cache headline (with or without tokens based on policy)
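The optional token-expansion step in the checklist needs one subtlety handled: tokens contain `[` and `]`, which are regex metacharacters, so they must be escaped before substitution. A minimal sketch; `expandTokens` is a hypothetical helper name:

```javascript
// Replace every token in a headline with its original entity name.
// `tokens` maps original name -> token, as returned by /abstractize.
function expandTokens(headline, tokens) {
  let expanded = headline;
  for (const [original, token] of Object.entries(tokens)) {
    // Escape regex metacharacters ([, ], etc.) in the token string.
    const escaped = token.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    expanded = expanded.replace(new RegExp(escaped, 'g'), original);
  }
  return expanded;
}
```

Call this only on the display path for authorized users; cached and logged headlines can stay tokenized.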

Example: Full Flow for CFO Assertion

Scenario: Generate assertion headline for CFO about CapEx decision

Step 1: Input Data

const context = {
  tenantId: 'acme-corp',
  product: 'NextGen Manufacturing Line',     // PRIVATE
  location: 'Singapore Production Facility', // PRIVATE
  capex_amount: 15000000,                    // AMOUNT
  roi: 0.18,                                 // Percentage is OK
  hurdle_rate: 0.15,
  persona: 'CFO'
};

Step 2: Abstractize (Call Redaction Service)

const entities = [
  {
    name: 'NextGen Manufacturing Line',
    type: 'PRODUCT',
    metadata: {
      category: 'MANUFACTURING_EQUIPMENT',
      price_tier: 'Premium',
      key_drivers: ['Labor Savings', 'Production Speed', 'Quality'],
      sensitivity: 'high'
    }
  },
  {
    name: 'Singapore Production Facility',
    type: 'LOCATION',
    metadata: {
      region: 'Southeast Asia',
      cost_tier: 'Medium',
      labor_availability: 'High',
      sensitivity: 'medium'
    }
  }
];

const redacted = await redactionService.abstractize(entities);
// Returns:
// tokens: {
//   'NextGen Manufacturing Line': '[PRODUCT_MANUFACTURING_EQUIPMENT_PREMIUM]',
//   'Singapore Production Facility': '[LOCATION_SOUTHEAST_ASIA]'
// }

Step 3: Build System Prompt

CONTEXT DEFINITIONS:
[PRODUCT_MANUFACTURING_EQUIPMENT_PREMIUM]: {"category": "MANUFACTURING_EQUIPMENT", "price_tier": "Premium", "key_drivers": ["Labor Savings", "Production Speed", "Quality"], "sensitivity": "high"}
[LOCATION_SOUTHEAST_ASIA]: {"region": "Southeast Asia", "cost_tier": "Medium", "labor_availability": "High"}

You are an executive communication specialist. Generate 3 assertion headlines for a CFO evaluating capital expenditure decisions.

Step 4: Build User Prompt

CapEx Decision: [PRODUCT_MANUFACTURING_EQUIPMENT_PREMIUM] in [LOCATION_SOUTHEAST_ASIA]

Investment: $15M
Expected ROI: 18% (vs 15% hurdle rate)
Key Benefits: Labor savings, production speed, quality improvement

Generate 3 distinct assertion headlines:

Step 5: Call Gemini. The LLM sees only tokens plus metadata and generates assertions from the category information.

Step 6: Result

Assertion 1: "CapEx of $15M is justified by an 18% ROI, exceeding the 15% hurdle rate by 300 basis points"

Assertion 2: "Approve the manufacturing equipment investment in Southeast Asia to capture labor savings and speed improvements"

Assertion 3: "The $15M investment in equipment is a clear go, returning 300 basis points above cost of capital"

For Data Controllers & Compliance

GDPR Compliance

  • ✅ Entity names never sent to external LLM
  • ✅ Metadata is categorical (no PII)
  • ✅ Mapping kept in secure service, not in logs
  • ✅ Metadata injection is deterministic and auditable

Data Minimization

  • ✅ Only necessary attributes in metadata
  • ✅ Redaction happens before LLM call
  • ✅ Original data never leaves the redaction service

Audit Trail

  • Each abstractization call is logged
  • Entity tokens are deterministic (same entity = same token)
  • Metadata injection is versioned per tenant
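Because tokens are deterministic, they can be derived purely from the entity type and categorical metadata, so the same entity always yields the same token without storing state. A sketch of one plausible scheme that matches the token shapes shown in this document; `buildToken` is hypothetical, and the actual service may derive tokens differently:

```javascript
// Derive a deterministic token like [PRODUCT_MANUFACTURING_EQUIPMENT_PREMIUM]
// from the entity type plus selected categorical metadata fields.
function buildToken(type, metadata, qualifierKeys) {
  const parts = [type, ...qualifierKeys.map(k => metadata[k])]
    .filter(Boolean)
    .map(p => String(p).toUpperCase().replace(/[^A-Z0-9]+/g, '_'));
  return `[${parts.join('_')}]`;
}
```

Deriving tokens from categorical attributes (never from the private name itself) keeps the token safe to log while remaining stable across calls.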

Deployment

Prerequisites

  • Python 3.8+
  • Flask
  • Running at: http://localhost:5000 (or configure URL in HeadlineGenerator)

Docker

docker build -t redaction-engine:2.0 \
python-services/redaction-engine-service/

docker run -p 5000:5000 redaction-engine:2.0

Health Check

curl http://localhost:5000/health
# {"status": "healthy", "service": "redaction-engine", "version": "2.0"}

Testing

Test Abstractization

curl -X POST http://localhost:5000/abstractize \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "test-tenant",
    "entities": [
      {"name": "Miami", "type": "LOCATION", "metadata": {"region": "Florida", "climate": "Subtropical"}}
    ],
    "mode": "full"
  }'

Test Metadata Injection

curl -X POST http://localhost:5000/metadata-for-llm \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "test-tenant",
    "entity_tokens": {
      "[LOCATION_FLORIDA]": {"region": "Florida", "climate": "Subtropical"}
    }
  }'

Next Steps for M5.2

  1. Implement HeadlineGenerator.buildAssertionPrompt()

    • Call /abstractize for context entities
    • Build system prompt with metadata injection
    • Build user prompt with redacted tokens
  2. Implement HeadlineGenerator.generateByLLM()

    • Use protected system/user prompts
    • Parse assertions from response
    • Return with token names (or expand if needed)
  3. Testing

    • Unit test: redaction service endpoints
    • Integration test: full headline generation flow
    • Security test: verify no private data in logs

Status: Ready for M5.2 Implementation
Service Version: 2.0
Endpoints: 3 (abstractize, metadata-for-llm, health)
Privacy Level: High (no private data to external LLM)