MILESTONE 5.2 - Phase 2: Headline Generation Engine - KICKOFF
Status: 🚀 Ready to Start
Estimated Duration: 2-3 weeks
Foundation: M5.1 Phase 1 Complete
Dependencies: PersonaService, ImpactQuantifier, AnomalyDetector
Vision
Transform raw data into decision-focused assertion headlines that move from descriptive ("what is it") to actionable ("what does it mean and what should you do").
The Core Shift:
❌ BEFORE (Descriptive): "Monthly Inventory Summary"
✅ AFTER (Assertion): "Excess inventory of $2.3M requires suspension of orders for 60 days"
Headlines follow the Action Title Formula:
Insight/Conclusion + Quantification + Implication/So What
Examples:
- VP Supply Chain: "Excess raw material inventory of $2.3M requires suspension of Component X orders for 60 days"
- CFO: "CapEx of $15M is justified by an 18% ROI, exceeding the 15% hurdle rate"
- Demand Planner: "Forecast bias +3% in Frankfurt suggests model issue, requiring recalibration this week"
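As a rough illustration of the formula, the sketch below joins the three parts into one sentence and rejects a title that is missing any of them. `composeActionTitle` and its field names are hypothetical and not part of the planned services; the " of " joiner only suits headlines shaped like the VP Supply Chain example.

```javascript
// Hypothetical helper: joins the three parts of the Action Title Formula.
function composeActionTitle({ insight, quantification, implication }) {
  const parts = [insight, quantification, implication].map(p => (p ?? '').trim());
  if (parts.some(p => p.length === 0)) {
    throw new Error('An action title needs an insight, a quantification, and an implication');
  }
  const [insightText, quantText, implicationText] = parts;
  return `${insightText} of ${quantText} ${implicationText}`;
}

const title = composeActionTitle({
  insight: 'Excess raw material inventory',
  quantification: '$2.3M',
  implication: 'requires suspension of Component X orders for 60 days'
});
console.log(title);
// Excess raw material inventory of $2.3M requires suspension of Component X orders for 60 days
```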
Example: Same Data, Different Assertions
Data: Frankfurt warehouse inventory +7,250 units (23% above optimal), $2.3M working capital impact
Assertion for Supply Chain Director:
"Frankfurt inventory 23% above target is costing $2.3M in working capital and must be corrected in 60 days"
Assertion for CFO:
"Approve the $2.3M working capital facility now to absorb excess inventory and maintain liquidity"
Assertion for Demand Planner:
"Forecast bias of +3% in Frankfurt requires model retraining before next planning cycle"
Assertion for S&OP Executive:
"Inventory-profit trade-off decision needed: Accept $2.3M working capital drag for 1.2% service level improvement"
Architecture
Three-Component System
Anomaly Detector      Impact Quantifier        Headline Generator
       ↓                      ↓                        ↓
Find outliers    →    Quantify effect     →    Generate text
(2+ std dev)          (cost, time, volume)     (LLM + rules)
Data Flow
Page Generation Request
↓
Get User Persona Profile
↓
Query Recent Data
↓
[1] AnomalyDetector: Find anomalies
- Statistical detection (2+ std dev)
- Filter by relevance to persona
- Return anomalies with confidence
↓
[2] ImpactQuantifier: Calculate impact
- Monetary impact ($)
- Percentage impact (%)
- Timeline (days, weeks, months)
- Volume impact (units, SKUs)
↓
[3] HeadlineGenerator: Create text
- LLM for complex narratives
- Rules for standard cases
- Cache results for similar users
- Return headline + supporting facts
↓
Return headline ready for page render
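The data flow above can be sketched as a single orchestration function. This is a minimal sketch under assumptions: `generateHeadlineForPage`, `personaService.getPersona`, and `dataSource.queryRecent` are hypothetical names, and every service in the demo is a stub standing in for the real components described in the sections that follow.

```javascript
// Hypothetical orchestration of the data flow; the real services are injected.
async function generateHeadlineForPage(request, services) {
  const { personaService, dataSource, anomalyDetector, impactQuantifier, headlineGenerator } = services;

  const persona = await personaService.getPersona(request.userId);
  const dataPoints = await dataSource.queryRecent(request.context);

  // [1] Detect, then keep only what matters to this persona
  const detected = await anomalyDetector.detectAnomalies(dataPoints);
  const relevant = await anomalyDetector.filterByPersonaRelevance(detected, persona, request.context);
  if (relevant.length === 0) return null; // nothing worth a headline

  // [2] Quantify the top-ranked anomaly
  const top = relevant[0];
  const money = await impactQuantifier.quantifyMonetaryImpact(top, request.context);
  const timeline = await impactQuantifier.quantifyTimelineImpact(top, request.context);

  // [3] Generate the text
  return headlineGenerator.generateAssertion(top, { ...money, ...timeline }, persona, request.context);
}

// Stub services with made-up return values, only to show the wiring.
const stubServices = {
  personaService: { getPersona: async () => ({ id: 'cfo', name: 'CFO' }) },
  dataSource: { queryRecent: async () => [] },
  anomalyDetector: {
    detectAnomalies: async () => [{ value: 130, baseline: 100, zscore: 3 }],
    filterByPersonaRelevance: async anomalies => anomalies
  },
  impactQuantifier: {
    quantifyMonetaryImpact: async () => ({ impact_dollars: 2300000 }),
    quantifyTimelineImpact: async () => ({ duration_days: 60 })
  },
  headlineGenerator: {
    generateAssertion: async (anomaly, impact, persona) => ({
      assertion: `${persona.name}: $${impact.impact_dollars} impact over ${impact.duration_days} days`,
      source: 'stub'
    })
  }
};

generateHeadlineForPage({ userId: 'u1', context: {} }, stubServices)
  .then(result => console.log(result.assertion));
// CFO: $2300000 impact over 60 days
```

Passing the services in as an argument keeps each step replaceable in tests, which matters because step [3] may call an external LLM.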
Component 1: AnomalyDetector Service
Purpose
Identify statistically significant deviations from baseline that matter to a persona.
Key Methods
detectAnomalies(dataPoints, windowSize = 30)
/**
 * Detect anomalies using statistical methods
 * @param {Array<{date, value}>} dataPoints - Time series data
 * @param {number} windowSize - Rolling window for baseline (default 30)
 * @returns {Array<{date, value, baseline, zscore, confidence}>}
 */
async detectAnomalies(dataPoints, windowSize = 30) {
  // 1. Calculate rolling mean and std dev of the values in each window.
  //    Assumes slidingWindow() returns window k = dataPoints[k .. k + windowSize - 1].
  const windows = slidingWindow(dataPoints, windowSize);
  const stats = windows.map(w => {
    const values = w.map(p => p.value);
    return { mean: mean(values), std: stdDev(values) };
  });
  // 2. Find points beyond 2 std dev of the window that precedes them
  const anomalies = [];
  for (let i = windowSize; i < dataPoints.length; i++) {
    const { mean: baseline, std } = stats[i - windowSize]; // window covering i - windowSize .. i - 1
    if (std === 0) continue; // flat baseline: z-score is undefined
    const z = Math.abs((dataPoints[i].value - baseline) / std);
    if (z > 2) {
      anomalies.push({
        date: dataPoints[i].date,
        value: dataPoints[i].value,
        baseline,
        zscore: z,
        confidence: this.zscoreToConfidence(z) // 95%, 99%, etc.
      });
    }
  }
  return anomalies;
}
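detectAnomalies() relies on helpers that are not defined in this document. The following is one possible set of implementations, offered as a sketch rather than the project's actual code: `slidingWindow`, `mean`, `stdDev`, and `zscoreToConfidence` are written here as free functions (the last would live on the service as a method), and the confidence mapping assumes the values are roughly normally distributed.

```javascript
// All contiguous windows of `size` items: window k = items[k .. k + size - 1].
function slidingWindow(items, size) {
  const windows = [];
  for (let start = 0; start + size <= items.length; start++) {
    windows.push(items.slice(start, start + size));
  }
  return windows;
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sample standard deviation (n - 1 denominator); returns 0 for fewer than 2 values.
function stdDev(values) {
  if (values.length < 2) return 0;
  const m = mean(values);
  const sumSquares = values.reduce((sum, v) => sum + (v - m) ** 2, 0);
  return Math.sqrt(sumSquares / (values.length - 1));
}

// Error function via the Abramowitz-Stegun 7.1.26 approximation (|error| < 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Two-sided confidence for a z-score: P(|Z| < z) for a standard normal Z.
function zscoreToConfidence(z) {
  return erf(Math.abs(z) / Math.SQRT2);
}

console.log(zscoreToConfidence(1.96).toFixed(3)); // 0.950
console.log(zscoreToConfidence(2).toFixed(3));    // 0.954
```

With this mapping the 2 std dev cutoff corresponds to roughly 95% confidence, which is consistent with the "95%, 99%, etc." comment in detectAnomalies().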
filterByPersonaRelevance(anomalies, persona, context)
/**
* Filter anomalies to only those relevant to this persona
* @param {Array} anomalies - Detected anomalies
* @param {Object} persona - User's persona definition
* @param {Object} context - Request context {metric, location, product, etc}
* @returns {Array} Relevant anomalies
*/
async filterByPersonaRelevance(anomalies, persona, context) {
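// 1. Keep anomalies whose metric is one of the persona's key metrics.
// Assumes an anomaly may carry its own metric_id; otherwise the request's metric is used.
let relevant = anomalies.filter(a =>
persona.key_metrics.some(m => m.id === (a.metric_id ?? context.metric_id))
);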
// 2. Filter by magnitude vs persona sensitivity
// Supply Chain Director cares about small changes
// CFO only cares about big financial impacts
relevant = relevant.filter(a => {
const magnitude = Math.abs(a.zscore);
return magnitude >= persona.anomaly_sensitivity_threshold;
});
// 3. Prioritize by persona context factors
relevant = relevant.sort((a, b) => {
const aRelevance = persona.context_factors.includes(a.context) ? 1 : 0;
const bRelevance = persona.context_factors.includes(b.context) ? 1 : 0;
return bRelevance - aRelevance;
});
return relevant;
}
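To make steps 2 and 3 concrete, here is a standalone sketch of the sensitivity filter and context sort, run against two personas. The function name `filterBySensitivity`, the sample anomalies, and the persona thresholds are all made up for illustration; the key-metric match from step 1 is left out.

```javascript
// Standalone version of the sensitivity filter and context sort, for illustration.
function filterBySensitivity(anomalies, persona) {
  return anomalies
    .filter(a => Math.abs(a.zscore) >= persona.anomaly_sensitivity_threshold)
    .sort((a, b) => {
      const aRelevance = persona.context_factors.includes(a.context) ? 1 : 0;
      const bRelevance = persona.context_factors.includes(b.context) ? 1 : 0;
      return bRelevance - aRelevance;
    });
}

const sampleAnomalies = [
  { id: 'a1', zscore: 2.2, context: 'weather' },
  { id: 'a2', zscore: -3.4, context: 'policy' },
  { id: 'a3', zscore: 4.1, context: 'weather' }
];

// Hypothetical personas: thresholds and context factors are invented values.
const director = { anomaly_sensitivity_threshold: 2.0, context_factors: ['weather'] };
const cfo = { anomaly_sensitivity_threshold: 3.0, context_factors: ['policy'] };

console.log(filterBySensitivity(sampleAnomalies, director).map(a => a.id)); // [ 'a1', 'a3', 'a2' ]
console.log(filterBySensitivity(sampleAnomalies, cfo).map(a => a.id));      // [ 'a2', 'a3' ]
```

The same three anomalies produce a different list and order per persona: the lower threshold keeps the small deviation for the director, while the CFO only sees the two large ones, with the policy-related anomaly first.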
Integration Points
- Called by HeadlineGenerator
- Uses dataPoints from M3 data sources (historical demand, weather, policy events)
- Filters based on persona definition
Component 2: ImpactQuantifier Service
Purpose
Convert anomalies into business impact (dollars, percentages, timeline).
Key Methods
quantifyMonetaryImpact(anomaly, context)
/**
* Calculate financial impact of an anomaly
* @param {Object} anomaly - Detected anomaly
* @param {Object} context - {product, location, margin, etc}
* @returns {Object} {impact_dollars, impact_percentage, confidence}
*/
async quantifyMonetaryImpact(anomaly, context) {
// 1. Calculate volume impact
const volumeChange = anomaly.value - anomaly.baseline;
// 2. Get unit economics (margin, cost, etc)
const unitMargin = await this.getUnitMargin(context);
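// 3. Calculate financial impact
const impactDollars = volumeChange * unitMargin;
const baselineMargin = anomaly.baseline * unitMargin; // margin at baseline volume, not revenue
// Guard against a zero baseline, which would otherwise yield Infinity or NaN
const impactPercentage = baselineMargin !== 0 ? (impactDollars / baselineMargin) * 100 : 0;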
return {
impact_dollars: Math.round(impactDollars),
impact_percentage: Math.round(impactPercentage * 10) / 10,
volume_change: volumeChange,
confidence: anomaly.confidence
};
}
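A worked example of the arithmetic, using the Frankfurt figures from earlier in this document. This is a synchronous sketch with the unit value passed in directly; the baseline of 31,522 units and the $317.24 unit value are back-solved from "+7,250 units, 23% above optimal, $2.3M" and are assumptions, not source data. Note that the unit value cancels out of the percentage, so the percentage is simply the relative volume change.

```javascript
// Synchronous version of the arithmetic in quantifyMonetaryImpact().
function monetaryImpact(anomaly, unitValue) {
  const volumeChange = anomaly.value - anomaly.baseline;
  const impactDollars = volumeChange * unitValue;
  const impactPercentage = (volumeChange / anomaly.baseline) * 100;
  return {
    impact_dollars: Math.round(impactDollars),
    impact_percentage: Math.round(impactPercentage * 10) / 10,
    volume_change: volumeChange
  };
}

// Assumed inputs: baseline 31,522 units, observed 38,772 units, $317.24 per unit.
const frankfurt = monetaryImpact({ value: 38772, baseline: 31522 }, 317.24);
console.log(frankfurt);
// { impact_dollars: 2299990, impact_percentage: 23, volume_change: 7250 }
```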
quantifyTimelineImpact(anomaly, context)
/**
* Estimate how long the impact will last
* @param {Object} anomaly - Anomaly
* @param {Object} context - Request context
* @returns {Object} {duration_days, status}
*/
async quantifyTimelineImpact(anomaly, context) {
// 1. Look for related policy/weather events
const relatedEvents = await this.findRelatedExternalEvents(
anomaly.date,
context
);
// 2. Use historical patterns for similar events
if (relatedEvents.length > 0) {
const historicalDurations = await this.getHistoricalDurations(
relatedEvents
);
const avgDuration = mean(historicalDurations);
return {
duration_days: avgDuration,
duration_confidence: 0.85,
explanation: `Similar ${relatedEvents[0].type} resolved in ${avgDuration} days`
};
}
// 3. Default heuristic
return {
duration_days: 7, // 1 week default
duration_confidence: 0.5,
explanation: "Historical pattern unclear"
};
}
quantifyAction(anomaly, action, context)
/**
* Estimate impact of a recommended action
* @param {Object} anomaly - Anomaly context
* @param {Object} action - Recommended action
* @param {Object} context - Business context
* @returns {Object} {action_impact, timeline_to_effect}
*/
async quantifyAction(anomaly, action, context) {
// Look up historical outcomes for similar actions
const similarActions = await db('action_outcomes')
.where({
action_type: action.type,
anomaly_cause: anomaly.cause
})
.where('created_at', '>=', moment().subtract(2, 'years').toDate()) // pass a plain Date to the query builder
.select();
if (similarActions.length > 0) {
return {
expected_impact: mean(similarActions.map(a => a.impact)),
implementation_days: median(similarActions.map(a => a.duration_days)),
success_rate: similarActions.filter(a => a.successful).length / similarActions.length,
confidence: Math.min(0.95, 0.5 + (similarActions.length / 100))
};
}
return {
expected_impact: null,
implementation_days: null,
success_rate: null,
confidence: 0.0,
note: "No historical data for this action type"
};
}
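quantifyAction() uses a `median` helper that is not defined in this document, and its confidence rule is easier to judge with a few values plugged in. The sketch below shows one possible `median` and isolates the confidence rule in a hypothetical `actionConfidence` function.

```javascript
// Median of a numeric array; returns null for an empty array.
function median(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Confidence rule from quantifyAction(): starts at 0.5, gains 0.01 per
// historical sample, and is capped at 0.95.
function actionConfidence(sampleCount) {
  return Math.min(0.95, 0.5 + sampleCount / 100);
}

for (const n of [1, 10, 45, 200]) {
  console.log(n, actionConfidence(n).toFixed(2));
}
// 1 0.51
// 10 0.60
// 45 0.95
// 200 0.95
```

Under this rule a single historical outcome already reports 0.51, and the cap is reached at 45 samples, so the score reflects sample count only and not how consistent the past outcomes were.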
Integration Points
- Called by HeadlineGenerator
- Uses historical data from database
- Links to external data (M3.1, M3.2) for causal factors
- Provides confidence scores for uncertainty
Component 3: HeadlineGenerator Service
Purpose
Create assertion-based headlines that move from description to action. Every headline answers:
- What is the insight? (Conclusion)
- How much/when? (Quantification)
- What should happen next? (Implication)
Key Methods
generateAssertion(anomaly, impact, persona, context)
/**
* Generate a persona-specific assertion headline
* @param {Object} anomaly - Detected anomaly
* @param {Object} impact - Quantified impact (dollars, percentage, timeline)
* @param {Object} persona - User's persona with decision authority
* @param {Object} context - {metric, location, product, baseline, decision_needed}
* @returns {Object} {assertion, reasoning, confidence, suggested_action}
*/
async generateAssertion(anomaly, impact, persona, context) {
// 1. Check cache (same anomaly recently generated for this persona)
const cached = await this.checkHeadlineCache(
anomaly.id,
persona.id
);
if (cached && cached.recency < 3600000) { // 1 hour TTL
return cached.assertion;
}
// 2. Try rule-based generation first (fast, predictable)
const ruleAssertion = this.generateByRules(
anomaly,
impact,
persona,
context
);
if (ruleAssertion.confidence > 0.85) {
// Rules high confidence, use it
return ruleAssertion;
}
// 3. Fall back to LLM for complex decisions
const llmAssertion = await this.generateByLLM(
anomaly,
impact,
persona,
context
);
// 4. Cache and return
await this.cacheAssertion(anomaly.id, persona.id, llmAssertion);
return llmAssertion;
}
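The cache calls in generateAssertion() could be backed by something like the in-memory class below while the `headline_cache` table is not yet in place. This is a sketch under assumptions: `HeadlineCache` and its method names are hypothetical, and expiry is checked on read using the same one-hour TTL as the code above.

```javascript
// Hypothetical in-memory stand-in for checkHeadlineCache() / cacheAssertion().
class HeadlineCache {
  constructor(ttlMs = 60 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, which makes expiry testable
    this.entries = new Map();
  }

  key(anomalyId, personaId) {
    return `${anomalyId}:${personaId}`;
  }

  set(anomalyId, personaId, assertion) {
    this.entries.set(this.key(anomalyId, personaId), { assertion, storedAt: this.now() });
  }

  // Returns the cached assertion, or null when missing or older than the TTL.
  get(anomalyId, personaId) {
    const k = this.key(anomalyId, personaId);
    const entry = this.entries.get(k);
    if (!entry) return null;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(k);
      return null;
    }
    return entry.assertion;
  }
}

let fakeNow = 0;
const cache = new HeadlineCache(60 * 60 * 1000, () => fakeNow);
cache.set('anomaly-1', 'cfo', { assertion: 'Approve the $2.3M working capital facility now', source: 'llm' });
console.log(cache.get('anomaly-1', 'cfo') !== null); // true: still fresh
fakeNow = 60 * 60 * 1000;
console.log(cache.get('anomaly-1', 'cfo'));          // null: expired after one hour
```

Keying on anomaly and persona together matches the lookup in generateAssertion(), so two personas viewing the same anomaly never share a headline.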
generateByRules(anomaly, impact, persona, context)
/**
* Rule-based assertion generation (fast, predictable)
* Uses the Insight + Quantification + Implication formula
*
* Assertion Formula: [INSIGHT] is [QUANTIFICATION], [IMPLICATION]
*/
generateByRules(anomaly, impact, persona, context) {
// Define assertion templates per persona
// Format: "{insight} is {quantification}, {implication}"
const templates = {
'supply_chain_director': {
excess_inventory: `{metric} of {quantification} is costing {impact_dollars} in working capital and must be corrected in {timeline_days} days`,
// Example: "Raw material inventory of $2.3M is costing $2.3M in working capital and must be corrected in 60 days"
capacity_constraint: `We are at {utilization_percent}% capacity with demand still {direction}, requiring immediate {decision}`,
// Example: "We are at 95% capacity with demand still rising, requiring immediate production increase"
},
'cfo': {
capex_justification: `CapEx of {amount} is justified by a {roi_percent}% ROI, exceeding the {hurdle_rate}% hurdle rate by {bps_above} bps`,
// Example: "CapEx of $15M is justified by an 18% ROI, exceeding the 15% hurdle rate by 300 bps"
working_capital_impact: `Approve {action} now to absorb {impact_dollars} in {metric} and maintain liquidity`,
// Example: "Approve the $2.3M working capital facility now to absorb excess inventory and maintain liquidity"
},
'demand_planner': {
forecast_bias: `Forecast bias of {bias_magnitude}% in {location} requires {action} before {deadline}`,
// Example: "Forecast bias of +3% in Frankfurt requires model retraining before next planning cycle"
model_accuracy: `{metric} accuracy degraded to {accuracy_percent}%, suggesting {root_cause} needs correction`,
// Example: "Forecast accuracy degraded to 92%, suggesting seasonal adjustment needs correction"
},
'sop_executive': {
tradeoff_decision: `{metric} trade-off decision needed: {option_a} vs. {option_b} for {impact_summary}`,
// Example: "Inventory-profit trade-off decision needed: Accept $2.3M working capital drag for 1.2% service level improvement"
plan_approval: `Approve {action} to resolve {issue} and deliver {benefit}`,
// Example: "Approve inventory reduction plan to resolve working capital exposure and deliver $2.3M cash"
}
};
// 1. Select template for persona and anomaly type
const personaTemplates = templates[persona.id] || templates['sop_executive']; // no 'default' set is defined; same fallback as getPersonaExamples()
const anomalyType = this.classifyAnomaly(anomaly, context);
const template = personaTemplates[anomalyType] || Object.values(personaTemplates)[0];
// 2. Fill in variables
const filled = this.fillTemplate(template, {
metric: context.metric,
location: context.location,
direction: anomaly.value > anomaly.baseline ? 'rising' : 'falling',
quantification: this.formatQuantification(anomaly, impact),
impact_dollars: this.formatAmount(impact.impact_dollars),
impact_percentage: impact.impact_percentage.toFixed(1),
timeline_days: impact.duration_days,
deadline: this.formatDeadline(impact.duration_days),
utilization_percent: Math.round(anomaly.value / anomaly.baseline * 100),
amount: this.formatAmount(context.capex_amount),
roi_percent: (context.roi * 100).toFixed(0),
hurdle_rate: (context.hurdle_rate * 100).toFixed(0),
bps_above: Math.round((context.roi - context.hurdle_rate) * 10000),
bias_magnitude: Math.abs(anomaly.zscore).toFixed(1),
action: this.suggestedActionFor(persona, anomaly),
decision: this.requiredDecisionFor(persona, impact),
option_a: context.option_a,
option_b: context.option_b,
impact_summary: context.impact_summary,
benefit: context.expected_benefit,
issue: context.root_issue,
root_cause: context.root_cause_name,
accuracy_percent: Math.round((100 - Math.abs(anomaly.zscore)) * 10) / 10
});
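// 3. Return with confidence
// A fixed 0.88 would always clear the 0.85 threshold in generateAssertion(),
// so the LLM path would never run. Only an exact template match keeps the high
// score; the 0.6 used for the fallback template is a placeholder to tune.
const matched = Boolean(personaTemplates[anomalyType]);
return {
assertion: filled,
confidence: matched ? 0.88 : 0.6,
source: 'rules',
reasoning: `Matched assertion pattern for ${anomalyType} in ${persona.name}`
};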
}
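generateByRules() calls `fillTemplate` and `formatAmount` without defining them. One possible implementation is sketched below; the behavior for missing values (leave the placeholder in place so it can be detected) and the K/M/B abbreviations are assumptions, not settled design.

```javascript
// Replace each {name} placeholder with its value. Placeholders without a
// value are left untouched so the caller can detect an incomplete headline.
function fillTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (placeholder, name) => {
    const value = values[name];
    return value === undefined || value === null ? placeholder : String(value);
  });
}

// One decimal place, dropping a trailing ".0" (15.0 -> "15", 2.3 -> "2.3").
function oneDecimal(n) {
  return n.toFixed(1).replace(/\.0$/, '');
}

// Abbreviate a dollar amount: 2300000 -> "$2.3M".
function formatAmount(dollars) {
  const abs = Math.abs(dollars);
  const sign = dollars < 0 ? '-' : '';
  if (abs >= 1e9) return `${sign}$${oneDecimal(abs / 1e9)}B`;
  if (abs >= 1e6) return `${sign}$${oneDecimal(abs / 1e6)}M`;
  if (abs >= 1e3) return `${sign}$${oneDecimal(abs / 1e3)}K`;
  return `${sign}$${Math.round(abs)}`;
}

const headline = fillTemplate(
  'Forecast bias of {bias_magnitude}% in {location} requires {action} before {deadline}',
  { bias_magnitude: '+3', location: 'Frankfurt', action: 'model retraining', deadline: 'next planning cycle' }
);
console.log(headline);
// Forecast bias of +3% in Frankfurt requires model retraining before next planning cycle
console.log(formatAmount(2300000)); // $2.3M
```

Leaving unfilled placeholders visible makes it cheap to check completeness with `/\{\w+\}/.test(headline)` before a headline reaches the page.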
generateByLLM(anomaly, impact, persona, context)
/**
* LLM-based assertion generation for complex cases
* Generates 3 assertion options and selects the best
* Uses the Insight + Quantification + Implication formula
*/
async generateByLLM(anomaly, impact, persona, context) {
const prompt = this.buildAssertionPrompt(anomaly, impact, persona, context);
const response = await AIManager.generate({
prompt,
systemInstructions: `You are an executive communication specialist focused on decision intelligence.
Your task is to generate 3 distinct assertion headlines for a business executive.
CRITICAL RULES:
1. Move from descriptive to actionable (NOT "Monthly Inventory Update" but "Inventory spike requires immediate action")
2. Follow the Assertion Formula: [INSIGHT] + [QUANTIFICATION] + [IMPLICATION]
3. Each assertion must be a complete, grammatically correct sentence
4. Lead with the business conclusion, not the metric
5. Include specific numbers, percentages, and timeframes
6. Make it clear what decision is needed
7. Maximum 15 words per assertion
8. Use active voice and commanding language for ${persona.name}`,
model: 'gemini-pro',
temperature: 0.4, // Slightly higher for creative options
max_tokens: 150 // Space for 3 options
});
// Parse and select best assertion
const assertions = this.parseAssertions(response.text);
const best = this.selectBestAssertion(assertions, persona, impact);
return {
assertion: best.text,
alternatives: assertions.filter(a => a !== best),
confidence: 0.78,
source: 'llm',
reasoning: `Selected best assertion for ${persona.name} (${persona.decision_focus})`
};
}
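The LLM is asked for three numbered options, so `parseAssertions` has to pull them out of free text. A minimal sketch follows, assuming the model answers with one option per line prefixed by `1.` or `1)`; the `withinWordLimit` helper is hypothetical and enforces rule 7 of the system instructions.

```javascript
// Extract numbered options ("1. ..." or "1) ...") from an LLM reply,
// ignoring any other lines and stripping surrounding double quotes.
function parseAssertions(text) {
  return text
    .split(/\r?\n/)
    .map(line => line.match(/^\s*\d+[.)]\s*(.+?)\s*$/))
    .filter(Boolean)
    .map(match => ({ text: match[1].replace(/^"(.*)"$/, '$1') }));
}

// Rule 7 in the system instructions: maximum 15 words per assertion.
function withinWordLimit(text, maxWords = 15) {
  return text.trim().split(/\s+/).length <= maxWords;
}

const reply = [
  'Here are the options:',
  '1. "Frankfurt inventory 23% above target must be cut within 60 days"',
  '2) Approve $2.3M facility now to absorb excess inventory',
  '3. "Retrain Frankfurt forecast model before next cycle"'
].join('\n');

const options = parseAssertions(reply);
console.log(options.map(o => o.text));
console.log(options.every(o => withinWordLimit(o.text))); // true
```

Each option is returned as an object with a `text` field because generateByLLM() reads `best.text` from the selected assertion.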
Supporting Methods
buildAssertionPrompt(anomaly, impact, persona, context)
buildAssertionPrompt(anomaly, impact, persona, context) {
return `
You are writing assertion headlines for: ${persona.name}
Decision Authority: ${persona.primary_authority}
Primary Focus: ${persona.decision_focus}
DATA ANOMALY:
- Metric: ${context.metric} in ${context.location}
- Baseline: ${anomaly.baseline.toFixed(0)}
- Observed: ${anomaly.value.toFixed(0)}
- Change: ${((anomaly.value - anomaly.baseline) / anomaly.baseline * 100).toFixed(1)}% (${anomaly.zscore.toFixed(2)} std dev)
- Confidence: ${(anomaly.confidence * 100).toFixed(0)}%
BUSINESS IMPACT:
- Financial Impact: $${Math.abs(impact.impact_dollars).toLocaleString()}
- Percentage Impact: ${impact.impact_percentage}%
- Timeline: ${impact.duration_days} days to resolve
DECISION NEEDED:
- ${context.decision_required || 'Approve / Adjust / Monitor action'}
- Success Measure: ${context.success_metric || 'Return to baseline'}
Generate 3 distinct assertion headlines using this formula:
[INSIGHT] + [QUANTIFICATION] + [IMPLICATION/ACTION]
Examples for ${persona.name}:
${this.getPersonaExamples(persona)}
Now generate 3 unique assertions (numbered 1, 2, 3):
`;
}
getPersonaExamples(persona)
getPersonaExamples(persona) {
const examples = {
'supply_chain_director': `
1. "Excess inventory of $2.3M requires suspension of orders for 60 days"
2. "Lead time variance of 15% is constraining our ability to meet demand, necessitating supplier diversification"
3. "Warehouse utilization at 95% signals imminent capacity breach requiring immediate SKU rationalization"
`,
'cfo': `
1. "CapEx of $15M is justified by an 18% ROI, exceeding the 15% hurdle rate by 300 basis points"
2. "Working capital facility of $2.3M must be approved to absorb inventory buildup and maintain liquidity"
3. "Cost overrun of $1.2M requires budget reallocation to deliver full-year guidance"
`,
'demand_planner': `
1. "Forecast bias of +3% in Frankfurt requires model retraining before the next planning cycle"
2. "Demand variance has increased to 18% YoY, suggesting external factors need explicit modeling"
3. "Accuracy degradation from 96% to 91% indicates seasonal adjustment methodology needs revision"
`,
'sop_executive': `
1. "Inventory-profit trade-off decision needed: Accept $2.3M working capital drag for 1.2% service improvement"
2. "Demand surge is straining capacity; approve production ramp to avoid 5% backorder risk"
3. "Supply disruption threatens $500K in margin; immediate diversification investment of $200K is required"
`
};
return examples[persona.id] || examples['sop_executive'];
}
Privacy-Aware LLM Prompting: Contextual Abstraction
A critical concern: LLMs need context (product names, locations, company details) to reason effectively, but sharing these creates privacy/security risks.
Solution: Contextual Abstraction & Metadata Injection
Instead of sending actual names, send descriptive tokens with metadata:
Pattern: Redacted Identifier with Metadata
// BEFORE (Privacy Risk):
"Analyze sales of ACME SummerChill Max in Miami, FL"
// AFTER (Privacy Safe):
"Analyze sales of [PRODUCT_AIR_CONDITIONER_PREMIUM] in [LOCATION_SUBTROPICAL_COASTAL]"
// WITH METADATA INJECTED INTO SYSTEM PROMPT:
"The entity [PRODUCT_AIR_CONDITIONER_PREMIUM] is a high-margin, premium cooling unit
sold in hot, humid climates. It is highly seasonal (peaks Q2-Q3) and sensitive to
daily maximum temperature. Typical margin: 35%."
Implementation Strategy
1. Redaction Layer
- Map product names → category tokens: [PRODUCT_AIR_CONDITIONER_PREMIUM]
- Map locations → regional tokens: [LOCATION_SUBTROPICAL_COASTAL]
- Map companies → tier tokens: [CLIENT_FINANCIAL_SERVICES_T1]
2. Metadata Injection (in System Prompt)
const metadata = {
'[PRODUCT_AIR_CONDITIONER_PREMIUM]': {
type: 'Appliance',
category: 'Cooling/Seasonal',
price_tier: 'High',
key_drivers: ['Weather Temperature', 'Energy Prices'],
margin_percent: 35,
seasonality: 'Q2-Q3 peak'
},
'[LOCATION_SUBTROPICAL_COASTAL]': {
climate: 'Hot & Humid',
key_variable: 'Daily Maximum Temperature',
region: 'US Southeast',
industry_growth: '5% YoY'
}
};
// Inject into system prompt BEFORE user query
const systemWithMetadata = `
${baseSystemPrompt}
CONTEXT DEFINITIONS:
${Object.entries(metadata).map(([token, data]) =>
`${token}: ${JSON.stringify(data)}`
).join('\n')}
`;
3. External Data Enrichment
- Public weather data: Tag with location type, don't send city name
- Public economic data: Use industry benchmarks, not company-specific
- Historical patterns: Use anonymized case studies
4. Query Redaction
const redactedQuery = `
Analyze assertion headline for:
- Role: [PERSONA_CFO]
- Context: [LOCATION_SUBTROPICAL_COASTAL] [PRODUCT_AIR_CONDITIONER_PREMIUM]
- Baseline Sales: ${anonymizedBaseline}
- Observed Sales: ${anonymizedObserved}
- Metric: Sales variance ${percentageChange}%
`;
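The redaction layer from step 1 could be sketched as a lookup-and-replace pass that runs before any prompt is built. The `redact` function and `tokenMap` below are hypothetical; the two entries reuse the BEFORE/AFTER example from this section, and a real mapping would come from the product and location master data.

```javascript
// Hypothetical redaction layer: swaps known private names for category tokens.
const tokenMap = new Map([
  ['ACME SummerChill Max', '[PRODUCT_AIR_CONDITIONER_PREMIUM]'],
  ['Miami, FL', '[LOCATION_SUBTROPICAL_COASTAL]']
]);

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function redact(text, map) {
  // Longest names first, so a short name cannot match inside a longer one
  const names = [...map.keys()].sort((a, b) => b.length - a.length);
  if (names.length === 0) return text;
  const pattern = new RegExp(names.map(escapeRegExp).join('|'), 'gi');
  // Case-insensitive lookup from the matched text back to its token
  const lookup = new Map([...map].map(([name, token]) => [name.toLowerCase(), token]));
  return text.replace(pattern, match => lookup.get(match.toLowerCase()));
}

console.log(redact('Analyze sales of ACME SummerChill Max in Miami, FL', tokenMap));
// Analyze sales of [PRODUCT_AIR_CONDITIONER_PREMIUM] in [LOCATION_SUBTROPICAL_COASTAL]
```

Exact-name matching only catches names already in the map, so any identifier missing from the mapping would still reach the LLM; logging prompts after redaction helps audit for such leaks.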
Benefits
- ✅ LLM gets necessary context for reasoning
- ✅ Private data never leaves the system
- ✅ Deterministic (same token always = same metadata)
- ✅ Cacheable (same scenario generates same results)
- ✅ Auditable (log all metadata injections)
Optional: Fine-Tuned Local Models
For highest security, use internally fine-tuned models:
- Train on anonymized, internal case studies
- Maintain RAG database of public/safe external data
- Control exactly what context model sees
Implementation Checklist
Phase 2A: AnomalyDetector (Days 1-4)
- Create backend/src/services/AnomalyDetector.js
- Implement detectAnomalies() with 2+ std dev logic
- Implement filterByPersonaRelevance()
- Add unit tests for statistical methods
- Validate with M3 historical data
Phase 2B: ImpactQuantifier (Days 5-9)
- Create backend/src/services/ImpactQuantifier.js
- Implement quantifyMonetaryImpact()
- Implement quantifyTimelineImpact() with historical lookups
- Implement quantifyAction() for recommendations
- Link to unit economics database
Phase 2C: HeadlineGenerator (Days 10-14)
- Create backend/src/services/HeadlineGenerator.js
- Implement rule-based generation (fast path)
- Implement LLM-based generation (complex path)
- Add headline caching
- Create headline cache repository
Phase 2D: Integration & Testing (Days 15-21)
- Add routes: POST /api/headlines with full pipeline
- Create integration tests for all three components
- Test with demo data from M4
- Measure performance (target: <1s per headline)
- Write final M5.2 summary
Database Changes Needed
New Headlines Cache Table
-- PostgreSQL syntax assumed (UUID, JSONB)
CREATE TABLE headline_cache (
  id UUID PRIMARY KEY,
  tenant_id UUID,
  anomaly_id UUID,
  persona_id UUID,
  headline TEXT,
  confidence FLOAT,
  source VARCHAR(50), -- 'rules' or 'llm'
  metadata JSONB,
  created_at TIMESTAMP,
  expires_at TIMESTAMP -- 7 day TTL
);
CREATE INDEX ON headline_cache (tenant_id, persona_id);
CREATE INDEX ON headline_cache (created_at);
Action Outcomes Table (for quantification learning)
CREATE TABLE action_outcomes (
  id UUID PRIMARY KEY,
  tenant_id UUID,
  action_type VARCHAR(100),
  anomaly_cause VARCHAR(100),
  expected_impact DECIMAL,
  actual_impact DECIMAL,
  duration_days INTEGER,
  successful BOOLEAN,
  created_at TIMESTAMP
);
Success Metrics
Functional
- Anomalies detected with >90% accuracy on test data
- Impact quantified within ±10% of actual
- Headlines generated in <1 second
- 80%+ rule-based (fast), 20% LLM (complex)
Quality
- Headlines specific to persona (not generic)
- Headlines include metric + impact + timeframe
- Headlines are actionable (lead to decisions)
- Support team validates <5% error rate
Performance
- <1s latency for rule-based headlines
- <3s latency for LLM headlines
- Cache hit rate >70% for repeated anomalies
Connection to M5.3
M5.2 output (headlines) feeds directly into M5.3 (Hero Section):
M5.2 Output:
{
headline: "Frankfurt inventory 23% above optimal, costing $2.3M",
confidence: 0.92,
impact: {
dollars: 2300000,
percentage: 23,
days: 10
}
}
↓
M5.3 Input:
Build hero section with:
- Headline ↑
- Primary chart (waterfall for S&OP, P&L for CFO)
- Key metrics box (showing impact)
- Context (weather, policy, external factors)
Resources
Code Examples
- See docs/MILESTONE_5_PHASE_1_SUMMARY.md for M5.1 structure
- See docs/ROLE_PERSONAS_FRAMEWORK.md for persona definitions
Dependencies
- M5.1: PersonaService (persona definitions)
- M3.1: Weather data and correlations
- M3.2: Policy events and economic data
- AIManager: Gemini API for LLM-based generation
Testing Data
- M4 demo data: 30 days of demand
- Weather data: 5 years from M3.1
- Policy events: 3 events from M4
Ready to Start
M5.1 foundation is complete. All prerequisites met:
- ✅ Personas defined and stored
- ✅ User profiles and interaction tracking
- ✅ Database schema ready
- ✅ API routes ready
Next Action: Start M5.2 Phase 2A (AnomalyDetector service)
Prepared: 2025-10-22
Phase 1 Complete, Phase 2 Ready to Begin