Latency Optimization Strategy: Pre-Built Pages with Live Overlays
Problem Statement
ChainAlign's rich feature set (RAG, LLM, Monte Carlo simulations, constraint validation) creates latency challenges:
- RAG context retrieval: 500-1000ms
- LLM synthesis: 3-5 seconds (Gemini API roundtrip)
- Monte Carlo simulation: 20-30 seconds (10K iterations)
- Constraint validation: 200-500ms
- Total typical page load (excluding background simulations): 4-6 seconds ❌
Users experience a slow, sluggish interface despite fast infrastructure.
Solution: Pre-Built Pages with Live Overlays
Architecture Overview
User Request
↓
┌─────────────────────────────────────────────────────────┐
│ Step 1: Serve from Cache (< 100ms) │
│ - Complete page structure with all content │
│ - Ready to display immediately │
│ - All RAG, LLM, Monte Carlo results pre-baked │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Step 2: Fetch Live Overlay Asynchronously (< 200ms) │
│ - Data freshness indicators │
│ - Critical alerts/notifications │
│ - Real-time status of background tasks │
│ - User pending actions │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Step 3: Merge on Frontend │
│ - Display cached page immediately │
│ - Update with overlay when it arrives │
│ - Show freshness badges next to data │
│ - Display critical alerts prominently │
└─────────────────────────────────────────────────────────┘
↓
User sees complete, fresh-looking page in < 200ms total
Key Components
1. PageBuildingService
Purpose: Pre-builds complete pages with all dynamic content
How it works:
- Orchestrates RAG, LLM, Monte Carlo, constraint validation in parallel
- Stores result in Redis cache with TTL
- Returns cached page on next request (instant!)
Cache strategy:
```typescript
// Cache key: page:dashboard:tenant-123:userId=user-456&locationId=loc-789
// TTL: 5 minutes (configurable)

// On cache miss:
const page = await PageBuildingService.buildPage(
  'dashboard',
  tenantId,
  { userId, locationId },
  300 // 5-minute TTL
);
// Response: { pageType, layout, cache_status: 'MISS', build_time_ms: 4200 }

// On cache hit:
// Response: { pageType, layout, cache_status: 'HIT', build_time_ms: 0 }
```
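The read-through flow behind `buildPage` can be sketched as follows. This is a minimal illustration, not the actual service: an in-memory `Map` stands in for Redis, and `getOrBuildPage` is a hypothetical name.

```typescript
// Minimal cache-aside sketch. A Map stands in for Redis; in production
// you would use redis.get / redis.set with EX for the TTL.
type CachedPage = { payload: unknown; expiresAt: number };

const cache = new Map<string, CachedPage>();

async function getOrBuildPage(
  pageType: string,
  tenantId: string,
  ctx: Record<string, string>,
  ttlSeconds: number,
  build: () => Promise<unknown> // expensive: RAG + LLM + simulations
): Promise<{ data: unknown; cache_status: "HIT" | "MISS"; build_time_ms: number }> {
  // Sort context params so the same context always yields the same key
  const params = Object.keys(ctx).sort().map((k) => `${k}=${ctx[k]}`).join("&");
  const key = `page:${pageType}:${tenantId}:${params}`;

  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return { data: hit.payload, cache_status: "HIT", build_time_ms: 0 };
  }

  const start = Date.now();
  const payload = await build();
  cache.set(key, { payload, expiresAt: Date.now() + ttlSeconds * 1000 });
  return { data: payload, cache_status: "MISS", build_time_ms: Date.now() - start };
}
```

The second request for the same context within the TTL returns the cached payload with `build_time_ms: 0`.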
Parallel execution example:
const [
graphContext, // RAG: 800ms
decisionProblems, // DB query: 200ms
monteCarloResults, // Simulation: 25s → But parallel!
constraints // Validation: 300ms
] = await Promise.all([
RAGService.retrieveRelevantChunks(...),
DecisionProblemsRepository.findAll(...),
monteCarloService.runSimulation(...),
constraintValidationService.validate(...)
]);
// Total time: max(25s) instead of 25+0.8+0.2+0.3 = 26.3s
2. LiveOverlayService
Purpose: Adds live data to cached pages without rebuilding
Data provided:
- Freshness indicators - When each data source was last updated
- Critical alerts - High-priority issues user should know about
- Real-time status - Background task progress, queue status, etc.
Example freshness response:
```json
{
  "freshness": {
    "scenarios": {
      "source": "scenarios",
      "last_updated": "2025-10-22T14:23:45Z",
      "age_seconds": 240,
      "age_display": "4 minutes ago",
      "status": "recent"
    },
    "forecasts": {
      "source": "forecasts",
      "last_updated": "2025-10-21T08:00:00Z",
      "age_seconds": 86400,
      "age_display": "1 day ago",
      "status": "stale"
    },
    "monte_carlo": {
      "source": "simulation:scenario-123",
      "status": "pending",
      "message": "Running (45% complete)"
    }
  },
  "alerts": [
    {
      "severity": "warning",
      "type": "stale_forecast",
      "message": "Demand forecast is >24 hours old",
      "action_url": "/forecasts?action=refresh"
    },
    {
      "severity": "action_required",
      "type": "pending_approval",
      "message": "2 decisions awaiting your approval",
      "action_url": "/decisions?filter=pending"
    }
  ],
  "status": {
    "services": {
      "rag": "healthy",
      "llm": "healthy",
      "simulation_queue": {
        "pending_count": 3,
        "running_count": 1,
        "total_capacity": 100
      }
    }
  }
}
```
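The freshness fields in this payload can be derived from a last-updated timestamp. The sketch below is illustrative (`freshnessFor` and `ageDisplay` are hypothetical names); the tier thresholds are an assumption based on the Fresh/Recent/Stale definitions later in this document.

```typescript
// Sketch: derive per-source freshness fields for the overlay from a
// last-updated timestamp. Thresholds (5 min / 60 min) are assumed from
// the fresh/recent/stale tiers defined in the edge-cases section.
type FreshnessStatus = "fresh" | "recent" | "stale";

function ageDisplay(ageSeconds: number): string {
  if (ageSeconds < 60) return "just now";
  if (ageSeconds < 3600) return `${Math.floor(ageSeconds / 60)} minutes ago`;
  if (ageSeconds < 86400) return `${Math.floor(ageSeconds / 3600)} hours ago`;
  return `${Math.floor(ageSeconds / 86400)} day(s) ago`;
}

function freshnessFor(source: string, lastUpdated: Date, now: Date = new Date()) {
  const ageSeconds = Math.floor((now.getTime() - lastUpdated.getTime()) / 1000);
  const status: FreshnessStatus =
    ageSeconds < 5 * 60 ? "fresh" : ageSeconds < 60 * 60 ? "recent" : "stale";
  return {
    source,
    last_updated: lastUpdated.toISOString(),
    age_seconds: ageSeconds,
    age_display: ageDisplay(ageSeconds),
    status,
  };
}
```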
3. Cache Invalidation
When to invalidate:
- User makes a decision (decision_made event)
- New scenario created (scenario_created)
- Forecast refreshed (data_updated)
- Constraints changed (constraint_changed)
Implementation:
```typescript
// After a decision is recorded
await PageBuildingService.invalidateRelatedCaches(
  'decision_made',
  tenantId,
  { problemId, decisionId }
);
// Automatically invalidates: dashboard, what-if-workbench, scenarios
```
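The event-to-pages mapping behind `invalidateRelatedCaches` might look like the sketch below. The exact mapping in ChainAlign's codebase may differ; this one is assembled from the invalidation triggers listed in this document.

```typescript
// Sketch: map invalidation events to the page caches they affect.
// Event names and page lists follow the TTL table in this document;
// the real mapping may differ.
const INVALIDATION_MAP: Record<string, string[]> = {
  decision_made: ["dashboard", "scenarios", "what-if-workbench"],
  scenario_created: ["scenarios", "what-if-workbench"],
  data_updated: ["dashboard"],
  constraint_changed: ["what-if-workbench"],
  forecast_updated: ["insights", "dashboard"],
};

function pagesToInvalidate(event: string): string[] {
  return INVALIDATION_MAP[event] ?? [];
}
```

Keeping the mapping declarative makes it easy to audit which events clear which caches.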
API Endpoints
Get Cached Page
```
GET /api/pages/:pageType?context={...}&cache_ttl=300&force_rebuild=false

Response: {
  status: 'success',
  data: { pageType, metadata, layout, ... },
  cache_status: 'HIT' | 'MISS',
  build_time_ms: 0 | 4200
}
```
Get Live Overlay
```
GET /api/pages/:pageType/overlay?context={...}

Response: {
  status: 'success',
  data: {
    freshness: { ... },
    alerts: [ ... ],
    status: { ... }
  }
}
```
Invalidate Cache
```
POST /api/pages/:pageType/invalidate

Body: { context: { ... } }
```
Invalidate Related Caches
```
POST /api/pages/invalidate-related

Body: {
  event: 'decision_made' | 'scenario_created' | 'data_updated',
  context: { ... }
}
```
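The query-string contract of the cached-page endpoint can be captured as a small parser. This is an illustrative sketch; `PageRequestOptions` and `parsePageQuery` are hypothetical names, and the defaults (TTL 300s, `force_rebuild=false`) follow the endpoint definition above.

```typescript
// Sketch: parse GET /api/pages/:pageType query params into typed options.
interface PageRequestOptions {
  context: Record<string, string>;
  cacheTtl: number;     // seconds; default 300
  forceRebuild: boolean; // default false
}

function parsePageQuery(query: Record<string, string | undefined>): PageRequestOptions {
  return {
    // context arrives as a JSON-encoded string in the query
    context: query.context ? JSON.parse(query.context) : {},
    cacheTtl: query.cache_ttl !== undefined ? Number(query.cache_ttl) : 300,
    forceRebuild: query.force_rebuild === "true",
  };
}
```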
Frontend Implementation Example
React Component Pattern
```tsx
// DashboardPage.tsx
import { useEffect, useState } from 'react';

export function DashboardPage() {
  const [page, setPage] = useState(null);
  const [overlay, setOverlay] = useState(null);
  const [loading, setLoading] = useState(true);

  // currentUser and selectedLocation are assumed to come from app state/context
  useEffect(() => {
    // Step 1: Load cached page immediately
    fetchPage();
    // Step 2: Load overlay asynchronously
    fetchOverlay();
  }, []);

  async function fetchPage() {
    try {
      const context = encodeURIComponent(JSON.stringify({
        userId: currentUser.id,
        locationId: selectedLocation
      }));
      const response = await fetch(`/api/pages/dashboard?context=${context}`);
      const { data } = await response.json();
      setPage(data);
    } catch (error) {
      console.error('Failed to load page:', error);
    } finally {
      setLoading(false);
    }
  }

  async function fetchOverlay() {
    try {
      const context = encodeURIComponent(JSON.stringify({
        userId: currentUser.id,
        locationId: selectedLocation
      }));
      const response = await fetch(`/api/pages/dashboard/overlay?context=${context}`);
      const { data } = await response.json();
      setOverlay(data);
    } catch (error) {
      console.error('Failed to load overlay:', error);
    }
  }

  if (loading && !page) {
    return <LoadingSpinner />;
  }

  return (
    <div className="dashboard">
      {/* Display cached page immediately */}
      <PageRenderer page={page} />

      {/* Overlay data updates asynchronously */}
      {overlay && (
        <div className="overlay-layer">
          <FreshnessBadges freshness={overlay.freshness} />
          <AlertsBanner alerts={overlay.alerts} />
          <StatusIndicators status={overlay.status} />
        </div>
      )}
    </div>
  );
}

// Freshness display component
function FreshnessBadges({ freshness }) {
  return (
    <div className="freshness-badges">
      {Object.entries(freshness).map(([key, data]) => (
        <div key={key} className={`badge badge-${data.status}`}>
          <span className="source">{key}</span>
          <span className="age">{data.age_display}</span>
          {data.status === 'stale' && (
            <a href="#" className="refresh-link">Refresh</a>
          )}
        </div>
      ))}
    </div>
  );
}

// Alerts banner (the overlay payload carries no stable alert id, so the
// array index serves as the React key)
function AlertsBanner({ alerts }) {
  return (
    <>
      {alerts.map((alert, i) => (
        <div key={i} className={`alert alert-${alert.severity}`}>
          <span>{alert.message}</span>
          <a href={alert.action_url}>Take action</a>
        </div>
      ))}
    </>
  );
}
```
Performance Metrics
Before (Dynamic Building)
| Metric | Time |
|---|---|
| Page load | 4-6 seconds |
| Time to interactive | 5-7 seconds |
| Time to first content | 500ms (blank screen) |
After (Cache + Overlay)
| Metric | Time |
|---|---|
| Page load | < 100ms (from cache) |
| Overlay load | < 200ms (async) |
| Time to interactive | < 200ms |
| Time to first content | < 100ms |
| Improvement | 40-60x faster |
Cache Strategy Parameters
Cache TTL by Page Type
| Page Type | Default TTL | Invalidation Trigger |
|---|---|---|
| Dashboard | 5 minutes | data_updated, decision_made |
| What-if Workbench | 10 minutes | scenario_created, constraint_changed |
| Scenarios | 5 minutes | scenario_created, decision_made |
| Insights | 1 hour | forecast_updated |
Cache Key Components
```
page:{pageType}:{tenantId}:{context_params}
```
Example:
```
page:dashboard:tenant-123:locationId=loc-456&userId=user-789
page:what-if-workbench:tenant-123:problemId=prob-012&scenarioId=scen-345
```
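A key builder matching this format can be sketched as below (`pageCacheKey` is an illustrative name). Context params are sorted so that the same context always produces the same key regardless of insertion order.

```typescript
// Sketch: build the page:{pageType}:{tenantId}:{context_params} cache key.
// Sorting the param names makes the key deterministic, so {userId, locationId}
// and {locationId, userId} map to the same cache entry.
function pageCacheKey(
  pageType: string,
  tenantId: string,
  context: Record<string, string>
): string {
  const params = Object.keys(context)
    .sort()
    .map((k) => `${k}=${context[k]}`)
    .join("&");
  return `page:${pageType}:${tenantId}:${params}`;
}
```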
Cost Savings
Computation Reduction
By serving cached pages, we reduce:
- RAG calls: 50-70% fewer queries (shared across users, time window)
- LLM calls: 30-50% fewer synthesis requests (batch pre-building)
- Simulations: 40-60% fewer runs (shared results for same scenario)
Estimated Savings
Scenario: 1000 monthly active users, 5 page views each/day
- Before: 5000 page loads × (RAG + LLM + Monte Carlo) = expensive
- After: Cache hit ratio 70% → 3500 page loads from cache, 1500 dynamic
- LLM cost reduction: ~60-70%
- RAG call reduction: ~50-70%
- Total infrastructure cost savings: 30-40%
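The arithmetic behind the scenario above is simple: with hit ratio H, only a (1 - H) fraction of page loads trigger a full dynamic build.

```typescript
// Back-of-envelope for the estimate above: at a 70% hit ratio,
// 5000 daily page loads leave 1500 dynamic builds.
function dynamicBuilds(pageLoads: number, hitRatio: number): number {
  return Math.round(pageLoads * (1 - hitRatio));
}
```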
Rollout Plan
Phase 1: Dashboard (Week 1)
- Implement PageBuildingService
- Implement LiveOverlayService
- Create dashboard caching route
- Test with 10% of users
Phase 2: Workbench + Scenarios (Week 2)
- Expand to what-if-workbench
- Add scenarios page caching
- Expand to 50% of users
Phase 3: Insights + Monitoring (Week 3)
- Add insights page caching
- Implement detailed cache metrics
- Full rollout to 100% of users
Phase 4: Optimization (Week 4)
- Fine-tune TTL parameters based on usage
- Implement predictive pre-building (pre-build before user requests)
- Monitor cache hit ratios and adjust
Monitoring & Alerting
Key Metrics
- Cache hit ratio (should be > 80%)
- Average page build time (should be < 100ms for cache hits)
- Overlay fetch time (should be < 200ms)
- Cache size (Redis memory usage)
- Invalidation frequency (should be < 2 per second)
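The hit-ratio metric can be tracked with a simple counter. This is a minimal sketch; in production these counters would feed a metrics backend, and `CacheMetrics` is an illustrative name.

```typescript
// Sketch: counter for the cache-hit-ratio metric. Each page request
// records its cache_status; hitRatio() feeds the >80% target above.
class CacheMetrics {
  private hits = 0;
  private misses = 0;

  record(status: "HIT" | "MISS"): void {
    if (status === "HIT") this.hits++;
    else this.misses++;
  }

  hitRatio(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```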
Alerts to Set Up
- Cache hit ratio drops below 60%
- Page build time exceeds 500ms
- Redis memory usage > 80%
- Overlay fetch time exceeds 500ms
Edge Cases & Handling
Multi-user Consistency
Problem: User A modifies forecast, User B still sees cached old version
Solution: Invalidate related caches immediately on write
```typescript
// After a forecast update
await PageBuildingService.invalidateRelatedCaches(
  'forecast_updated',
  tenantId
  // No context specified = invalidate ALL forecast-related caches
);
```
Fresh vs. Recent vs. Stale
- Fresh: < 5 minutes old
- Recent: 5-60 minutes old
- Stale: > 60 minutes old

Display a different indicator/color for each tier, and offer a "refresh" action for stale data.
Simulation Results During Computation
Problem: Cached page shows old Monte Carlo results while simulation runs
Solution: Overlay shows simulation status
```json
{
  "freshness": {
    "monte_carlo": {
      "status": "pending",
      "message": "New simulation running (45% complete)",
      "previous_result_stale": true
    }
  }
}
```
Future Enhancements
- Predictive Pre-Building: build pages before users request them, based on usage patterns and time of day
- Incremental Updates: instead of full cache invalidation, update only changed sections (requires component-level caching)
- Personalized Caching: cache variants for different user roles, with different freshness for different user preferences
- Smart TTL Adjustment: automatically adjust TTL based on data change frequency (higher-frequency data = shorter TTL)
- Cache Warming: pre-build pages at off-peak hours to ensure a warm cache during peak usage