Latency Optimization Strategy: Pre-Built Pages with Live Overlays

Problem Statement

ChainAlign's rich feature set (RAG, LLM, Monte Carlo simulations, constraint validation) creates latency challenges:

  • RAG context retrieval: 500-1000ms
  • LLM synthesis: 3-5 seconds (Gemini API roundtrip)
  • Monte Carlo simulation: 20-30 seconds (10K iterations)
  • Constraint validation: 200-500ms
  • Total typical page load: 4-6 seconds ❌

Users experience a slow, sluggish interface despite fast underlying infrastructure.


Solution: Pre-Built Pages with Live Overlays

Architecture Overview

User Request

┌─────────────────────────────────────────────────────────┐
│ Step 1: Serve from Cache (< 100ms) │
│ - Complete page structure with all content │
│ - Ready to display immediately │
│ - All RAG, LLM, Monte Carlo results pre-baked │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ Step 2: Fetch Live Overlay Asynchronously (< 200ms) │
│ - Data freshness indicators │
│ - Critical alerts/notifications │
│ - Real-time status of background tasks │
│ - User pending actions │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ Step 3: Merge on Frontend │
│ - Display cached page immediately │
│ - Update with overlay when it arrives │
│ - Show freshness badges next to data │
│ - Display critical alerts prominently │
└─────────────────────────────────────────────────────────┘

The user sees a complete, fresh-looking page in < 200ms total

Key Components

1. PageBuildingService

Purpose: Pre-builds complete pages with all dynamic content

How it works:

  1. Orchestrates RAG, LLM, Monte Carlo, constraint validation in parallel
  2. Stores result in Redis cache with TTL
  3. Returns cached page on next request (instant!)

Cache strategy:

// Cache key: page:dashboard:tenant-123:userId=user-456&locationId=loc-789
// TTL: 5 minutes (configurable)

// On cache miss:
const page = await PageBuildingService.buildPage(
  'dashboard',
  tenantId,
  { userId, locationId },
  300 // 5-minute TTL
);
// Response: { pageType, layout, cache_status: 'MISS', build_time_ms: 4200 }

// On cache hit:
// Response: { pageType, layout, cache_status: 'HIT', build_time_ms: 0 }
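The get-or-build path behind this can be sketched as follows. This is a minimal sketch: the in-memory `Map` stands in for Redis, and `cacheKey`/`getOrBuildPage` are illustrative names, not ChainAlign's actual API.

```typescript
// Minimal get-or-build cache sketch. The Map stands in for Redis;
// a real deployment would use GET/SETEX against a Redis client.
type CachedPage = { payload: unknown; expiresAt: number };

const cache = new Map<string, CachedPage>();

function cacheKey(pageType: string, tenantId: string, context: Record<string, string>): string {
  // Sort params so equivalent contexts produce the same key.
  const params = Object.keys(context).sort().map((k) => `${k}=${context[k]}`).join("&");
  return `page:${pageType}:${tenantId}:${params}`;
}

async function getOrBuildPage(
  pageType: string,
  tenantId: string,
  context: Record<string, string>,
  ttlSeconds: number,
  build: () => Promise<unknown>
): Promise<{ data: unknown; cache_status: "HIT" | "MISS"; build_time_ms: number }> {
  const key = cacheKey(pageType, tenantId, context);
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    // Cache hit: no expensive orchestration, served instantly.
    return { data: hit.payload, cache_status: "HIT", build_time_ms: 0 };
  }
  // Cache miss: run the expensive build (RAG + LLM + Monte Carlo in parallel).
  const start = Date.now();
  const payload = await build();
  cache.set(key, { payload, expiresAt: Date.now() + ttlSeconds * 1000 });
  return { data: payload, cache_status: "MISS", build_time_ms: Date.now() - start };
}
```

Sorting the context params before joining is what makes `{userId, locationId}` and `{locationId, userId}` hit the same cache entry.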

Parallel execution example:

const [
  graphContext,      // RAG: 800ms
  decisionProblems,  // DB query: 200ms
  monteCarloResults, // Simulation: 25s, but runs in parallel!
  constraints        // Validation: 300ms
] = await Promise.all([
  RAGService.retrieveRelevantChunks(...),
  DecisionProblemsRepository.findAll(...),
  monteCarloService.runSimulation(...),
  constraintValidationService.validate(...)
]);
// Total time: max(25s) instead of 25 + 0.8 + 0.2 + 0.3 = 26.3s

2. LiveOverlayService

Purpose: Adds live data to cached pages without rebuilding

Data provided:

  1. Freshness indicators - When each data source was last updated
  2. Critical alerts - High-priority issues user should know about
  3. Real-time status - Background task progress, queue status, etc.

Example freshness response:

{
  "freshness": {
    "scenarios": {
      "source": "scenarios",
      "last_updated": "2025-10-22T14:23:45Z",
      "age_seconds": 240,
      "age_display": "4 minutes ago",
      "status": "recent"
    },
    "forecasts": {
      "source": "forecasts",
      "last_updated": "2025-10-21T08:00:00Z",
      "age_seconds": 86400,
      "age_display": "1 day ago",
      "status": "stale"
    },
    "monte_carlo": {
      "source": "simulation:scenario-123",
      "status": "pending",
      "message": "Running (45% complete)"
    }
  },
  "alerts": [
    {
      "severity": "warning",
      "type": "stale_forecast",
      "message": "Demand forecast is >24 hours old",
      "action_url": "/forecasts?action=refresh"
    },
    {
      "severity": "action_required",
      "type": "pending_approval",
      "message": "2 decisions awaiting your approval",
      "action_url": "/decisions?filter=pending"
    }
  ],
  "status": {
    "services": {
      "rag": "healthy",
      "llm": "healthy",
      "simulation_queue": {
        "pending_count": 3,
        "running_count": 1,
        "total_capacity": 100
      }
    }
  }
}

3. Cache Invalidation

When to invalidate:

  • User makes a decision (decision_made event)
  • New scenario created (scenario_created)
  • Forecast refreshed (data_updated)
  • Constraints changed (constraint_changed)

Implementation:

// After a decision is recorded
await PageBuildingService.invalidateRelatedCaches(
  'decision_made',
  tenantId,
  { problemId, decisionId }
);
// Automatically invalidates: dashboard, what-if-workbench, scenarios
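The event-to-page mapping behind `invalidateRelatedCaches` might look like this. The mapping below is a sketch assumed from the comments and the TTL table in this document, not taken from the actual service:

```typescript
// Which cached page types each domain event invalidates (assumed mapping).
const INVALIDATION_MAP: Record<string, string[]> = {
  decision_made: ["dashboard", "what-if-workbench", "scenarios"],
  scenario_created: ["what-if-workbench", "scenarios"],
  data_updated: ["dashboard"],
  constraint_changed: ["what-if-workbench"],
  forecast_updated: ["insights"],
};

function pagesToInvalidate(event: string): string[] {
  // Unknown events invalidate nothing rather than everything,
  // so a new event type can't accidentally flush all caches.
  return INVALIDATION_MAP[event] ?? [];
}
```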

API Endpoints

Get Cached Page

GET /api/pages/:pageType?context={...}&cache_ttl=300&force_rebuild=false

Response: {
  status: 'success',
  data: { pageType, metadata, layout, ... },
  cache_status: 'HIT' | 'MISS',
  build_time_ms: 0 | 4200
}

Get Live Overlay

GET /api/pages/:pageType/overlay?context={...}

Response: {
  status: 'success',
  data: {
    freshness: { ... },
    alerts: [ ... ],
    status: { ... }
  }
}

Invalidate Cache

POST /api/pages/:pageType/invalidate
Body: { context: { ... } }

POST /api/pages/invalidate-related
Body: {
  event: 'decision_made' | 'scenario_created' | 'data_updated',
  context: { ... }
}

Frontend Implementation Example

React Component Pattern

// DashboardPage.tsx
import { useEffect, useState } from 'react';

export function DashboardPage() {
  const [page, setPage] = useState(null);
  const [overlay, setOverlay] = useState(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    // Step 1: Load cached page immediately
    fetchPage();

    // Step 2: Load overlay asynchronously
    fetchOverlay();
  }, []);

  async function fetchPage() {
    try {
      // Encode the context so the JSON survives as a query parameter
      const context = encodeURIComponent(JSON.stringify({
        userId: currentUser.id,
        locationId: selectedLocation
      }));
      const response = await fetch(`/api/pages/dashboard?context=${context}`);
      const { data } = await response.json();
      setPage(data);
    } catch (error) {
      console.error('Failed to load page:', error);
    } finally {
      setLoading(false);
    }
  }

  async function fetchOverlay() {
    try {
      const context = encodeURIComponent(JSON.stringify({
        userId: currentUser.id,
        locationId: selectedLocation
      }));
      const response = await fetch(`/api/pages/dashboard/overlay?context=${context}`);
      const { data } = await response.json();
      setOverlay(data);
    } catch (error) {
      console.error('Failed to load overlay:', error);
    }
  }

  if (loading && !page) {
    return <LoadingSpinner />;
  }

  return (
    <div className="dashboard">
      {/* Display cached page immediately */}
      <PageRenderer page={page} />

      {/* Overlay data updates asynchronously */}
      {overlay && (
        <div className="overlay-layer">
          <FreshnessBadges freshness={overlay.freshness} />
          <AlertsBanner alerts={overlay.alerts} />
          <StatusIndicators status={overlay.status} />
        </div>
      )}
    </div>
  );
}

// Freshness display component
function FreshnessBadges({ freshness }) {
  return (
    <div className="freshness-badges">
      {Object.entries(freshness).map(([key, data]) => (
        <div key={key} className={`badge badge-${data.status}`}>
          <span className="source">{key}</span>
          <span className="age">{data.age_display}</span>
          {data.status === 'stale' && (
            <a href="#" className="refresh-link">Refresh</a>
          )}
        </div>
      ))}
    </div>
  );
}

// Alerts banner
function AlertsBanner({ alerts }) {
  return (
    <>
      {alerts.map((alert) => (
        // Alerts in the overlay payload carry no id, so key on type
        <div key={alert.type} className={`alert alert-${alert.severity}`}>
          <span>{alert.message}</span>
          <a href={alert.action_url}>Take action</a>
        </div>
      ))}
    </>
  );
}

Performance Metrics

Before (Dynamic Building)

| Metric | Time |
| --- | --- |
| Page load | 4-6 seconds |
| Time to interactive | 5-7 seconds |
| Time to first content | 500ms (blank screen) |

After (Cache + Overlay)

| Metric | Time |
| --- | --- |
| Page load | < 100ms (from cache) |
| Overlay load | < 200ms (async) |
| Time to interactive | < 200ms |
| Time to first content | < 100ms |
| Improvement | 40-60x faster |

Cache Strategy Parameters

Cache TTL by Page Type

| Page Type | Default TTL | Invalidation Trigger |
| --- | --- | --- |
| Dashboard | 5 minutes | data_updated, decision_made |
| What-if Workbench | 10 minutes | scenario_created, constraint_changed |
| Scenarios | 5 minutes | scenario_created, decision_made |
| Insights | 1 hour | forecast_updated |

Cache Key Components

page:{pageType}:{tenantId}:{context_params}

Example:
page:dashboard:tenant-123:locationId=loc-456&userId=user-789
page:what-if-workbench:tenant-123:problemId=prob-012&scenarioId=scen-345

Cost Savings

Computation Reduction

By serving cached pages, we reduce:

  • RAG calls: 50-70% fewer queries (shared across users, time window)
  • LLM calls: 30-50% fewer synthesis requests (batch pre-building)
  • Simulations: 40-60% fewer runs (shared results for same scenario)

Estimated Savings

Scenario: 1,000 monthly active users, 5 page views each per day (5,000 page loads/day)

  • Before: 5,000 page loads × (RAG + LLM + Monte Carlo) = expensive
  • After: 70% cache hit ratio → 3,500 loads served from cache, 1,500 built dynamically

Result:

  • LLM cost reduction: ~60-70%
  • RAG call reduction: ~50-70%
  • Total infrastructure cost savings: 30-40%


Rollout Plan

Phase 1: Dashboard (Week 1)

  • Implement PageBuildingService
  • Implement LiveOverlayService
  • Create dashboard caching route
  • Test with 10% of users

Phase 2: Workbench + Scenarios (Week 2)

  • Expand to what-if-workbench
  • Add scenarios page caching
  • Expand to 50% of users

Phase 3: Insights + Monitoring (Week 3)

  • Add insights page caching
  • Implement detailed cache metrics
  • Full rollout to 100% of users

Phase 4: Optimization (Week 4)

  • Fine-tune TTL parameters based on usage
  • Implement predictive pre-building (pre-build before user requests)
  • Monitor cache hit ratios and adjust

Monitoring & Alerting

Key Metrics

- Cache hit ratio (should be > 80%)
- Average page build time (should be < 100ms for cache hits)
- Overlay fetch time (should be < 200ms)
- Cache size (Redis memory usage)
- Invalidation frequency (should be < 2 per second)
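The hit-ratio metric and its alert threshold can be tracked with a small counter like the one below. This is a sketch; a real deployment would export these counters to a metrics backend rather than keep them in process memory.

```typescript
// Rolling cache metrics counter (sketch).
class CacheMetrics {
  private hits = 0;
  private misses = 0;
  private buildTimes: number[] = [];

  record(status: "HIT" | "MISS", buildTimeMs: number): void {
    if (status === "HIT") this.hits += 1;
    else this.misses += 1;
    this.buildTimes.push(buildTimeMs);
  }

  hitRatio(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }

  avgBuildTimeMs(): number {
    if (this.buildTimes.length === 0) return 0;
    return this.buildTimes.reduce((a, b) => a + b, 0) / this.buildTimes.length;
  }

  shouldAlert(): boolean {
    // Alert threshold from the monitoring plan: hit ratio below 60%.
    return this.hitRatio() < 0.6;
  }
}
```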

Alerts to Set Up

- Cache hit ratio drops below 60%
- Page build time exceeds 500ms
- Redis memory usage > 80%
- Overlay fetch time exceeds 500ms

Edge Cases & Handling

Multi-user Consistency

Problem: User A modifies a forecast, but User B still sees the cached, outdated version

Solution: Invalidate related caches immediately on write

// After a forecast update
await PageBuildingService.invalidateRelatedCaches(
  'forecast_updated',
  tenantId
  // No context specified = invalidate ALL forecast-related caches
);
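One way to implement the "invalidate ALL" case is a prefix scan over cache keys. The sketch below uses an in-memory `Map`; with Redis you would SCAN for `page:*:{tenantId}:*` and delete the matches. The function name is illustrative:

```typescript
// Prefix-based invalidation sketch over keys of the form
// page:{pageType}:{tenantId}:{context_params}.
function invalidateByPrefix(cache: Map<string, unknown>, pageType: string, tenantId: string): number {
  let removed = 0;
  // Copy keys first so deletion doesn't disturb iteration.
  for (const key of [...cache.keys()]) {
    const [, keyPage, keyTenant] = key.split(":");
    // pageType "*" means "all page types for this tenant".
    if (keyTenant === tenantId && (pageType === "*" || keyPage === pageType)) {
      cache.delete(key);
      removed += 1;
    }
  }
  return removed;
}
```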

Fresh vs. Recent vs. Stale

  • Fresh: < 5 minutes old
  • Recent: 5-60 minutes old
  • Stale: > 60 minutes old

Display a different indicator/color for each, and offer a "refresh" action for stale data.
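The freshness buckets can be implemented as a small classifier, along with the `age_display` formatting shown in the overlay example. A sketch, with illustrative function names:

```typescript
// Classify data age into the freshness buckets defined above.
type Freshness = "fresh" | "recent" | "stale";

function classifyFreshness(ageSeconds: number): Freshness {
  if (ageSeconds < 5 * 60) return "fresh";    // < 5 minutes
  if (ageSeconds < 60 * 60) return "recent";  // 5-60 minutes
  return "stale";                             // > 60 minutes
}

// Human-readable age string, e.g. "4 minutes ago", "1 day ago".
function plural(n: number, unit: string): string {
  return `${n} ${unit}${n === 1 ? "" : "s"} ago`;
}

function ageDisplay(ageSeconds: number): string {
  if (ageSeconds < 60) return plural(Math.floor(ageSeconds), "second");
  if (ageSeconds < 3600) return plural(Math.floor(ageSeconds / 60), "minute");
  if (ageSeconds < 86400) return plural(Math.floor(ageSeconds / 3600), "hour");
  return plural(Math.floor(ageSeconds / 86400), "day");
}
```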

Simulation Results During Computation

Problem: Cached page shows old Monte Carlo results while simulation runs

Solution: Overlay shows simulation status

{
  "freshness": {
    "monte_carlo": {
      "status": "pending",
      "message": "New simulation running (45% complete)",
      "previous_result_stale": true
    }
  }
}
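On the frontend, this overlay entry can be folded into the cached panel: keep showing the old result, flag it as stale, and surface the progress message. A sketch following the field names in the payloads above:

```typescript
// Merge a simulation overlay entry into a cached result panel (sketch).
interface PanelState {
  result: unknown;          // the cached Monte Carlo result, still displayed
  stale: boolean;           // whether to render a staleness indicator
  statusMessage?: string;   // progress text while a new run is in flight
}

function applySimulationOverlay(
  cachedResult: unknown,
  overlay: { status: string; message?: string; previous_result_stale?: boolean }
): PanelState {
  if (overlay.status === "pending") {
    // Old result stays visible; the user sees it is being superseded.
    return {
      result: cachedResult,
      stale: overlay.previous_result_stale ?? true,
      statusMessage: overlay.message,
    };
  }
  return { result: cachedResult, stale: false };
}
```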

Future Enhancements

  1. Predictive Pre-Building

    • Build pages before users request them
    • Based on usage patterns and time of day
  2. Incremental Updates

    • Instead of full cache invalidation, update only changed sections
    • Requires component-level caching
  3. Personalized Caching

    • Cache variants for different user roles
    • Cache with different freshness for different user preferences
  4. Smart TTL Adjustment

    • Automatically adjust TTL based on data change frequency
    • Higher frequency data = shorter TTL
  5. Cache Warming

    • Pre-build pages at off-peak hours
    • Ensures warm cache during peak usage