Latency Optimization Strategy: Pre-Built Pages with Live Overlays
Problem Statement
ChainAlign's rich feature set (RAG, LLM, Monte Carlo simulations, constraint validation) creates latency challenges:
- RAG context retrieval: 500-1000ms
- LLM synthesis: 3-5 seconds (Gemini API roundtrip)
- Monte Carlo simulation: 20-30 seconds (10K iterations)
- Constraint validation: 200-500ms
- Total typical page load (excluding background simulations): 4-6 seconds ❌
Users experience a slow, sluggish interface despite fast infrastructure.
Solution: Pre-Built Pages with Live Overlays
Architecture Overview
User Request
↓
┌─────────────────────────────────────────────────────────┐
│ Step 1: Serve from Cache (< 100ms) │
│ - Complete page structure with all content │
│ - Ready to display immediately │
│ - All RAG, LLM, Monte Carlo results pre-baked │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Step 2: Fetch Live Overlay Asynchronously (< 200ms) │
│ - Data freshness indicators │
│ - Critical alerts/notifications │
│ - Real-time status of background tasks │
│ - User pending actions │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Step 3: Merge on Frontend │
│ - Display cached page immediately │
│ - Update with overlay when it arrives │
│ - Show freshness badges next to data │
│ - Display critical alerts prominently │
└─────────────────────────────────────────────────────────┘
↓
User sees complete, fresh-looking page in < 200ms total
Key Components
1. PageBuildingService
Purpose: Pre-builds complete pages with all dynamic content
How it works:
- Orchestrates RAG, LLM, Monte Carlo, constraint validation in parallel
- Stores result in Redis cache with TTL
- Returns cached page on next request (instant!)
Cache strategy:
```typescript
// Cache key: page:dashboard:tenant-123:userId=user-456&locationId=loc-789
// TTL: 5 minutes (configurable)

// On cache miss:
const page = await PageBuildingService.buildPage(
  'dashboard',
  tenantId,
  { userId, locationId },
  300 // 5-minute TTL
);
// Response: { pageType, layout, cache_status: 'MISS', build_time_ms: 4200 }

// On cache hit:
// Response: { pageType, layout, cache_status: 'HIT', build_time_ms: 0 }
```
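The read-through flow behind `buildPage` can be sketched as follows. This is a minimal illustration, not the actual service: an in-memory `Map` stands in for Redis, and `getOrBuildPage` is a hypothetical name.

```typescript
// Minimal cache-aside sketch. A Map stands in for Redis; in production
// you would use redis.get / redis.set with EX for the TTL.
type CachedPage = { payload: unknown; expiresAt: number };

const cache = new Map<string, CachedPage>();

async function getOrBuildPage(
  pageType: string,
  tenantId: string,
  ctx: Record<string, string>,
  ttlSeconds: number,
  build: () => Promise<unknown> // expensive: RAG + LLM + simulations
): Promise<{ data: unknown; cache_status: "HIT" | "MISS"; build_time_ms: number }> {
  // Sort context params so the same context always yields the same key
  const params = Object.keys(ctx).sort().map((k) => `${k}=${ctx[k]}`).join("&");
  const key = `page:${pageType}:${tenantId}:${params}`;

  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return { data: hit.payload, cache_status: "HIT", build_time_ms: 0 };
  }

  const start = Date.now();
  const payload = await build();
  cache.set(key, { payload, expiresAt: Date.now() + ttlSeconds * 1000 });
  return { data: payload, cache_status: "MISS", build_time_ms: Date.now() - start };
}
```

The second request for the same context within the TTL returns the cached payload with `build_time_ms: 0`.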
Parallel execution example:
const [
graphContext, // RAG: 800ms
decisionProblems, // DB query: 200ms
monteCarloResults, // Simulation: 25s → But parallel!
constraints // Validation: 300ms
] = await Promise.all([
RAGService.retrieveRelevantChunks(...),
DecisionProblemsRepository.findAll(...),
monteCarloService.runSimulation(...),
constraintValidationService.validate(...)
]);
// Total time: max(25s) instead of 25+0.8+0.2+0.3 = 26.3s
2. LiveOverlayService
Purpose: Adds live data to cached pages without rebuilding
Data provided:
- Freshness indicators - When each data source was last updated
- Critical alerts - High-priority issues user should know about
- Real-time status - Background task progress, queue status, etc.
Example freshness response:
```json
{
  "freshness": {
    "scenarios": {
      "source": "scenarios",
      "last_updated": "2025-10-22T14:23:45Z",
      "age_seconds": 240,
      "age_display": "4 minutes ago",
      "status": "recent"
    },
    "forecasts": {
      "source": "forecasts",
      "last_updated": "2025-10-21T08:00:00Z",
      "age_seconds": 86400,
      "age_display": "1 day ago",
      "status": "stale"
    },
    "monte_carlo": {
      "source": "simulation:scenario-123",
      "status": "pending",
      "message": "Running (45% complete)"
    }
  },
  "alerts": [
    {
      "severity": "warning",
      "type": "stale_forecast",
      "message": "Demand forecast is >24 hours old",
      "action_url": "/forecasts?action=refresh"
    },
    {
      "severity": "action_required",
      "type": "pending_approval",
      "message": "2 decisions awaiting your approval",
      "action_url": "/decisions?filter=pending"
    }
  ],
  "status": {
    "services": {
      "rag": "healthy",
      "llm": "healthy",
      "simulation_queue": {
        "pending_count": 3,
        "running_count": 1,
        "total_capacity": 100
      }
    }
  }
}
```
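The freshness fields in this payload can be derived from a last-updated timestamp. The sketch below is illustrative (`freshnessFor` and `ageDisplay` are hypothetical names); the tier thresholds are an assumption based on the Fresh/Recent/Stale definitions later in this document.

```typescript
// Sketch: derive per-source freshness fields for the overlay from a
// last-updated timestamp. Thresholds (5 min / 60 min) are assumed from
// the fresh/recent/stale tiers defined in the edge-cases section.
type FreshnessStatus = "fresh" | "recent" | "stale";

function ageDisplay(ageSeconds: number): string {
  if (ageSeconds < 60) return "just now";
  if (ageSeconds < 3600) return `${Math.floor(ageSeconds / 60)} minutes ago`;
  if (ageSeconds < 86400) return `${Math.floor(ageSeconds / 3600)} hours ago`;
  return `${Math.floor(ageSeconds / 86400)} day(s) ago`;
}

function freshnessFor(source: string, lastUpdated: Date, now: Date = new Date()) {
  const ageSeconds = Math.floor((now.getTime() - lastUpdated.getTime()) / 1000);
  const status: FreshnessStatus =
    ageSeconds < 5 * 60 ? "fresh" : ageSeconds < 60 * 60 ? "recent" : "stale";
  return {
    source,
    last_updated: lastUpdated.toISOString(),
    age_seconds: ageSeconds,
    age_display: ageDisplay(ageSeconds),
    status,
  };
}
```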
3. Cache Invalidation
When to invalidate:
- User makes a decision (decision_made event)
- New scenario created (scenario_created)
- Forecast refreshed (data_updated)
- Constraints changed (constraint_changed)
Implementation:
```typescript
// After a decision is recorded
await PageBuildingService.invalidateRelatedCaches(
  'decision_made',
  tenantId,
  { problemId, decisionId }
);
// Automatically invalidates: dashboard, what-if-workbench, scenarios
```
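The event-to-pages mapping behind `invalidateRelatedCaches` might look like the sketch below. The exact mapping in ChainAlign's codebase may differ; this one is assembled from the invalidation triggers listed in this document.

```typescript
// Sketch: map invalidation events to the page caches they affect.
// Event names and page lists follow the TTL table in this document;
// the real mapping may differ.
const INVALIDATION_MAP: Record<string, string[]> = {
  decision_made: ["dashboard", "scenarios", "what-if-workbench"],
  scenario_created: ["scenarios", "what-if-workbench"],
  data_updated: ["dashboard"],
  constraint_changed: ["what-if-workbench"],
  forecast_updated: ["insights", "dashboard"],
};

function pagesToInvalidate(event: string): string[] {
  return INVALIDATION_MAP[event] ?? [];
}
```

Keeping the mapping declarative makes it easy to audit which events clear which caches.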
API Endpoints
Get Cached Page
```
GET /api/pages/:pageType?context={...}&cache_ttl=300&force_rebuild=false

Response: {
  status: 'success',
  data: { pageType, metadata, layout, ... },
  cache_status: 'HIT' | 'MISS',
  build_time_ms: 0 | 4200
}
```
Get Live Overlay
```
GET /api/pages/:pageType/overlay?context={...}

Response: {
  status: 'success',
  data: {
    freshness: { ... },
    alerts: [ ... ],
    status: { ... }
  }
}
```
Invalidate Cache
```
POST /api/pages/:pageType/invalidate

Body: { context: { ... } }
```
Invalidate Related Caches
```
POST /api/pages/invalidate-related

Body: {
  event: 'decision_made' | 'scenario_created' | 'data_updated',
  context: { ... }
}
```
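The query-string contract of the cached-page endpoint can be captured as a small parser. This is an illustrative sketch; `PageRequestOptions` and `parsePageQuery` are hypothetical names, and the defaults (TTL 300s, `force_rebuild=false`) follow the endpoint definition above.

```typescript
// Sketch: parse GET /api/pages/:pageType query params into typed options.
interface PageRequestOptions {
  context: Record<string, string>;
  cacheTtl: number;     // seconds; default 300
  forceRebuild: boolean; // default false
}

function parsePageQuery(query: Record<string, string | undefined>): PageRequestOptions {
  return {
    // context arrives as a JSON-encoded string in the query
    context: query.context ? JSON.parse(query.context) : {},
    cacheTtl: query.cache_ttl !== undefined ? Number(query.cache_ttl) : 300,
    forceRebuild: query.force_rebuild === "true",
  };
}
```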
Frontend Implementation Example
React Component Pattern
```tsx
// DashboardPage.tsx
import { useEffect, useState } from 'react';

export function DashboardPage() {
  const [page, setPage] = useState(null);
  const [overlay, setOverlay] = useState(null);
  const [loading, setLoading] = useState(true);

  // currentUser and selectedLocation are assumed to come from app state/context
  useEffect(() => {
    // Step 1: Load cached page immediately
    fetchPage();
    // Step 2: Load overlay asynchronously
    fetchOverlay();
  }, []);

  async function fetchPage() {
    try {
      const context = encodeURIComponent(JSON.stringify({
        userId: currentUser.id,
        locationId: selectedLocation
      }));
      const response = await fetch(`/api/pages/dashboard?context=${context}`);
      const { data } = await response.json();
      setPage(data);
    } catch (error) {
      console.error('Failed to load page:', error);
    } finally {
      setLoading(false);
    }
  }

  async function fetchOverlay() {
    try {
      const context = encodeURIComponent(JSON.stringify({
        userId: currentUser.id,
        locationId: selectedLocation
      }));
      const response = await fetch(`/api/pages/dashboard/overlay?context=${context}`);
      const { data } = await response.json();
      setOverlay(data);
    } catch (error) {
      console.error('Failed to load overlay:', error);
    }
  }

  if (loading && !page) {
    return <LoadingSpinner />;
  }

  return (
    <div className="dashboard">
      {/* Display cached page immediately */}
      <PageRenderer page={page} />

      {/* Overlay data updates asynchronously */}
      {overlay && (
        <div className="overlay-layer">
          <FreshnessBadges freshness={overlay.freshness} />
          <AlertsBanner alerts={overlay.alerts} />
          <StatusIndicators status={overlay.status} />
        </div>
      )}
    </div>
  );
}

// Freshness display component
function FreshnessBadges({ freshness }) {
  return (
    <div className="freshness-badges">
      {Object.entries(freshness).map(([key, data]) => (
        <div key={key} className={`badge badge-${data.status}`}>
          <span className="source">{key}</span>
          <span className="age">{data.age_display}</span>
          {data.status === 'stale' && (
            <a href="#" className="refresh-link">Refresh</a>
          )}
        </div>
      ))}
    </div>
  );
}

// Alerts banner (the overlay payload carries no stable alert id, so the
// array index serves as the React key)
function AlertsBanner({ alerts }) {
  return (
    <>
      {alerts.map((alert, i) => (
        <div key={i} className={`alert alert-${alert.severity}`}>
          <span>{alert.message}</span>
          <a href={alert.action_url}>Take action</a>
        </div>
      ))}
    </>
  );
}
```
Performance Metrics
Before (Dynamic Building)
| Metric | Time |
|---|---|
| Page load | 4-6 seconds |
| Time to interactive | 5-7 seconds |
| Time to first content | 500ms (blank screen) |
After (Cache + Overlay)
| Metric | Time |
|---|---|
| Page load | < 100ms (from cache) |
| Overlay load | < 200ms (async) |
| Time to interactive | < 200ms |
| Time to first content | < 100ms |
| Improvement | 40-60x faster |
Cache Strategy Parameters
Cache TTL by Page Type
| Page Type | Default TTL | Invalidation Trigger |
|---|---|---|
| Dashboard | 5 minutes | data_updated, decision_made |
| What-if Workbench | 10 minutes | scenario_created, constraint_changed |
| Scenarios | 5 minutes | scenario_created, decision_made |
| Insights | 1 hour | forecast_updated |
Cache Key Components
```
page:{pageType}:{tenantId}:{context_params}
```
Example:
```
page:dashboard:tenant-123:locationId=loc-456&userId=user-789
page:what-if-workbench:tenant-123:problemId=prob-012&scenarioId=scen-345
```
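A key builder matching this format can be sketched as below (`pageCacheKey` is an illustrative name). Context params are sorted so that the same context always produces the same key regardless of insertion order.

```typescript
// Sketch: build the page:{pageType}:{tenantId}:{context_params} cache key.
// Sorting the param names makes the key deterministic, so {userId, locationId}
// and {locationId, userId} map to the same cache entry.
function pageCacheKey(
  pageType: string,
  tenantId: string,
  context: Record<string, string>
): string {
  const params = Object.keys(context)
    .sort()
    .map((k) => `${k}=${context[k]}`)
    .join("&");
  return `page:${pageType}:${tenantId}:${params}`;
}
```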
Cost Savings
Computation Reduction
By serving cached pages, we reduce:
- RAG calls: 50-70% fewer queries (shared across users, time window)
- LLM calls: 30-50% fewer synthesis requests (batch pre-building)
- Simulations: 40-60% fewer runs (shared results for same scenario)
Estimated Savings
Scenario: 1000 monthly active users, 5 page views each/day
- Before: 5000 page loads × (RAG + LLM + Monte Carlo) = expensive
- After: Cache hit ratio 70% → 3500 page loads from cache, 1500 dynamic
- LLM cost reduction: ~60-70%
- RAG call reduction: ~50-70%
- Total infrastructure cost savings: 30-40%
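The arithmetic behind the scenario above is simple: with hit ratio H, only a (1 - H) fraction of page loads trigger a full dynamic build.

```typescript
// Back-of-envelope for the estimate above: at a 70% hit ratio,
// 5000 daily page loads leave 1500 dynamic builds.
function dynamicBuilds(pageLoads: number, hitRatio: number): number {
  return Math.round(pageLoads * (1 - hitRatio));
}
```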
Rollout Plan
Phase 1: Dashboard (Week 1)
- Implement PageBuildingService
- Implement LiveOverlayService
- Create dashboard caching route
- Test with 10% of users
Phase 2: Workbench + Scenarios (Week 2)
- Expand to what-if-workbench
- Add scenarios page caching
- Expand to 50% of users
Phase 3: Insights + Monitoring (Week 3)
- Add insights page caching
- Implement detailed cache metrics
- Full rollout to 100% of users
Phase 4: Optimization (Week 4)
- Fine-tune TTL parameters based on usage
- Implement predictive pre-building (pre-build before user requests)
- Monitor cache hit ratios and adjust
Monitoring & Alerting
Key Metrics
- Cache hit ratio (should be > 80%)
- Average page build time (should be < 100ms for cache hits)
- Overlay fetch time (should be < 200ms)
- Cache size (Redis memory usage)
- Invalidation frequency (should be < 2 per second)
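The hit-ratio metric can be tracked with a simple counter. This is a minimal sketch; in production these counters would feed a metrics backend, and `CacheMetrics` is an illustrative name.

```typescript
// Sketch: counter for the cache-hit-ratio metric. Each page request
// records its cache_status; hitRatio() feeds the >80% target above.
class CacheMetrics {
  private hits = 0;
  private misses = 0;

  record(status: "HIT" | "MISS"): void {
    if (status === "HIT") this.hits++;
    else this.misses++;
  }

  hitRatio(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```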
Alerts to Set Up
- Cache hit ratio drops below 60%
- Page build time exceeds 500ms
- Redis memory usage > 80%
- Overlay fetch time exceeds 500ms
Edge Cases & Handling
Multi-user Consistency
Problem: User A modifies forecast, User B still sees cached old version
Solution: Invalidate related caches immediately on write
```typescript
// After a forecast update
await PageBuildingService.invalidateRelatedCaches(
  'forecast_updated',
  tenantId
  // No context specified = invalidate ALL forecast-related caches
);
```
Fresh vs. Recent vs. Stale
- Fresh: < 5 minutes old
- Recent: 5-60 minutes old
- Stale: > 60 minutes old

Display a different indicator/color for each tier, and offer a "refresh" action for stale data.
Simulation Results During Computation
Problem: Cached page shows old Monte Carlo results while simulation runs
Solution: Overlay shows simulation status
```json
{
  "freshness": {
    "monte_carlo": {
      "status": "pending",
      "message": "New simulation running (45% complete)",
      "previous_result_stale": true
    }
  }
}
```
Future Enhancements
- Predictive Pre-Building: build pages before users request them, based on usage patterns and time of day
- Incremental Updates: instead of full cache invalidation, update only changed sections (requires component-level caching)
- Personalized Caching: cache variants for different user roles, with different freshness for different user preferences
- Smart TTL Adjustment: automatically adjust TTL based on data change frequency (higher-frequency data = shorter TTL)
- Cache Warming: pre-build pages at off-peak hours to ensure a warm cache during peak usage