Skip to main content

MILESTONE 3: EXTERNAL DATA INTEGRATION & CORRELATION ANALYSIS

Status: Planning Phase Target: 100% Complete Priority: HIGH (enables AI-driven insights)


Executive Summary

MILESTONE 3 integrates external data sources with supply chain data to discover correlations that drive demand and supply patterns. Rather than assuming correlation exists (which most companies can't share), we'll build a hindcasting engine that:

  1. Discovers correlations empirically through historical analysis
  2. Surfaces policy events that trigger supply chain routing changes
  3. Enables what-if scenarios based on learned relationships
  4. Provides transparent reasoning on why external factors matter

This moves beyond simple data fetching to correlation discovery, enabling the AI to explain demand movements through weather, policy, and economic shifts.


MILESTONE 3.1: Weather Hindcasting & Correlation Discovery

Strategic Approach

Problem: Most companies don't have historical weather-to-sales correlations documented. We can't assume causation without data.

Solution: Build a hindcasting engine that:

  1. Collects historical weather data for past N years
  2. Correlates with actual sales/demand history
  3. Surfaces statistically significant relationships
  4. Builds location-specific correlation profiles
  5. Enables what-if scenarios with learned correlations

Architecture

Weather API (OpenWeatherMap, WeatherAPI, etc.)

Hindcasting Engine
├─→ Pull Historical Weather (5-10 years back)
├─→ Normalize Weather Data
├─→ Correlate with Demand History
└─→ Store Correlation Profiles

Output: "20% correlation between temperature and ice cream sales"

What-If: "If temp drops 5°C next week, expect -12% demand"

Free Weather APIs Available

Option 1: OpenWeatherMap Historical (free tier: 5-day history)

  • 5-day historical lookback
  • Global coverage
  • Limited free tier

Option 2: WeatherAPI.com (free tier: 7-day history)

  • Better free tier coverage
  • More detailed metrics
  • Location-based

Option 3: NOAA/NCEI (free, unlimited)

  • US-centric but comprehensive
  • Historical data since 1900s
  • Most reliable for hindcasting
  • Can integrate via NOAA API

Option 4: Open-Meteo (free, no key required)

  • Historical weather back to 1940
  • Global coverage
  • No API key needed
  • Perfect for hindcasting

Recommendation: Start with Open-Meteo (unlimited, global) + NOAA (backup, US-detailed)

Implementation Plan

Phase 1: Data Collection (M3.1.1)

Files to Create:

  1. WeatherHistoricalDataService.js - Fetch/normalize historical weather

    • Method: getHistoricalWeather(lat, lon, startDate, endDate) → returns normalized daily data
    • Caches results to avoid re-fetching
    • Handles missing data periods
    • Returns: {date, temp_avg, temp_min, temp_max, precipitation, humidity, wind_speed}
  2. WeatherDataRepository.js - Store weather snapshots

    • Table: weather_history (tenant_id, location, date, metrics)
    • Tracks source, last_updated
    • Indexes on (tenant_id, location, date) for query performance
  3. Database Migration

    • Creates weather_history table
    • Creates correlation_profiles table (for results)
    • Indexes for time-series queries

Phase 2: Correlation Analysis (M3.1.2)

Files to Create:

  1. CorrelationDiscoveryService.js - Statistical analysis

    • Method: discoverCorrelations(demandHistory, weatherHistory, locationId) → finds correlations
    • Calculates Pearson correlation coefficient
    • Identifies lag periods (weather 1-7 days before demand shift)
    • Filters for statistical significance (p-value < 0.05)
    • Returns: {metric, correlation_coefficient, lag_days, p_value, sample_size}
  2. HindcastingEngine.js - Generates forecasts from correlations

    • Method: generateForecast(currentWeather, correlationProfile, demandBaseline) → what-if forecast
    • Applies learned correlations to current/future weather
    • Shows confidence intervals
    • Explains reasoning ("Based on 5-year history, temperature shifts correlate with demand")

Phase 3: API Integration (M3.1.3)

Files to Create:

  1. weatherForecastRoutes.js - Hindcast API endpoints
    • GET /api/weather/historical/:location - Get cached historical data
    • GET /api/weather/correlations/:location - Get discovered correlations
    • POST /api/weather/forecast - Generate what-if forecast
    • POST /api/weather/sync - Force sync historical data

Data Model

-- Weather history for hindcasting
CREATE TABLE weather_history (
id UUID PRIMARY KEY,
tenant_id UUID,
location_id UUID, -- or lat/lon pair
date DATE,
temp_avg DECIMAL(5,2), -- Celsius
temp_min DECIMAL(5,2),
temp_max DECIMAL(5,2),
precipitation DECIMAL(6,2), -- mm
humidity INT, -- 0-100
wind_speed DECIMAL(5,2), -- m/s
source VARCHAR(100), -- "open-meteo", "noaa", etc.
data_quality_score DECIMAL(3,2), -- 0-1
created_at TIMESTAMP,
INDEX (tenant_id, location_id, date)
);

-- Learned correlations per location
CREATE TABLE correlation_profiles (
id UUID PRIMARY KEY,
tenant_id UUID,
location_id UUID,
demand_metric VARCHAR(100), -- "sales", "units_sold", etc.
weather_metric VARCHAR(100), -- "temp_avg", "precipitation", etc.
correlation_coefficient DECIMAL(5,3), -- -1 to +1
lag_days INT, -- days between weather and demand
p_value DECIMAL(10,8), -- statistical significance
sample_size INT, -- number of data points used
r_squared DECIMAL(5,3), -- explained variance
equation VARCHAR(500), -- linear regression: y = ax + b
confidence_interval VARCHAR(100), -- 95% CI: [lower, upper]
time_period_start DATE, -- when data was collected
time_period_end DATE,
last_updated TIMESTAMP,
INDEX (tenant_id, location_id, demand_metric)
);

Use Cases

Use Case 1: Demand Pattern Explanation

User sees: Demand spiked 30% last week
System explains:
"Temperature dropped 8°C (0.87 correlation, p<0.01)
Based on 5-year history: -1.2% demand per °C drop
Expected impact: -9.6%, actual: -10.1%
✓ Aligns with learned correlation"

Use Case 2: What-If Forecasting

Weather forecast: Temperature will rise 3°C next week
System predicts:
"Based on learned correlation (0.72, lag=2 days),
expect +3.6% demand increase in 2-3 days
Confidence: 78% (from 5-year profile)"

Use Case 3: Seasonal Pattern Training

System analyzes:
- January: -22% correlation with temp (winter driving demand)
- July: +18% correlation with temp (air conditioning demand)
- Creates seasonal sub-profiles

MILESTONE 3.2: Economic Indicators & Policy Integration

Strategic Approach

Problem: Supply chains react to policy shifts (FTA, tariffs, sanctions), but these aren't reflected in demand data alone.

Solution: Track policy events and economic indicators, surface their timing relative to supply chain changes.

Free Data Sources

IMF (International Monetary Fund)

World Bank

UN COMTRADE (trade data)

U.S. Federal Reserve (FRED)

USDA WASDE (agricultural)

  • World Agricultural Outlook
  • Commodity price forecasts
  • Free, public data

Recommendation Order:

  1. IMF (macro indicators, global)
  2. World Bank (trade flows, development)
  3. UN COMTRADE (bilateral trade detail)
  4. FRED (US-specific economic data)
  5. USDA WASDE (agricultural commodities)

Implementation Plan

Phase 1: Economic Data Ingestion (M3.2.1)

Files to Create:

  1. EconomicDataService.js - Fetch economic indicators

    • Methods:
      • getIMFIndicators(countries, indicators, startYear, endYear) → GDP, inflation, exchange rates
      • getWorldBankData(indicators, countries) → development indicators
      • getFREDData(series_id, startDate, endDate) → US economic data
    • Caching to avoid re-fetching
    • Data normalization
  2. TradeDataService.js - Fetch trade flows

    • Methods:
      • getBilateralTrade(exporter, importer, startYear, endYear) → bilateral trade volumes
      • getProductTrade(hs_code, countries, years) → product-level trade
    • UN COMTRADE integration
    • Trade volume growth rates
  3. Database Migrations

    • Creates economic_indicators table
    • Creates trade_flows table
    • Creates policy_events table

Phase 2: Policy Event Tracking (M3.2.2)

Files to Create:

  1. PolicyEventService.js - Track policy changes

    • Manual event entry: FTA signed, tariffs imposed, sanctions, etc.
    • Standardized event types: TARIFF_CHANGE, FTA_SIGNED, SANCTIONS, QUOTA_CHANGE, EXCHANGE_RATE_SHOCK
    • Link to affected suppliers/regions
    • Document impact (tariff rate, effective date, scope)
  2. SupplyChainRoutingAnalyzer.js - Connect policy to routing changes

    • Detects when suppliers/routes change after policy event
    • Calculates lead time to change (e.g., "tariff → 45 days → route change")
    • Builds routing decision trees
  3. Database Migration

    • Creates policy_events table
    • Creates supplier_routing_history table (tracks supplier destination changes)

Phase 3: Impact Analysis (M3.2.3)

Files to Create:

  1. PolicyImpactService.js - Assess policy effects

    • Methods:
      • analyzePolicyImpact(policyEvent, historicalData) → impact magnitude and timing
      • forecaseRoutingChange(policy, supplierProfile) → predict routing shift
      • estimateCostImpact(tariff, tradeVolume, alternatives) → financial impact
  2. Integration with Forecast Engine

    • Feeds policy-adjusted forecasts to HybridForecastingService
    • Shows policy reasoning in forecast explanations

Data Models

-- Economic indicators (IMF, World Bank, FRED)
CREATE TABLE economic_indicators (
id UUID PRIMARY KEY,
tenant_id UUID,
country_code VARCHAR(3),
indicator_code VARCHAR(50), -- "GDP_GROWTH", "INFLATION_RATE", etc.
indicator_name VARCHAR(255),
value DECIMAL(15,4),
unit VARCHAR(50), -- "percent", "USD", etc.
date DATE, -- monthly or annual
source VARCHAR(100), -- "IMF", "World Bank", "FRED", etc.
created_at TIMESTAMP,
INDEX (country_code, indicator_code, date)
);

-- Bilateral trade flows
CREATE TABLE trade_flows (
id UUID PRIMARY KEY,
tenant_id UUID,
exporter_country VARCHAR(3),
importer_country VARCHAR(3),
hs_code VARCHAR(10), -- product code (optional)
trade_value DECIMAL(15,2), -- USD
trade_volume DECIMAL(15,2), -- metric tons (if applicable)
year INT,
source VARCHAR(100), -- "UN COMTRADE", etc.
created_at TIMESTAMP,
INDEX (exporter_country, importer_country, year)
);

-- Policy events and changes
CREATE TABLE policy_events (
id UUID PRIMARY KEY,
tenant_id UUID,
event_type VARCHAR(50), -- "TARIFF_CHANGE", "FTA_SIGNED", "SANCTIONS", etc.
event_date DATE, -- when policy took effect
announced_date DATE, -- when announced
affected_countries VARCHAR(500), -- JSON array
affected_hs_codes VARCHAR(500), -- JSON array (product codes)
description TEXT,
impact_magnitude DECIMAL(5,2), -- e.g., tariff rate: 15%
effective_date DATE,
expiry_date DATE, -- if temporary
source VARCHAR(255), -- URL to official announcement
user_notes TEXT, -- user's analysis
created_by UUID,
created_at TIMESTAMP,
INDEX (event_date, event_type)
);

-- Supplier routing history
CREATE TABLE supplier_routing_history (
id UUID PRIMARY KEY,
tenant_id UUID,
supplier_id UUID,
destination_country VARCHAR(3),
start_date DATE,
end_date DATE,
volume DECIMAL(15,2),
reason VARCHAR(255), -- "TARIFF_AVOIDANCE", "TRADE_AGREEMENT", etc.
policy_event_id UUID REFERENCES policy_events,
created_at TIMESTAMP,
INDEX (supplier_id, destination_country, start_date)
);

Use Cases

Use Case 1: Policy Explains Routing Change

User sees: Supplier ABC switched from Route A → Route C in March
System explains:
"US-China tariff increase (25%) on HS 8471 announced Jan 15
Lead time to reroute: 60 days
Tariff avoidance would save: $2.3M annually
✓ Timing and economics align with routing change"

Use Case 2: Forecast Incorporates Policy Risk

Forecast shows: 12% demand increase Q3
Policy event: India-US FTA negotiations (uncertain outcome)
System flags:
"If FTA signed: +15% demand (improved competitiveness)
If negotiations fail: +9% demand (tariff risk premium)
Current probability: 60% signing, 40% failure"

Use Case 3: Supply Chain Resilience Analysis

User asks: What if US reimpose China tariffs?
System simulates:
- Current sourcing: 45% China exposure
- Alternative routes: Vietnam (12% extra cost), Mexico (8% extra cost)
- Network optimization: Rebalance to 25% China, 40% Vietnam, 35% Mexico
- Financial impact: +$1.8M annually

MILESTONE 3.3: Supply Chain Policy Events Tracking

Real-World Pharma Example

Your pharma client with APIs (Active Pharmaceutical Ingredients) moving across 36 destinations for tax/profit optimization exemplifies the complexity:

Scenario: API manufacturing in India, China, synthesis in Switzerland, formulation in Puerto Rico, EU distribution

Policy Drivers:

  • Switzerland GMP requirements → must source from approved suppliers
  • China restrictions on chemical precursors → alternate sources
  • US tariff on Chinese APIs → reroute through Vietnam or India
  • EU duty drawback regulations → optimizes intra-EU moves
  • Puerto Rico Act 60 (formerly Act 20) → tax incentives for manufacturing
  • India FTA status with EU/UK → tariff preferential treatment
  • RCEP (Regional Comprehensive Economic Partnership) → Asian supply chain benefits

System Tracks:

  1. Policy events triggering routing changes
  2. Lead time from announcement to implementation (regulatory, customs)
  3. Financial impact per route scenario
  4. Compliance requirements per destination

Implementation

M3.3.1: Policy Event Data Model

# Example policy event structure
policy_event = {
"id": "EVT-2024-TARIFF-CHINA-API",
"type": "TARIFF_CHANGE",
"date_announced": "2024-01-15",
"date_effective": "2024-03-01",
"description": "25% additional tariff on chemical imports (HS 2905, 2909, 2924)",
"affected_hs_codes": ["290511", "290919", "292430"], # API codes
"affected_exporters": ["CN"], # China
"affected_importers": ["US"],
"tariff_rate_change": {
"before": "5%",
"after": "30%"
},
"estimated_trade_impact": "$2.3B annually",
"supply_chain_implications": {
"impacted_companies": "Pharma, biotech, specialty chemicals",
"alternative_sources": [
{"country": "IN", "tariff": "5%", "lead_time_weeks": 4, "cost_premium": "8%"},
{"country": "VN", "tariff": "12%", "lead_time_weeks": 6, "cost_premium": "15%"},
{"country": "JP", "tariff": "3%", "lead_time_weeks": 5, "cost_premium": "20%"}
]
}
}

M3.3.2: Routing Optimization Engine

// Routing decision tree for Pharma API
{
"sku": "API-ACETAMINOPHEN",
"baseline_route": "China → Hong Kong → New York",
"routing_rules": [
{
"condition": "US_TARIFF >= 20%",
"action": "route_through_india",
"new_route": "India → Singapore → New York",
"cost_delta": "+8%",
"lead_time_delta": "+7_days"
},
{
"condition": "US_TARIFF >= 25% AND VOLUME > 1000_tons",
"action": "local_contract_manufacturing",
"new_route": "Puerto Rico facility",
"cost_delta": "+12%",
"lead_time_delta": "-14_days",
"tax_benefit": "Act 60 exemption"
},
{
"condition": "EU_DESTINATION AND EU_TARIFF < 5%",
"action": "route_through_switzerland",
"new_route": "India → Switzerland → EU",
"cost_delta": "+15%",
"tariff_benefit": "-8%"
}
]
}

M3.3.3: Compliance & Lead Time Tracking

CREATE TABLE supply_chain_policy_compliance (
id UUID PRIMARY KEY,
tenant_id UUID,
sku_id UUID,
from_country VARCHAR(3),
to_country VARCHAR(3),
compliance_requirement VARCHAR(255), -- "GMP", "CITES", "FDA_APPROVAL", etc.
lead_time_days INT, -- days for compliance (customs, inspections)
documentation_required TEXT, -- certificates, permits
regulatory_body VARCHAR(100), -- "FDA", "EMA", "PMDA", etc.
last_successful_transit TIMESTAMP,
INDEX (sku_id, from_country, to_country)
);

MILESTONE 3.4: Correlation Discovery Engine

Advanced Analytics

Once we have historical weather, policy events, and economic data, we can train models to discover unexpected correlations.

Example Correlations to Discover:

  1. Weather + Demand: Temperature shifts predict ice cream, heating oil, agricultural demand
  2. Policy + Inventory: Tariff announcements correlate with supplier inventory changes (6-8 weeks prior)
  3. Exchange Rates + Sourcing: Currency devaluation → increased export sourcing from that country
  4. Competitor Actions + Market: Competitor tariff exposure → your market share increases
  5. Regulatory Changes + Pricing: New safety standards → increased component costs

Implementation

M3.4.1: Time-Series Analysis Service

// Correlation discovery pipeline
{
"input_series": {
"demand": "daily_sales_units",
"weather": "daily_avg_temp",
"policy": "tariff_rate_changes",
"economics": "exchange_rate"
},
"analysis_steps": [
{
"step": "lag_analysis",
"description": "Find optimal lag between cause and effect",
"output": "best_lag_days = 14"
},
{
"step": "correlation_coefficient",
"description": "Pearson correlation with statistical significance",
"output": "r = 0.68, p < 0.01"
},
{
"step": "causality_test",
"description": "Granger causality to test if causation is plausible",
"output": "weather Granger-causes demand at lag 14"
},
{
"step": "interaction_effects",
"description": "Test for combined effects (weather × policy)",
"output": "weather impact is 40% stronger during tariff periods"
},
{
"step": "seasonality_decomposition",
"description": "Separate trend, seasonal, residual components",
"output": "correlation stronger in summer, weaker in winter"
}
]
}

M3.4.2: Explainable AI for Correlations

// Example explanation of discovered correlation
{
"correlation": "Temperature → Ice Cream Demand",
"strength": 0.82,
"confidence": "95% (p < 0.01)",
"lag": "0 days (same-day effect)",
"seasonality": "Much stronger June-August",
"data_points_analyzed": 1825, // 5 years
"explanation": {
"simple": "When temperature rises, ice cream sales increase proportionally",
"detailed": "For every 1°C increase, expect 3.2% sales increase. Effect is non-linear at extremes (>32°C).",
"caveats": [
"Correlation breaks during pandemic lockdowns (2020-2021)",
"Holiday sales override weather patterns (Dec-Jan)",
"Weekend effect: +40% baseline, weather multiplier still applies"
]
},
"forecast_application": {
"weather_next_7_days": [
{"date": "2024-01-22", "temp_forecast": 18, "demand_adjustment": "+5.8%"},
{"date": "2024-01-23", "temp_forecast": 22, "demand_adjustment": "+13.4%"}
]
}
}

MILESTONE 3 Implementation Roadmap

Phase 1: Weather Hindcasting (Weeks 1-3)

  • M3.1.1: Weather data collection (Open-Meteo + NOAA)
  • M3.1.2: Correlation discovery (Pearson + lag analysis)
  • M3.1.3: Weather forecast API routes
  • Deliverable: Users can see weather correlations with demand

Phase 2: Economic Data (Weeks 4-5)

  • M3.2.1: Economic indicator ingestion (IMF, World Bank)
  • M3.2.2: Policy event tracking (manual + API sources)
  • M3.2.3: Impact analysis service
  • Deliverable: Supply chain policy effects documented and traced

Phase 3: Advanced Analytics (Weeks 6-7)

  • M3.4.1: Time-series correlation discovery
  • M3.4.2: Explainable AI for correlations
  • Multi-variable regression (weather × policy × economics)
  • Deliverable: System explains demand movements through external factors

Phase 4: Integration (Week 8)

  • Integrate correlations into forecast engine
  • Dashboard showing correlation insights
  • What-if scenario builder
  • Deliverable: Complete external data → forecast pipeline

Technical Decisions

Data Pipeline Strategy

Batch vs Stream:

  • Weather: Daily batch (cost-efficient, sufficient for forecasting)
  • Economic indicators: Weekly batch (IMF/World Bank publish on schedules)
  • Policy events: Manual entry + RSS feed monitoring (ad-hoc nature)
  • Trade data: Monthly batch (COMTRADE publishes monthly)

Caching Strategy:

  • Weather historical: Cache heavily (never changes)
  • Economic indicators: Cache 24 hours (updates infrequently)
  • Policy events: No cache (manual entries, real-time importance)
  • Trade data: Cache 7 days (monthly updates)

Correlation Thresholds

Statistically Significant:

  • p-value < 0.05 (95% confidence)
  • Sample size > 50 (at least 50 data points)
  • r² > 0.20 (explains at least 20% of variance)

Actionable in Forecast:

  • Correlation coefficient > 0.30 (weak but notable)
  • p-value < 0.01 (very significant)
  • Consistent across 80%+ of time periods

Storage Strategy

Time-Series Data:

  • Use PostgreSQL with proper indexing (not timeseries DB initially)
  • Index on (tenant_id, location_id, date) for query performance
  • Partition by year if data grows large (>10M rows)

Correlation Results:

  • Store computed correlations (don't recalculate daily)
  • Include metadata: sample size, time period, data quality
  • Version correlations (e.g., "Jan-May v1" vs "Jan-May v2 with outlier removal")

Risk Mitigation

Data Quality Risks

Weather Data Gaps:

  • Mitigation: Multi-source (Open-Meteo + NOAA) with fallback
  • Interpolate missing days using adjacent data
  • Flag results based on data quality score

Spurious Correlations:

  • Mitigation: Require statistical significance (p < 0.05)
  • Separate time periods to avoid trend artifacts
  • Use Granger causality to test directionality
  • Manual review of surprising correlations

Policy Event Incompleteness:

  • Mitigation: User can manually add events
  • Monitor trade news APIs/RSS feeds
  • Cross-reference with government announcements

Performance Risks

Large Correlation Calculations:

  • Mitigation: Pre-compute correlations overnight
  • Cache results heavily
  • Show "correlation calculated at X time" in UI

API Rate Limits:

  • Mitigation: Batch requests during off-peak hours
  • Cache aggressively
  • Use free tier APIs only (no cost surprises)

Success Metrics

M3.1: Weather Hindcasting ✅ Discover 5+ statistically significant weather-demand correlations ✅ What-if forecast shows 80%+ accuracy vs actual outcomes ✅ Users report insights they didn't know before

M3.2: Policy Integration ✅ Link 100% of supply chain routing changes to policy events within 60-day window ✅ Policy impact estimates within ±15% of actual financial impact ✅ Early warning: flag policy announcements 60+ days before impact

M3.3: Supply Chain Policy Tracking ✅ Comprehensive policy event database (tariffs, FTAs, sanctions) ✅ Routing decision models reflect company-specific rules ✅ Compliance tracking for each supplier-destination pair

M3.4: Correlation Discovery ✅ Discover 20+ correlations across weather, policy, economics, demand ✅ Multi-variable models explain 60%+ of demand variance ✅ Explainable AI shows clear reasoning for forecast adjustments


Conclusion

MILESTONE 3 transforms ChainAlign from a data-driven platform into an insight-driven decision engine by:

  1. Discovering empirical correlations (don't assume, prove)
  2. Surfacing policy impacts (regulatory changes → routing changes)
  3. Enabling scenario modeling (what-if with learned relationships)
  4. Providing transparent explanations (why external factors matter)

This positions ChainAlign as the correlation discovery engine that companies can't build themselves (because data is proprietary), while remaining flexible enough to work with each company's unique patterns.