MILESTONE 3: EXTERNAL DATA INTEGRATION & CORRELATION ANALYSIS
Status: Planning Phase Target: 100% Complete Priority: HIGH (enables AI-driven insights)
Executive Summary
MILESTONE 3 integrates external data sources with supply chain data to discover correlations that drive demand and supply patterns. Rather than assuming correlation exists (which most companies can't share), we'll build a hindcasting engine that:
- Discovers correlations empirically through historical analysis
- Surfaces policy events that trigger supply chain routing changes
- Enables what-if scenarios based on learned relationships
- Provides transparent reasoning on why external factors matter
This moves beyond simple data fetching to correlation discovery, enabling the AI to explain demand movements through weather, policy, and economic shifts.
MILESTONE 3.1: Weather Hindcasting & Correlation Discovery
Strategic Approach
Problem: Most companies don't have historical weather-to-sales correlations documented. We can't assume causation without data.
Solution: Build a hindcasting engine that:
- Collects historical weather data for past N years
- Correlates with actual sales/demand history
- Surfaces statistically significant relationships
- Builds location-specific correlation profiles
- Enables what-if scenarios with learned correlations
Architecture
Weather API (OpenWeatherMap, WeatherAPI, etc.)
↓
Hindcasting Engine
├─→ Pull Historical Weather (5-10 years back)
├─→ Normalize Weather Data
├─→ Correlate with Demand History
└─→ Store Correlation Profiles
↓
Output: "20% correlation between temperature and ice cream sales"
↓
What-If: "If temp drops 5°C next week, expect -12% demand"
Free Weather APIs Available
Option 1: OpenWeatherMap Historical (free tier: 5-day history)
- 5-day historical lookback
- Global coverage
- Limited free tier
Option 2: WeatherAPI.com (free tier: 7-day history)
- Better free tier coverage
- More detailed metrics
- Location-based
Option 3: NOAA/NCEI (free, unlimited)
- US-centric but comprehensive
- Historical data since 1900s
- Most reliable for hindcasting
- Can integrate via NOAA API
Option 4: Open-Meteo (free, no key required)
- Historical weather back to 1940
- Global coverage
- No API key needed
- Perfect for hindcasting
Recommendation: Start with Open-Meteo (unlimited, global) + NOAA (backup, US-detailed)
Implementation Plan
Phase 1: Data Collection (M3.1.1)
Files to Create:
-
WeatherHistoricalDataService.js- Fetch/normalize historical weather- Method:
getHistoricalWeather(lat, lon, startDate, endDate)→ returns normalized daily data - Caches results to avoid re-fetching
- Handles missing data periods
- Returns: {date, temp_avg, temp_min, temp_max, precipitation, humidity, wind_speed}
- Method:
-
WeatherDataRepository.js- Store weather snapshots- Table: weather_history (tenant_id, location, date, metrics)
- Tracks source, last_updated
- Indexes on (tenant_id, location, date) for query performance
-
Database Migration
- Creates weather_history table
- Creates correlation_profiles table (for results)
- Indexes for time-series queries
Phase 2: Correlation Analysis (M3.1.2)
Files to Create:
-
CorrelationDiscoveryService.js- Statistical analysis- Method:
discoverCorrelations(demandHistory, weatherHistory, locationId)→ finds correlations - Calculates Pearson correlation coefficient
- Identifies lag periods (weather 1-7 days before demand shift)
- Filters for statistical significance (p-value < 0.05)
- Returns: {metric, correlation_coefficient, lag_days, p_value, sample_size}
- Method:
-
HindcastingEngine.js- Generates forecasts from correlations- Method:
generateForecast(currentWeather, correlationProfile, demandBaseline)→ what-if forecast - Applies learned correlations to current/future weather
- Shows confidence intervals
- Explains reasoning ("Based on 5-year history, temperature shifts correlate with demand")
- Method:
Phase 3: API Integration (M3.1.3)
Files to Create:
weatherForecastRoutes.js- Hindcast API endpointsGET /api/weather/historical/:location- Get cached historical dataGET /api/weather/correlations/:location- Get discovered correlationsPOST /api/weather/forecast- Generate what-if forecastPOST /api/weather/sync- Force sync historical data
Data Model
-- Weather history for hindcasting
CREATE TABLE weather_history (
id UUID PRIMARY KEY,
tenant_id UUID,
location_id UUID, -- or lat/lon pair
date DATE,
temp_avg DECIMAL(5,2), -- Celsius
temp_min DECIMAL(5,2),
temp_max DECIMAL(5,2),
precipitation DECIMAL(6,2), -- mm
humidity INT, -- 0-100
wind_speed DECIMAL(5,2), -- m/s
source VARCHAR(100), -- "open-meteo", "noaa", etc.
data_quality_score DECIMAL(3,2), -- 0-1
created_at TIMESTAMP,
INDEX (tenant_id, location_id, date)
);
-- Learned correlations per location
CREATE TABLE correlation_profiles (
id UUID PRIMARY KEY,
tenant_id UUID,
location_id UUID,
demand_metric VARCHAR(100), -- "sales", "units_sold", etc.
weather_metric VARCHAR(100), -- "temp_avg", "precipitation", etc.
correlation_coefficient DECIMAL(5,3), -- -1 to +1
lag_days INT, -- days between weather and demand
p_value DECIMAL(10,8), -- statistical significance
sample_size INT, -- number of data points used
r_squared DECIMAL(5,3), -- explained variance
equation VARCHAR(500), -- linear regression: y = ax + b
confidence_interval VARCHAR(100), -- 95% CI: [lower, upper]
time_period_start DATE, -- when data was collected
time_period_end DATE,
last_updated TIMESTAMP,
INDEX (tenant_id, location_id, demand_metric)
);
Use Cases
Use Case 1: Demand Pattern Explanation
User sees: Demand spiked 30% last week
System explains:
"Temperature dropped 8°C (0.87 correlation, p<0.01)
Based on 5-year history: -1.2% demand per °C drop
Expected impact: -9.6%, actual: -10.1%
✓ Aligns with learned correlation"
Use Case 2: What-If Forecasting
Weather forecast: Temperature will rise 3°C next week
System predicts:
"Based on learned correlation (0.72, lag=2 days),
expect +3.6% demand increase in 2-3 days
Confidence: 78% (from 5-year profile)"
Use Case 3: Seasonal Pattern Training
System analyzes:
- January: -22% correlation with temp (winter driving demand)
- July: +18% correlation with temp (air conditioning demand)
- Creates seasonal sub-profiles
MILESTONE 3.2: Economic Indicators & Policy Integration
Strategic Approach
Problem: Supply chains react to policy shifts (FTA, tariffs, sanctions), but these aren't reflected in demand data alone.
Solution: Track policy events and economic indicators, surface their timing relative to supply chain changes.
Free Data Sources
IMF (International Monetary Fund)
- IMF Data API: https://www.imf.org/external/datamapper/api
- Free, no auth required
- Indicators: GDP growth, inflation, exchange rates
- Coverage: 189+ countries
World Bank
- World Bank Open Data: https://data.worldbank.org/
- Indicators: GDP, trade flows, poverty, etc.
- Free, no API key
- 15,000+ indicators
UN COMTRADE (trade data)
- https://comtradeplus.un.org/ (new API)
- Bilateral trade flows
- HS classification system
- Detailed product-level data
- Free access
U.S. Federal Reserve (FRED)
- https://fred.stlouisfed.org/
- US economic data
- Free API with registration
- Exchange rates, interest rates, etc.
USDA WASDE (agricultural)
- World Agricultural Outlook
- Commodity price forecasts
- Free, public data
Recommendation Order:
- IMF (macro indicators, global)
- World Bank (trade flows, development)
- UN COMTRADE (bilateral trade detail)
- FRED (US-specific economic data)
- USDA WASDE (agricultural commodities)
Implementation Plan
Phase 1: Economic Data Ingestion (M3.2.1)
Files to Create:
-
EconomicDataService.js- Fetch economic indicators- Methods:
getIMFIndicators(countries, indicators, startYear, endYear)→ GDP, inflation, exchange ratesgetWorldBankData(indicators, countries)→ development indicatorsgetFREDData(series_id, startDate, endDate)→ US economic data
- Caching to avoid re-fetching
- Data normalization
- Methods:
-
TradeDataService.js- Fetch trade flows- Methods:
getBilateralTrade(exporter, importer, startYear, endYear)→ bilateral trade volumesgetProductTrade(hs_code, countries, years)→ product-level trade
- UN COMTRADE integration
- Trade volume growth rates
- Methods:
-
Database Migrations
- Creates economic_indicators table
- Creates trade_flows table
- Creates policy_events table
Phase 2: Policy Event Tracking (M3.2.2)
Files to Create:
-
PolicyEventService.js- Track policy changes- Manual event entry: FTA signed, tariffs imposed, sanctions, etc.
- Standardized event types: TARIFF_CHANGE, FTA_SIGNED, SANCTIONS, QUOTA_CHANGE, EXCHANGE_RATE_SHOCK
- Link to affected suppliers/regions
- Document impact (tariff rate, effective date, scope)
-
SupplyChainRoutingAnalyzer.js- Connect policy to routing changes- Detects when suppliers/routes change after policy event
- Calculates lead time to change (e.g., "tariff → 45 days → route change")
- Builds routing decision trees
-
Database Migration
- Creates policy_events table
- Creates supplier_routing_history table (tracks supplier destination changes)
Phase 3: Impact Analysis (M3.2.3)
Files to Create:
-
PolicyImpactService.js- Assess policy effects- Methods:
analyzePolicyImpact(policyEvent, historicalData)→ impact magnitude and timingforecaseRoutingChange(policy, supplierProfile)→ predict routing shiftestimateCostImpact(tariff, tradeVolume, alternatives)→ financial impact
- Methods:
-
Integration with Forecast Engine
- Feeds policy-adjusted forecasts to HybridForecastingService
- Shows policy reasoning in forecast explanations
Data Models
-- Economic indicators (IMF, World Bank, FRED)
CREATE TABLE economic_indicators (
id UUID PRIMARY KEY,
tenant_id UUID,
country_code VARCHAR(3),
indicator_code VARCHAR(50), -- "GDP_GROWTH", "INFLATION_RATE", etc.
indicator_name VARCHAR(255),
value DECIMAL(15,4),
unit VARCHAR(50), -- "percent", "USD", etc.
date DATE, -- monthly or annual
source VARCHAR(100), -- "IMF", "World Bank", "FRED", etc.
created_at TIMESTAMP,
INDEX (country_code, indicator_code, date)
);
-- Bilateral trade flows
CREATE TABLE trade_flows (
id UUID PRIMARY KEY,
tenant_id UUID,
exporter_country VARCHAR(3),
importer_country VARCHAR(3),
hs_code VARCHAR(10), -- product code (optional)
trade_value DECIMAL(15,2), -- USD
trade_volume DECIMAL(15,2), -- metric tons (if applicable)
year INT,
source VARCHAR(100), -- "UN COMTRADE", etc.
created_at TIMESTAMP,
INDEX (exporter_country, importer_country, year)
);
-- Policy events and changes
CREATE TABLE policy_events (
id UUID PRIMARY KEY,
tenant_id UUID,
event_type VARCHAR(50), -- "TARIFF_CHANGE", "FTA_SIGNED", "SANCTIONS", etc.
event_date DATE, -- when policy took effect
announced_date DATE, -- when announced
affected_countries VARCHAR(500), -- JSON array
affected_hs_codes VARCHAR(500), -- JSON array (product codes)
description TEXT,
impact_magnitude DECIMAL(5,2), -- e.g., tariff rate: 15%
effective_date DATE,
expiry_date DATE, -- if temporary
source VARCHAR(255), -- URL to official announcement
user_notes TEXT, -- user's analysis
created_by UUID,
created_at TIMESTAMP,
INDEX (event_date, event_type)
);
-- Supplier routing history
CREATE TABLE supplier_routing_history (
id UUID PRIMARY KEY,
tenant_id UUID,
supplier_id UUID,
destination_country VARCHAR(3),
start_date DATE,
end_date DATE,
volume DECIMAL(15,2),
reason VARCHAR(255), -- "TARIFF_AVOIDANCE", "TRADE_AGREEMENT", etc.
policy_event_id UUID REFERENCES policy_events,
created_at TIMESTAMP,
INDEX (supplier_id, destination_country, start_date)
);
Use Cases
Use Case 1: Policy Explains Routing Change
User sees: Supplier ABC switched from Route A → Route C in March
System explains:
"US-China tariff increase (25%) on HS 8471 announced Jan 15
Lead time to reroute: 60 days
Tariff avoidance would save: $2.3M annually
✓ Timing and economics align with routing change"
Use Case 2: Forecast Incorporates Policy Risk
Forecast shows: 12% demand increase Q3
Policy event: India-US FTA negotiations (uncertain outcome)
System flags:
"If FTA signed: +15% demand (improved competitiveness)
If negotiations fail: +9% demand (tariff risk premium)
Current probability: 60% signing, 40% failure"
Use Case 3: Supply Chain Resilience Analysis
User asks: What if US reimpose China tariffs?
System simulates:
- Current sourcing: 45% China exposure
- Alternative routes: Vietnam (12% extra cost), Mexico (8% extra cost)
- Network optimization: Rebalance to 25% China, 40% Vietnam, 35% Mexico
- Financial impact: +$1.8M annually
MILESTONE 3.3: Supply Chain Policy Events Tracking
Real-World Pharma Example
Your pharma client with APIs (Active Pharmaceutical Ingredients) moving across 36 destinations for tax/profit optimization exemplifies the complexity:
Scenario: API manufacturing in India, China, synthesis in Switzerland, formulation in Puerto Rico, EU distribution
Policy Drivers:
- Switzerland GMP requirements → must source from approved suppliers
- China restrictions on chemical precursors → alternate sources
- US tariff on Chinese APIs → reroute through Vietnam or India
- EU duty drawback regulations → optimizes intra-EU moves
- Puerto Rico Act 60 (formerly Act 20) → tax incentives for manufacturing
- India FTA status with EU/UK → tariff preferential treatment
- RCEP (Regional Comprehensive Economic Partnership) → Asian supply chain benefits
System Tracks:
- Policy events triggering routing changes
- Lead time from announcement to implementation (regulatory, customs)
- Financial impact per route scenario
- Compliance requirements per destination
Implementation
M3.3.1: Policy Event Data Model
# Example policy event structure
policy_event = {
"id": "EVT-2024-TARIFF-CHINA-API",
"type": "TARIFF_CHANGE",
"date_announced": "2024-01-15",
"date_effective": "2024-03-01",
"description": "25% additional tariff on chemical imports (HS 2905, 2909, 2924)",
"affected_hs_codes": ["290511", "290919", "292430"], # API codes
"affected_exporters": ["CN"], # China
"affected_importers": ["US"],
"tariff_rate_change": {
"before": "5%",
"after": "30%"
},
"estimated_trade_impact": "$2.3B annually",
"supply_chain_implications": {
"impacted_companies": "Pharma, biotech, specialty chemicals",
"alternative_sources": [
{"country": "IN", "tariff": "5%", "lead_time_weeks": 4, "cost_premium": "8%"},
{"country": "VN", "tariff": "12%", "lead_time_weeks": 6, "cost_premium": "15%"},
{"country": "JP", "tariff": "3%", "lead_time_weeks": 5, "cost_premium": "20%"}
]
}
}
M3.3.2: Routing Optimization Engine
// Routing decision tree for Pharma API
{
"sku": "API-ACETAMINOPHEN",
"baseline_route": "China → Hong Kong → New York",
"routing_rules": [
{
"condition": "US_TARIFF >= 20%",
"action": "route_through_india",
"new_route": "India → Singapore → New York",
"cost_delta": "+8%",
"lead_time_delta": "+7_days"
},
{
"condition": "US_TARIFF >= 25% AND VOLUME > 1000_tons",
"action": "local_contract_manufacturing",
"new_route": "Puerto Rico facility",
"cost_delta": "+12%",
"lead_time_delta": "-14_days",
"tax_benefit": "Act 60 exemption"
},
{
"condition": "EU_DESTINATION AND EU_TARIFF < 5%",
"action": "route_through_switzerland",
"new_route": "India → Switzerland → EU",
"cost_delta": "+15%",
"tariff_benefit": "-8%"
}
]
}
M3.3.3: Compliance & Lead Time Tracking
CREATE TABLE supply_chain_policy_compliance (
id UUID PRIMARY KEY,
tenant_id UUID,
sku_id UUID,
from_country VARCHAR(3),
to_country VARCHAR(3),
compliance_requirement VARCHAR(255), -- "GMP", "CITES", "FDA_APPROVAL", etc.
lead_time_days INT, -- days for compliance (customs, inspections)
documentation_required TEXT, -- certificates, permits
regulatory_body VARCHAR(100), -- "FDA", "EMA", "PMDA", etc.
last_successful_transit TIMESTAMP,
INDEX (sku_id, from_country, to_country)
);
MILESTONE 3.4: Correlation Discovery Engine
Advanced Analytics
Once we have historical weather, policy events, and economic data, we can train models to discover unexpected correlations.
Example Correlations to Discover:
- Weather + Demand: Temperature shifts predict ice cream, heating oil, agricultural demand
- Policy + Inventory: Tariff announcements correlate with supplier inventory changes (6-8 weeks prior)
- Exchange Rates + Sourcing: Currency devaluation → increased export sourcing from that country
- Competitor Actions + Market: Competitor tariff exposure → your market share increases
- Regulatory Changes + Pricing: New safety standards → increased component costs
Implementation
M3.4.1: Time-Series Analysis Service
// Correlation discovery pipeline
{
"input_series": {
"demand": "daily_sales_units",
"weather": "daily_avg_temp",
"policy": "tariff_rate_changes",
"economics": "exchange_rate"
},
"analysis_steps": [
{
"step": "lag_analysis",
"description": "Find optimal lag between cause and effect",
"output": "best_lag_days = 14"
},
{
"step": "correlation_coefficient",
"description": "Pearson correlation with statistical significance",
"output": "r = 0.68, p < 0.01"
},
{
"step": "causality_test",
"description": "Granger causality to test if causation is plausible",
"output": "weather Granger-causes demand at lag 14"
},
{
"step": "interaction_effects",
"description": "Test for combined effects (weather × policy)",
"output": "weather impact is 40% stronger during tariff periods"
},
{
"step": "seasonality_decomposition",
"description": "Separate trend, seasonal, residual components",
"output": "correlation stronger in summer, weaker in winter"
}
]
}
M3.4.2: Explainable AI for Correlations
// Example explanation of discovered correlation
{
"correlation": "Temperature → Ice Cream Demand",
"strength": 0.82,
"confidence": "95% (p < 0.01)",
"lag": "0 days (same-day effect)",
"seasonality": "Much stronger June-August",
"data_points_analyzed": 1825, // 5 years
"explanation": {
"simple": "When temperature rises, ice cream sales increase proportionally",
"detailed": "For every 1°C increase, expect 3.2% sales increase. Effect is non-linear at extremes (>32°C).",
"caveats": [
"Correlation breaks during pandemic lockdowns (2020-2021)",
"Holiday sales override weather patterns (Dec-Jan)",
"Weekend effect: +40% baseline, weather multiplier still applies"
]
},
"forecast_application": {
"weather_next_7_days": [
{"date": "2024-01-22", "temp_forecast": 18, "demand_adjustment": "+5.8%"},
{"date": "2024-01-23", "temp_forecast": 22, "demand_adjustment": "+13.4%"}
]
}
}
MILESTONE 3 Implementation Roadmap
Phase 1: Weather Hindcasting (Weeks 1-3)
- M3.1.1: Weather data collection (Open-Meteo + NOAA)
- M3.1.2: Correlation discovery (Pearson + lag analysis)
- M3.1.3: Weather forecast API routes
- Deliverable: Users can see weather correlations with demand
Phase 2: Economic Data (Weeks 4-5)
- M3.2.1: Economic indicator ingestion (IMF, World Bank)
- M3.2.2: Policy event tracking (manual + API sources)
- M3.2.3: Impact analysis service
- Deliverable: Supply chain policy effects documented and traced
Phase 3: Advanced Analytics (Weeks 6-7)
- M3.4.1: Time-series correlation discovery
- M3.4.2: Explainable AI for correlations
- Multi-variable regression (weather × policy × economics)
- Deliverable: System explains demand movements through external factors
Phase 4: Integration (Week 8)
- Integrate correlations into forecast engine
- Dashboard showing correlation insights
- What-if scenario builder
- Deliverable: Complete external data → forecast pipeline
Technical Decisions
Data Pipeline Strategy
Batch vs Stream:
- Weather: Daily batch (cost-efficient, sufficient for forecasting)
- Economic indicators: Weekly batch (IMF/World Bank publish on schedules)
- Policy events: Manual entry + RSS feed monitoring (ad-hoc nature)
- Trade data: Monthly batch (COMTRADE publishes monthly)
Caching Strategy:
- Weather historical: Cache heavily (never changes)
- Economic indicators: Cache 24 hours (updates infrequently)
- Policy events: No cache (manual entries, real-time importance)
- Trade data: Cache 7 days (monthly updates)
Correlation Thresholds
Statistically Significant:
- p-value < 0.05 (95% confidence)
- Sample size > 50 (at least 50 data points)
- r² > 0.20 (explains at least 20% of variance)
Actionable in Forecast:
- Correlation coefficient > 0.30 (weak but notable)
- p-value < 0.01 (very significant)
- Consistent across 80%+ of time periods
Storage Strategy
Time-Series Data:
- Use PostgreSQL with proper indexing (not timeseries DB initially)
- Index on (tenant_id, location_id, date) for query performance
- Partition by year if data grows large (>10M rows)
Correlation Results:
- Store computed correlations (don't recalculate daily)
- Include metadata: sample size, time period, data quality
- Version correlations (e.g., "Jan-May v1" vs "Jan-May v2 with outlier removal")
Risk Mitigation
Data Quality Risks
Weather Data Gaps:
- Mitigation: Multi-source (Open-Meteo + NOAA) with fallback
- Interpolate missing days using adjacent data
- Flag results based on data quality score
Spurious Correlations:
- Mitigation: Require statistical significance (p < 0.05)
- Separate time periods to avoid trend artifacts
- Use Granger causality to test directionality
- Manual review of surprising correlations
Policy Event Incompleteness:
- Mitigation: User can manually add events
- Monitor trade news APIs/RSS feeds
- Cross-reference with government announcements
Performance Risks
Large Correlation Calculations:
- Mitigation: Pre-compute correlations overnight
- Cache results heavily
- Show "correlation calculated at X time" in UI
API Rate Limits:
- Mitigation: Batch requests during off-peak hours
- Cache aggressively
- Use free tier APIs only (no cost surprises)
Success Metrics
M3.1: Weather Hindcasting ✅ Discover 5+ statistically significant weather-demand correlations ✅ What-if forecast shows 80%+ accuracy vs actual outcomes ✅ Users report insights they didn't know before
M3.2: Policy Integration ✅ Link 100% of supply chain routing changes to policy events within 60-day window ✅ Policy impact estimates within ±15% of actual financial impact ✅ Early warning: flag policy announcements 60+ days before impact
M3.3: Supply Chain Policy Tracking ✅ Comprehensive policy event database (tariffs, FTAs, sanctions) ✅ Routing decision models reflect company-specific rules ✅ Compliance tracking for each supplier-destination pair
M3.4: Correlation Discovery ✅ Discover 20+ correlations across weather, policy, economics, demand ✅ Multi-variable models explain 60%+ of demand variance ✅ Explainable AI shows clear reasoning for forecast adjustments
Conclusion
MILESTONE 3 transforms ChainAlign from a data-driven platform into an insight-driven decision engine by:
- Discovering empirical correlations (don't assume, prove)
- Surfacing policy impacts (regulatory changes → routing changes)
- Enabling scenario modeling (what-if with learned relationships)
- Providing transparent explanations (why external factors matter)
This positions ChainAlign as the correlation discovery engine that companies can't build themselves (because data is proprietary), while remaining flexible enough to work with each company's unique patterns.