Functional Specification Document: Hybrid Forecasting Service
A Python-based microservice for advanced time-series forecasting using a hybrid residual model architecture.
| Feature Name | Hybrid Forecasting Service |
|---|---|
| Component Layer | Backend AI/ML Service |
| Status | Proposed |
| Core Goal | To replace the existing statistical calculation with a more accurate, robust, and extensible forecasting engine based on modern machine learning techniques and best practices. |
| Primary Metric | Forecast Accuracy (MAPE): Mean Absolute Percentage Error reduction compared to the previous baseline model. |
1. Overview & Strategic Imperative
The existing forecasting service provides a baseline but lacks the sophistication to model complex, non-linear demand patterns. To elevate ChainAlign's predictive capabilities, a new, dedicated forecasting microservice is required. This service will be built in Python to leverage its rich ecosystem of machine learning libraries.
This document specifies the design for the Hybrid Forecasting Service. It is based on the architectural research outlined in "An Architectural Blueprint for Gated, Meta-Cognitive Forecasting Systems." The initial implementation will focus on creating a powerful baseline hybrid model, which will serve as the foundation for future enhancements like LLM-based gating and external data fusion.
The core of this service is the Serial Residual Modeling pattern, which decomposes the time series into linear and non-linear components. This provides a more accurate and interpretable forecast than a single model could achieve alone.
2. System Architecture
2.1. High-Level Data Flow
The Hybrid Forecasting Service will operate as a standalone microservice. The existing Node.js backend will delegate forecasting tasks to it via a REST API.
```
[ChainAlign Backend (Node.js)]
        |
        | 1. POST /v1/forecast with time-series data
        V
[Hybrid Forecasting Service (Python/FastAPI)]
        |
        | 2. Process data with Hybrid Engine
        |    - Fit Prophet model
        |    - Extract residuals
        |    - Train XGBoost on residuals
        |    - Combine forecasts
        V
[Response with enriched forecast]
        |
        | 3. Return JSON forecast to Node.js backend
        V
[ChainAlign Backend (Node.js)]
        |
        | 4. Integrate forecast into decision packages
        V
[End User]
```
2.2. Recommended Python Stack
- Forecasting & Hindcasting: `darts`, for its unified API, diverse models, and robust backtesting.
- Statistical Baseline: `prophet` (can be called via `darts`).
- ML Residual Model: `xgboost`.
- API Framework: `fastapi`, for high-performance serving.
- Data Handling: `pandas`.
- Data Validation: `pydantic`, for API data schemas.
3. Core Component: Hybrid Residual Forecasting Engine
3.1. Architecture
The engine will implement the Serial Residual Model pattern. This is a two-stage process:
- Stage 1: Linear Modeling: A statistical model (Prophet) is fitted to the primary time series data. This model captures the main trend, seasonality, and holiday effects.
- Stage 2: Non-Linear Modeling: The residuals (errors) from the Prophet model are extracted. These residuals represent the complex, non-linear patterns that Prophet could not model. An XGBoost model is then trained to forecast these residuals.
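A minimal training sketch of these two stages is shown below. It calls `prophet` and `xgboost` directly rather than through `darts`, purely to keep the example short; the function name, the lag-feature construction, and the hyperparameters are illustrative assumptions, not prescribed choices.

```python
import numpy as np
import pandas as pd
import xgboost as xgb
from prophet import Prophet


def fit_hybrid(df: pd.DataFrame, n_lags: int = 7):
    """Two-stage fit. `df` has Prophet's expected columns: "ds" (timestamp) and "y" (value)."""
    # Stage 1: fit Prophet on the raw series and compute in-sample residuals.
    prophet_model = Prophet()
    prophet_model.fit(df)
    in_sample = prophet_model.predict(df[["ds"]])
    residuals = df["y"].to_numpy() - in_sample["yhat"].to_numpy()

    # Stage 2: train XGBoost to predict each residual from its n_lags predecessors.
    X = np.array([residuals[i - n_lags:i] for i in range(n_lags, len(residuals))])
    y = residuals[n_lags:]
    xgb_model = xgb.XGBRegressor(n_estimators=200, max_depth=3)
    xgb_model.fit(X, y)

    return prophet_model, xgb_model, residuals
```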
3.2. Logic Flow
For a given forecasting request:
- The input time-series data is converted into a `darts` `TimeSeries` object.
- A `Prophet` model is instantiated and fitted to the time series.
- The historical, in-sample residuals are calculated (`actuals - prophet_in_sample_forecast`).
- An `XGBoost` model is trained with the residuals as the target variable. Features for this model can include lagged values of the residuals and any provided covariates.
- To generate a future forecast:
  a. The Prophet model predicts the future trend and seasonality.
  b. The XGBoost model predicts the future residuals.
  c. The final forecast is the sum: `Prophet_Forecast + XGBoost_Residual_Forecast`.
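Continuing the hypothetical training sketch from Section 3.1, the forecast-generation step could look as follows. The recursive roll-out of lagged residuals is just one possible feature scheme, chosen to keep the example self-contained.

```python
import numpy as np


def predict_hybrid(prophet_model, xgb_model, residuals, horizon: int, n_lags: int = 7):
    # a. Prophet predicts the future trend and seasonality.
    future = prophet_model.make_future_dataframe(periods=horizon, include_history=False)
    prophet_forecast = prophet_model.predict(future)["yhat"].to_numpy()

    # b. XGBoost predicts the future residuals, one step at a time, feeding each
    #    prediction back in as a lag feature for the next step.
    history = list(residuals[-n_lags:])
    residual_forecast = []
    for _ in range(horizon):
        features = np.array(history[-n_lags:]).reshape(1, -1)
        next_residual = float(xgb_model.predict(features)[0])
        residual_forecast.append(next_residual)
        history.append(next_residual)

    # c. Final forecast = Prophet_Forecast + XGBoost_Residual_Forecast.
    return prophet_forecast + np.array(residual_forecast)
```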
4. API Specification
The service will be built using FastAPI and will expose the following endpoints. All data models will be defined using Pydantic for validation.
4.1. POST /v1/forecast
This is the primary endpoint for generating a forecast.
- Request Body:

  ```json
  {
    "time_series": [
      {"timestamp": "2025-01-01T00:00:00Z", "value": 150.0},
      {"timestamp": "2025-01-02T00:00:00Z", "value": 155.5}
    ],
    "forecast_horizon": 30,
    "future_covariates": [
      {"timestamp": "2025-02-15T00:00:00Z", "is_promotion": 1}
    ]
  }
  ```
- Response Body:

  ```json
  {
    "forecast": [
      {"timestamp": "2025-02-01T00:00:00Z", "value": 180.2},
      {"timestamp": "2025-02-02T00:00:00Z", "value": 183.7}
    ],
    "model_components": {
      "prophet_forecast": [...],
      "xgboost_residual_forecast": [...]
    },
    "execution_metadata": {
      "model_name": "Hybrid Prophet+XGBoost",
      "execution_time_ms": 250
    }
  }
  ```
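A skeletal FastAPI/Pydantic definition of this endpoint might look like the sketch below. The class names, field types, and placeholder handler body are assumptions (targeting Pydantic v2) meant only to illustrate the request/response contract above.

```python
from datetime import datetime
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel, ConfigDict

app = FastAPI(title="Hybrid Forecasting Service")


class TimeSeriesPoint(BaseModel):
    timestamp: datetime
    value: float


class ForecastRequest(BaseModel):
    time_series: list[TimeSeriesPoint]
    forecast_horizon: int
    future_covariates: Optional[list[dict]] = None


class ForecastResponse(BaseModel):
    # Allow the "model_" prefix used by the API contract (Pydantic v2 reserves it by default).
    model_config = ConfigDict(protected_namespaces=())

    forecast: list[TimeSeriesPoint]
    model_components: dict
    execution_metadata: dict


@app.post("/v1/forecast", response_model=ForecastResponse)
def create_forecast(request: ForecastRequest) -> ForecastResponse:
    # Placeholder: the Hybrid Residual Forecasting Engine (Section 3) would be invoked here.
    return ForecastResponse(
        forecast=[],
        model_components={"prophet_forecast": [], "xgboost_residual_forecast": []},
        execution_metadata={"model_name": "Hybrid Prophet+XGBoost", "execution_time_ms": 0},
    )
```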
4.2. POST /v1/backtest
This endpoint will be used for model validation and performance tuning (hindcasting).
- Request Body:

  ```json
  {
    "time_series": [...], // Full historical time series
    "forecast_horizon": 14,
    "backtest_params": {
      "strategy": "rolling_window",
      "window_size": 157 // e.g., 157 weeks for VN2 data
    }
  }
  ```
- Response Body:

  ```json
  {
    "metrics": {
      "mape": 18.35,
      "rmse": 12.8,
      "mae": 9.2
    },
    "backtest_forecasts": [...], // The series of historical forecasts
    "execution_metadata": {
      "model_name": "Hybrid Prophet+XGBoost",
      "backtest_strategy": "rolling_window",
      "execution_time_ms": 5400
    }
  }
  ```
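The hindcasting mechanics could be implemented with `darts`' `historical_forecasts` and metric functions, roughly as sketched below. For brevity a plain Prophet model stands in for the full hybrid engine, and the 0.75 start proportion is an illustrative stand-in for the configurable rolling-window parameters.

```python
import pandas as pd
from darts import TimeSeries
from darts.metrics import mae, mape, rmse
from darts.models import Prophet


def run_backtest(df: pd.DataFrame, forecast_horizon: int) -> dict:
    # `df` holds the full history with "timestamp" and "value" columns.
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    series = TimeSeries.from_dataframe(df, time_col="timestamp", value_cols="value")

    model = Prophet()
    # Roll forward through the most recent quarter of the history, refitting the
    # model at each step (rolling-window strategy).
    historical = model.historical_forecasts(
        series,
        start=0.75,                       # proportion of history seen before the first forecast
        forecast_horizon=forecast_horizon,
        stride=1,
        retrain=True,
    )
    return {
        "mape": mape(series, historical),
        "rmse": rmse(series, historical),
        "mae": mae(series, historical),
    }
```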
5. Integration Plan
The existing Node.js `HybridForecastingService.js` will be refactored as follows:
- The internal logic for calculating the statistical baseline will be removed.
- A new function will be added to make an HTTP POST request to the Python service's `/v1/forecast` endpoint.
- The response from the Python service will be used as the new, more accurate statistical baseline.
- The subsequent steps in the Node.js service (LLM synthesis, blending, etc.) will remain, but will now operate on a much higher quality baseline forecast.
6. Phased Rollout & Future Roadmap
This FSD covers the foundational implementation. Future work will build upon this service.
- Phase 1 (This FSD):
  - Build the Python service with the Prophet+XGBoost hybrid model.
  - Create the `/v1/forecast` and `/v1/backtest` endpoints.
  - Integrate the service into the existing Node.js backend.
- Phase 2: External Data Fusion:
  - Enhance the model to accept `past_covariates`.
  - Build an NLP pipeline using Hugging Face Transformers to extract sentiment scores from news articles and feed them into the model.
- Phase 3: LLM Gating Agent:
  - Implement a LangChain agent that uses the forecasting service as a "tool".
  - The agent will reason over the forecast and external news to provide a final, "meta-cognitive" judgment, flagging forecasts for human review when necessary.