Behind the Build: How We Achieved 92% Forecast Accuracy for ShopSense

2026-03-18 · 7 min read

Case Study · Forecasting · Time Series · Architecture

This is the technical companion to our ShopSense Analytics case study. The case study tells the business story — 92% forecast accuracy, $1.8M saved, inventory visibility across five warehouses. This post tells the engineering story: how we designed the system, why we made the choices we did, and what we'd change next time.

The Starting Point

ShopSense had tried an off-the-shelf forecasting tool that plateaued at 68% accuracy. When we dug into why, we found three factors the generic tool couldn't model:

  1. Promotional cannibalization — when Product A goes on sale, it steals demand from similar Product B
  2. Regional weather sensitivity — certain seasonal products sold dramatically differently across warehouse regions
  3. Supplier reliability patterns — inconsistent lead times from certain suppliers meant "in stock" didn't always mean "available to ship"

A single time-series model couldn't capture all three. We needed an ensemble.

Architecture Overview

The production system has four layers:

Data ingestion — Apache Airflow orchestrates daily ETL jobs that pull sales transactions, inventory levels, promotional calendars, weather data (OpenWeatherMap API), and supplier delivery records into a unified data warehouse (PostgreSQL on RDS).

Feature engineering — A Python pipeline transforms raw data into model-ready features: rolling sales averages, promotional flags, weather indices, day-of-week patterns, cross-product correlation scores, and supplier reliability ratings.
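As a sketch of what that step looks like for a few of the time-based features — column names like `units_sold` and `on_promo` are illustrative, not ShopSense's actual schema:

```python
import pandas as pd

def build_features(sales: pd.DataFrame) -> pd.DataFrame:
    """Toy version of the feature step. Expects one row per (sku, date)
    with a `units_sold` column and a boolean `on_promo` column."""
    out = sales.sort_values(["sku", "date"]).copy()
    grp = out.groupby("sku")["units_sold"]
    # Rolling sales averages over 7- and 28-day windows, per SKU
    out["sales_7d_avg"] = grp.transform(lambda s: s.rolling(7, min_periods=1).mean())
    out["sales_28d_avg"] = grp.transform(lambda s: s.rolling(28, min_periods=1).mean())
    # Day-of-week pattern (0 = Monday)
    out["day_of_week"] = pd.to_datetime(out["date"]).dt.dayofweek
    # Promotional flag as an integer for the model
    out["promo_flag"] = out["on_promo"].astype(int)
    return out
```

The production pipeline layers weather indices, cross-product correlation scores, and supplier reliability ratings on top of features like these to reach the ~45-feature set.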

Forecasting models — A two-layer ensemble:

  • Layer 1: Facebook Prophet handles trend, seasonality, and holiday effects for each SKU
  • Layer 2: LightGBM captures residual patterns — the cross-product, promotional, and weather effects that Prophet can't model

Serving layer — A Next.js dashboard queries the forecast database and renders real-time inventory views, purchase recommendations, and what-if simulation results.

Why Prophet + LightGBM

We evaluated five approaches:

| Model | Accuracy | Training Time | Interpretability | Notes |
|-------|----------|---------------|------------------|-------|
| ARIMA | 71% | Fast | High | Couldn't handle multiple seasonality patterns |
| Prophet | 79% | Fast | High | Good baseline, missed cross-product effects |
| DeepAR | 83% | Slow | Low | Better accuracy but black-box, needed more data |
| N-BEATS | 81% | Medium | Medium | Similar to DeepAR, harder to productionize |
| Prophet + LightGBM | 92% | Medium | High | Best of both worlds |

The ensemble won because:

  • Prophet is excellent at decomposing time series into trend + seasonality + holidays. It's fast, interpretable, and handles missing data gracefully.
  • LightGBM captures the tabular features (cross-product correlations, weather, promotions) that time series models miss. It trains fast and is highly interpretable with SHAP values.
  • Together, Prophet handles the "when" patterns (time-based) and LightGBM handles the "why" patterns (causal factors).
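The residual-stacking pattern itself is simple to express. The sketch below uses deliberately tiny stand-ins — a day-of-week mean for layer 1 and ordinary least squares for layer 2 — in place of Prophet and LightGBM, but the wiring is the same: layer 2 is trained on layer 1's residuals, and the final forecast is the sum.

```python
import numpy as np

def fit_ensemble(day_of_week, tabular_features, y):
    """Layer 1: seasonal mean per day-of-week (stand-in for Prophet).
    Layer 2: least squares on tabular features, fit to layer-1 residuals
    (stand-in for LightGBM). Returns a predict function."""
    # Layer 1: average demand for each day of the week
    seasonal = np.array([y[day_of_week == d].mean() for d in range(7)])
    layer1 = seasonal[day_of_week]
    # Layer 2: model what layer 1 missed, using the tabular features
    residuals = y - layer1
    X = np.column_stack([tabular_features, np.ones(len(y))])  # add intercept
    coefs, *_ = np.linalg.lstsq(X, residuals, rcond=None)

    def predict(dow, feats):
        base = seasonal[dow]
        Xp = np.column_stack([feats, np.ones(len(dow))])
        return base + Xp @ coefs

    return predict
```

Swapping the stand-ins for Prophet (fit per SKU, forecast as layer 1) and LightGBM (fit on the residuals with the full feature set) gives the production configuration.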

The Data Pipeline

The Airflow DAG runs nightly at 2 AM:

  1. Extract — Pull yesterday's sales from ShopSense's Shopify-based backend, inventory snapshots from their WMS, weather data from OpenWeatherMap, and supplier delivery records from their ERP
  2. Transform — Clean, deduplicate, and compute features. The feature engineering step produces ~45 features per SKU per day.
  3. Predict — Run the ensemble for the next 28 days at daily granularity for each of the 12,000 SKUs
  4. Load — Write forecasts to the serving database, trigger alerts for stockout risks and reorder recommendations
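In plain Python, the nightly run reduces to four chained steps. The sketch below is a simplified stand-in for the Airflow DAG — the function bodies are illustrative stubs, and in production each step is a separate Airflow task with its own retries and alerting:

```python
from datetime import date, timedelta

def extract(run_date):
    """Pull yesterday's sales, inventory, weather, and supplier records.
    Stubbed here; the real task hits Shopify, the WMS, OpenWeatherMap, and the ERP."""
    yesterday = run_date - timedelta(days=1)
    return {"sales_date": yesterday, "rows": []}

def transform(raw):
    """Clean, deduplicate, and compute ~45 features per SKU per day."""
    return {"features": raw["rows"]}

def predict(features, horizon_days=28):
    """Run the ensemble at daily granularity over the forecast horizon."""
    return {"horizon": horizon_days, "forecasts": []}

def load(result):
    """Write to the serving DB; trigger stockout and reorder alerts."""
    return {"written": len(result["forecasts"]), "horizon": result["horizon"]}

def nightly_run(run_date=None):
    run_date = run_date or date.today()
    return load(predict(transform(extract(run_date))))
```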

Total pipeline runtime: ~45 minutes for 12,000 SKUs. Retrain cycle: weekly on a rolling 18-month window.

Key Decision: Accuracy vs. Freshness

We hit 94% accuracy in offline evaluation but deliberately backed off to a 92% configuration in production. Here's why:

The last 2% of accuracy came from supplier reliability features that only updated quarterly (when ShopSense renegotiated contracts). In production, these features were stale for up to 3 months between updates.

A model that's 92% accurate with real-time features beats a 94% model with stale features. We chose the configuration that was most reliable over time, not the one that scored highest on the test set.

This is a critical lesson: offline accuracy ≠ production accuracy. Your evaluation should simulate production conditions, including feature freshness.

The Dashboard

The forecasting engine produces numbers. The dashboard turns them into decisions.

Every view answers a specific question:

  • Reorder view: "What should I order this week, from which supplier, in what quantity?" — pre-computed recommendations based on forecasts, current stock, lead times, and warehouse capacity
  • Risk view: "Which SKUs are at risk of stockout in the next 14 days?" — sorted by revenue impact
  • Simulation view: "What happens to inventory if we run a 30% promotion next month?" — runs the forecasting ensemble with modified promotional features
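Mechanically, the what-if view is "clone the feature set, override the promotional fields, re-run the predictor, diff the outputs." A sketch with a stand-in `forecast` callable (in production it's the Prophet + LightGBM ensemble, and `promo_discount` stands in for the real promotional features):

```python
def simulate_promotion(features: dict, forecast, promo_discount: float) -> dict:
    """Re-run the forecaster with a modified promotional feature and
    report the change versus the baseline forecast."""
    baseline = forecast(features)
    scenario_features = {**features, "promo_discount": promo_discount}
    scenario = forecast(scenario_features)
    return {
        "baseline_units": baseline,
        "scenario_units": scenario,
        "uplift_units": scenario - baseline,
    }
```

Because the simulation reuses the production models rather than a separate approximation, the scenario numbers stay consistent with the nightly forecasts the team already sees.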

The simulation view was the feature that drove adoption. The planning team didn't fully trust the system until they could test scenarios themselves and see the model's reasoning.

What We'd Build Differently

Simulation mode earlier. We built it in month three. It should have been in the MVP. The planning team's trust came from testing scenarios, not from looking at accuracy metrics.

Real-time event handling. The current pipeline is batch (nightly). When a surprise promotion or supply disruption happens, forecasts don't update until the next day. We'd add a streaming layer (Kafka) for real-time forecast adjustments on high-impact events.

Automated anomaly detection on forecast performance. Currently, we monitor accuracy weekly. We should alert immediately when forecast confidence drops below a threshold for any SKU category — often a signal that something in the market has changed.
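A minimal version of that alerting is a threshold check on a rolling accuracy window per category — the 85% threshold and 7-day window below are illustrative defaults, not ShopSense's settings:

```python
def flag_degraded_categories(daily_accuracy: dict,
                             threshold: float = 0.85,
                             window: int = 7) -> list:
    """daily_accuracy maps category -> list of recent daily accuracy scores
    (most recent last). Flags any category whose mean over the last
    `window` days falls below the threshold."""
    flagged = []
    for category, scores in daily_accuracy.items():
        recent = scores[-window:]
        if recent and sum(recent) / len(recent) < threshold:
            flagged.append(category)
    return flagged
```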

Results

The system has been in production for 8 months:

  • 92% demand forecast accuracy (up from 68% with the previous tool)
  • 30% reduction in overstock carrying costs (~$630K annualized savings)
  • $1.8M estimated savings from reduced stockouts in the first year
  • Weekly automated purchase recommendations accepted by the planning team 87% of the time without modification

Read the full ShopSense case study →

Building a forecasting system? Let's talk →