DataQuant

FMCG SKU AI Case

From Spreadsheets to SKU-Level AI

How a European confectionery manufacturer lifted forecast accuracy by 17% across 12,400 SKUs — and recovered €5.2M of margin in the process.

EU CONFECTIONERY MANUFACTURER  ·  €420M REVENUE · 12,400 SKUS  ·  ENGAGEMENT: 16 WEEKS

+17% FORECAST ACCURACY  ·  €5.2M MARGIN RECOVERED  ·  4.1%→2.6% STOCK-OUT RATE  ·  €11M WORKING CAPITAL RELEASED

Illustrative case based on engagement patterns from European FMCG manufacturers. Specific business details, names, and exact figures have been adapted to preserve client confidentiality.

Situation

A €420M European confectionery manufacturer operating across eight countries had a forecasting problem that had frustrated its planning team for years. The S&OP cycle ran monthly. The operational forecast was built in a 1,200-tab Excel workbook by a team of four senior planners. Forecast error at the 7-day horizon, measured as MAPE (mean absolute percentage error), was running at 22% across the portfolio of 12,400 active SKUs.
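For readers less familiar with the metric, MAPE averages the absolute forecast error as a percentage of actual demand. The sketch below is a minimal illustration only: the function name, the zero-demand convention, and the sample numbers are assumptions for the example, not figures or code from the engagement.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, returned as a percentage.

    Periods with zero actual demand are skipped because the percentage
    error is undefined there (one common convention, assumed here).
    """
    errors = [
        abs(actual - forecast) / abs(actual)
        for actual, forecast in zip(actuals, forecasts)
        if actual != 0
    ]
    if not errors:
        raise ValueError("no non-zero actuals to score")
    return 100.0 * sum(errors) / len(errors)


# Hypothetical actuals and forecasts for one SKU-region over four periods.
actual_units = [100, 80, 120, 100]
forecast_units = [120, 100, 90, 78]
example_mape = mape(actual_units, forecast_units)
print(f"MAPE: {example_mape:.1f}%")
```

In practice a portfolio-level figure also depends on how SKUs are weighted (simple average versus volume-weighted), which the case does not specify.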

The 22% MAPE translated into expensive operational reality. Stock-out rate ran at 4.1% of SKU-region-week combinations — enough to lose meaningful share at retailers who tracked OSA (on-shelf availability) tightly. Conversely, finished-goods inventory ran at 47 days of cover, tying up working capital that the CFO had been pushing to release for two years.

The planning team’s read on the situation: the operational forecast had structural blind spots they could not fix from inside the spreadsheet. Weather effects on temperature-sensitive lines were captured anecdotally, not modelled (chocolate responds non-linearly to temperature; a hot week can cut demand by 15–25%). Promotional lift was estimated at the brand level, not at the SKU level. Daily POS data from the largest retailer was arriving with the central data team but was not feeding into the planning workbook because no one had built the integration.

The CEO’s read on the situation: forecast accuracy had been a topic at every quarterly business review for three years, with no measurable improvement. The status quo was not the planning team’s fault — they were doing competent work with the tools they had — but the tools had reached the limit of what spreadsheet-based forecasting could deliver.

Diagnosis

The diagnostic phase ran four weeks. Three findings reframed the engagement.

Finding 1 — Weather was the largest unmodelled driver

A historical analysis of the past 36 months of POS data, joined to local weather actuals, showed that temperature alone explained 31% of weekly demand variance on the chocolate, ice-bar, and seasonal-confectionery lines. The operational forecast handled weather through manual planner adjustment — sometimes accurate, often late, never systematic. A modest weather-aware sensing layer would close most of the structural accuracy gap on these lines.
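The case does not state how the variance share was measured. One simple way to arrive at a figure of this kind is the R² of a one-variable fit of weekly demand on temperature, which equals the squared Pearson correlation. The sketch below assumes that approach and uses made-up weekly data; because chocolate demand is described as non-linear in temperature, a linear R² like this would tend to understate the true effect.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def variance_explained(xs, ys):
    """R^2 of a one-variable linear fit, i.e. the squared correlation."""
    return pearson_r(xs, ys) ** 2


# Hypothetical weekly mean temperature (deg C) and weekly units sold.
weekly_temp_c = [12, 15, 18, 22, 26, 30]
weekly_units = [1000, 980, 990, 900, 820, 760]

r = pearson_r(weekly_temp_c, weekly_units)
share = variance_explained(weekly_temp_c, weekly_units)
print(f"correlation {r:.2f}, variance explained {share:.0%}")
```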

Finding 2 — Promotional lift was structurally mismodelled

Promo lift was being applied to the operational forecast at brand level using historical averages. SKU-level promo response varied by 3–7x across SKUs within the same brand: the hero pack lifted 4x baseline, the variant flavours 1.2x. Treating the brand as homogeneous produced over-forecast on slow-movers (causing waste and markdowns) and under-forecast on hero packs (causing stock-outs at exactly the moment of peak demand).
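The mechanics of that mismatch can be shown with a small worked example. The lift multiples (4x for the hero pack, 1.2x for variants) come from the finding above; the SKU names and baseline volumes are hypothetical, and the brand-level lift is assumed to be a volume-weighted average.

```python
# Hypothetical baseline and promo-week units for three SKUs in one brand.
skus = {
    "hero_pack": {"baseline": 1000, "promo": 4000},  # 4x lift
    "variant_a": {"baseline": 200, "promo": 240},    # 1.2x lift
    "variant_b": {"baseline": 200, "promo": 240},    # 1.2x lift
}

# Brand-level lift: total promo volume over total baseline volume.
brand_lift = (
    sum(s["promo"] for s in skus.values())
    / sum(s["baseline"] for s in skus.values())
)

# Error from applying the single brand lift to every SKU.
# Positive = over-forecast, negative = under-forecast.
errors = {
    name: s["baseline"] * brand_lift - s["promo"]
    for name, s in skus.items()
}

print(f"brand lift: {brand_lift:.1f}x")
for name, err in errors.items():
    print(f"{name}: {err:+.0f} units")
```

With these numbers the brand lift works out to 3.2x, so the hero pack is under-forecast by 800 units while each variant is over-forecast by 400.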

Finding 3 — The daily POS data was already in the building

Daily SKU-level POS feeds from three major retailers were arriving via SFTP into the central data team’s warehouse. They were used for monthly retailer reviews and almost nothing else. The signal that would have transformed forecast accuracy was sitting unconsumed in a warehouse table the planning team did not have access to. The integration work was small relative to the analytical opportunity.

What the diagnostic showed

The planning team did not need a new ERP. They did not need a planning-platform replacement. They needed a sensing layer that integrated weather, daily POS, and SKU-level promo response — wired into the operational forecast as a daily adjustment. The capability could be built in 12–16 weeks.

Approach

The build phase ran from week 5 through week 16 across three workstreams.

Workstream 1 — Data architecture (weeks 5–9)

Daily SKU-level POS from three retailers (covering 71% of total sell-out) was piped into a single warehouse table at SKU × retailer × region × day grain. Weather data was integrated from a commercial meteorological feed at postal-district granularity. Promotional calendar data was joined from the trade marketing system, with cleanup work to align SKU codes that diverged between systems. The output of this phase was a unified daily-cadence dataset spanning 36 months of history.
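The join logic at that grain can be sketched as follows. This is an illustration under stated assumptions, not the engagement's pipeline: the code map, field names, and sample rows are hypothetical, and weather is keyed by region rather than postal district to keep the example short.

```python
from datetime import date

# Hypothetical lookup aligning retailer item codes to internal SKU codes.
SKU_CODE_MAP = {
    ("retailer_a", "A-0001"): "SKU-100",
    ("retailer_b", "778812"): "SKU-100",
}


def build_daily_rows(pos_rows, weather_by_region_day, promo_keys):
    """Join POS, weather, and promo flags at SKU x retailer x region x day.

    Rows whose item code cannot be mapped are returned separately so the
    code-alignment gaps can be reviewed rather than silently dropped.
    """
    unified, unmapped = [], []
    for row in pos_rows:
        sku = SKU_CODE_MAP.get((row["retailer"], row["item_code"]))
        if sku is None:
            unmapped.append(row)
            continue
        weather = weather_by_region_day.get((row["region"], row["day"]), {})
        unified.append({
            "sku": sku,
            "retailer": row["retailer"],
            "region": row["region"],
            "day": row["day"],
            "units": row["units"],
            "max_temp_c": weather.get("max_temp_c"),
            "on_promo": (sku, row["retailer"], row["day"]) in promo_keys,
        })
    return unified, unmapped


day = date(2024, 6, 3)
pos_rows = [
    {"retailer": "retailer_a", "item_code": "A-0001", "region": "north", "day": day, "units": 120},
    {"retailer": "retailer_b", "item_code": "778812", "region": "north", "day": day, "units": 75},
    {"retailer": "retailer_b", "item_code": "000000", "region": "north", "day": day, "units": 9},
]
weather = {("north", day): {"max_temp_c": 27.5}}
promo_keys = {("SKU-100", "retailer_a", day)}

unified, unmapped = build_daily_rows(pos_rows, weather, promo_keys)
```

Surfacing the unmapped rows explicitly reflects the cleanup work described above: diverging SKU codes are usually the slowest part of a join like this.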

Workstream 2 — Sensing model build (weeks 7–12)

A two-layer model architecture was deployed. The first layer was the existing operational forecast (kept in place; the planning team continued to own it). The second layer was a gradient-boosted regressor (LightGBM) trained on 24 months of history that produced a daily-update adjustment to the operational forecast based on: weather actuals, weather forecast, daily POS trajectory versus plan, promo calendar position, day-of-week and seasonality features.
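The layering idea, in which the second layer learns a correction to the first rather than replacing it, can be sketched without the full model. The example below deliberately swaps the gradient-boosted regressor for a much simpler stand-in (a mean actual-to-forecast ratio per temperature band) and assumes a multiplicative adjustment; the band thresholds, function names, and history rows are all hypothetical.

```python
from collections import defaultdict


def temp_band(max_temp_c):
    """Bucket a daily maximum temperature (thresholds are illustrative)."""
    if max_temp_c < 15:
        return "cool"
    if max_temp_c < 25:
        return "mild"
    return "hot"


def fit_adjustment(history):
    """Learn a mean actual/forecast ratio per temperature band.

    history rows: (max_temp_c, operational_forecast, actual_units).
    This stands in for the trained regressor; it is not LightGBM.
    """
    ratios = defaultdict(list)
    for max_temp_c, op_forecast, actual in history:
        if op_forecast > 0:
            ratios[temp_band(max_temp_c)].append(actual / op_forecast)
    return {band: sum(vals) / len(vals) for band, vals in ratios.items()}


def sensed_forecast(op_forecast, max_temp_c, adjustment):
    """Layer 2 on top of layer 1: scale the operational forecast.

    Bands never seen in training fall back to a factor of 1.0, so the
    operational forecast passes through unchanged.
    """
    return op_forecast * adjustment.get(temp_band(max_temp_c), 1.0)


history = [
    (10, 100, 105), (12, 100, 95),   # cool days
    (20, 100, 100),                  # mild day
    (28, 100, 80), (30, 100, 80),    # hot days
]
adjustment = fit_adjustment(history)
hot_day = sensed_forecast(200, 29, adjustment)
```

The property worth noting is the fallback: where the second layer has nothing to say, the planners' forecast stands as-is, which is consistent with keeping the operational forecast in place and owned by the planning team.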

The model was trained at the SKU × region grain for the top 2,000 SKUs by volume; the long tail was handled with a hierarchical fallback model that pooled across similar SKUs to produce reasonable forecasts despite limited individual SKU history.
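The case does not describe how the pooling was implemented. One common approach is shrinkage: blend each SKU's own estimate with the average for its group of similar SKUs, weighting by how much history the SKU has. The sketch below assumes that approach; `prior_weight` and the sample values are hypothetical.

```python
def pooled_estimate(sku_values, group_mean, prior_weight=8):
    """Shrink a SKU-level mean toward its group mean.

    The SKU's own data gets weight n / (n + prior_weight), so SKUs with
    little history lean on the group and data-rich SKUs stand alone.
    """
    n = len(sku_values)
    if n == 0:
        return group_mean
    sku_mean = sum(sku_values) / n
    weight = n / (n + prior_weight)
    return weight * sku_mean + (1 - weight) * group_mean


# Hypothetical hot-week demand factors (1.0 = no effect).
group_hot_week_factor = 0.80

sparse_sku = pooled_estimate([0.6, 0.6], group_hot_week_factor)
rich_sku = pooled_estimate([0.6] * 72, group_hot_week_factor)
new_sku = pooled_estimate([], group_hot_week_factor)
```

Here the sparse SKU (two observations) lands at 0.76, close to the group, while the SKU with 72 observations lands at 0.62, close to its own history.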

Validation showed MAPE improvement of 14–18% versus the operational forecast alone, depending on category. Weather-sensitive categories (chocolate, ice-bars) showed the largest improvement; staple categories (hard-boiled sweets, gum) showed smaller gains where the operational forecast was already close.

Workstream 3 — Process integration (weeks 11–16)

The sensing-layer output was wired into three operational systems. Production scheduling pulled the daily sensing-adjusted forecast and adjusted production rates within the constraints of the locked weekly schedule. Replenishment to retailer DCs used the sensing-layer projection for daily rebalancing. A pre-emptive stock-out alert system flagged any SKU × region combination forecast to stock out within 5 days, surfaced to the regional planner for human review before action.
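The stock-out alert logic can be sketched as a forward projection of inventory against the daily forecast. The 5-day horizon comes from the description above; the function names, data layout, and quantities are assumptions for illustration.

```python
def days_until_stockout(on_hand, daily_forecast, inbound=None):
    """Return the first day (1-based) on which projected stock goes
    negative, or None if stock covers demand for every forecast day."""
    receipts = list(inbound or [])
    # Pad receipts with zeros so every forecast day has an entry.
    receipts += [0] * (len(daily_forecast) - len(receipts))
    stock = on_hand
    for day, (demand, receipt) in enumerate(zip(daily_forecast, receipts), start=1):
        stock += receipt - demand
        if stock < 0:
            return day
    return None


def stockout_alerts(positions, horizon_days=5):
    """Flag SKU x region combinations projected to stock out in the horizon.

    Alerts are returned for planner review; nothing is actioned here.
    """
    alerts = []
    for key, pos in positions.items():
        day = days_until_stockout(
            pos["on_hand"],
            pos["forecast"][:horizon_days],
            pos.get("inbound"),
        )
        if day is not None:
            alerts.append((key, day))
    return alerts


# Hypothetical positions keyed by (SKU, region).
positions = {
    ("SKU-100", "north"): {"on_hand": 500, "forecast": [120, 130, 140, 150, 160]},
    ("SKU-200", "south"): {"on_hand": 900, "forecast": [100, 100, 100, 100, 100]},
}
alerts = stockout_alerts(positions)
```

In this example only SKU-100 in the north region is flagged, with the projected shortfall on day 4.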

A weekly review cadence was established between supply-chain leadership and the central planning team to monitor model performance, surface unusual signals, and make calibration adjustments. The model was not autonomous — it produced recommendations that humans approved or overrode. This was deliberate. It maintained planner ownership and trust during the rollout.

Outcome

Twelve months after the sensing layer went live, the metrics showed a structural improvement that compounded across categories:

| METRIC | BEFORE | AFTER (12 MO) | CHANGE |
|---|---|---|---|
| 7-day forecast MAPE | 22% | 14% | −8 pts |
| Stock-out rate | 4.1% | 2.6% | −1.5 pts |
| Finished-goods inventory days | 47 days | 39 days | −8 days |
| Margin recovered (annualised) | | | €5.2M |
| Working capital released | | | €11M |

The headline accuracy gain, with 7-day MAPE falling 8 points from 22% to 14%, was the main analytical outcome, but the more strategically important number was the change in stock-out rate. A 1.5-point reduction across 12,400 SKUs and eight countries represents thousands of avoided lost-sale events. Two of the major retailer relationships, where OSA had been a recurring complaint, materially improved during the year. The supplier rating with one major retailer moved up two tiers in that retailer's internal scoring.

The forecast did not become perfect. It became close enough that operations stopped second-guessing the production schedule — and that change in trust was worth more than the accuracy points themselves.

Lessons

  1. Spreadsheet forecasting has a hard ceiling. The planning team had been competent and diligent for years. They were not the constraint. The constraint was that the tool they were using — a 1,200-tab Excel workbook — could not absorb the daily-cadence signals that were structurally available in the data layer. The capability ceiling was tool-imposed, not skill-imposed.
  2. The signal is usually already in the building. Every component of the sensing layer was data that already existed somewhere in the organisation. The transformation was not in acquiring new data; it was in making existing data accessible and integrated. This pattern is consistent across most FMCG demand-planning engagements we have run.
  3. Hierarchical models for the long tail are essential. For the top 2,000 SKUs, individual-SKU models worked well. For the remaining 10,400 SKUs in the long tail, individual models would have been over-fitted to limited data. Hierarchical fallback to similar-SKU pooled models maintained forecast quality on the long tail without requiring per-SKU calibration.
  4. Keep the planner in the loop. A fully autonomous sensing model that overrode the planning team’s judgment without human review would have produced political backlash and adoption resistance. Maintaining the planner-in-the-loop architecture preserved trust during rollout and protected against the cases where the model was structurally wrong (e.g., during the first weeks of an unprecedented promotional event).
  5. Working capital release is the largest financial outcome. €11M of working capital release is materially larger than the €5.2M of margin recovered — and it is structural. It accumulates from the moment the inventory comes down and persists for as long as the forecast accuracy holds. Frame demand-sensing business cases against working capital impact, not just margin.
