Back to Blog · Software Architecture

From Circular to Alert: Building a Grocery Price Intelligence Pipeline in Laravel

How a Laravel app scrapes weekly grocery circulars, normalizes prices into a historical time series, predicts optimal buy moments with linear regression on SQLite, and fires Telegram alerts when a staple hits its floor price.

MF
Martin Fournier
· June 04, 2026 · 6 MIN READ
Illustration for: Panier Futé: A Python Pipeline That Scrapes Grocery Flyers With Gemini Vision

From Circular to Alert: Building a Grocery Price Intelligence Pipeline in Laravel

Every Saturday morning the same question: is this a good price or am I getting fleeced? Grocery prices swing wildly week to week. Milk bounces between $4.79 and $6.29. Chicken breast floats from $11/kg to $17/kg. Without data, you are guessing.

So I built a price intelligence pipeline. It scrapes weekly circulars, normalizes store-specific product names into canonical items, tracks every price in a SQLite time series, runs a linear regression to flag the bottom of the cycle, and fires a Telegram alert when the buy signal triggers. This post walks through the architecture, the schema decisions, and the SQL that makes it fast on a single file.

The Data Model

Three tables carry the weight:

CREATE TABLE price_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    store_id INTEGER NOT NULL REFERENCES grocery_stores(id),
    product_name TEXT NOT NULL,
    canonical_item TEXT NOT NULL,
    price_cents INTEGER NOT NULL,
    unit TEXT,
    unit_price_cents INTEGER,
    week_start DATE NOT NULL,
    source_url TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE price_stats (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    canonical_item TEXT NOT NULL UNIQUE,
    min_price_cents INTEGER NOT NULL,
    max_price_cents INTEGER NOT NULL,
    avg_price_cents REAL NOT NULL,
    median_price_cents INTEGER NOT NULL,
    stddev_cents REAL,
    sample_count INTEGER NOT NULL,
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE stock_up_alerts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    canonical_item TEXT NOT NULL,
    threshold_percent REAL NOT NULL DEFAULT 20,
    is_active BOOLEAN DEFAULT TRUE,
    last_triggered_at TIMESTAMP
);

Key decisions:

Cents everywhere. Never store DECIMAL or FLOAT for money in SQLite. A float 5.99 accumulates rounding errors across 52 weeks of averages. Integer cents are exact, fast to sum, and trivially formatted back with a CAST at render time.

Canonical item normalization. Metro calls it Lait 2% 4L, Maxi calls it Milk 2% M.F. 4L, Walmart writes Great Value 2% milk 4L. A lookup table of patterns maps these to a single canonical key. Without this, the time series fragments and the prediction degrades into noise.

Precomputed stats table. Running MIN, MAX, AVG, STDDEV over 10,000+ price observations every time the dashboard loads is wasteful. The price_stats table refreshes after each import via a queued job. The dashboard reads one row per item.

The Import Pipeline

The pipeline runs daily via Laravel scheduler and has four stages.

Stage 1: Scrape

Each grocery chain publishes its weekly flyer as a PDF or JSON endpoint. A dedicated scraper class per chain parses the content and extracts store_id, product_name, price, unit, and week_start. PDF flyers go through a PDF-to-text pass first; JSON flyers parse directly.

The scrapers emit a uniform PriceObservation DTO.

Stage 2: Normalize

A PriceNormalizer service runs the raw product name through a chain of pattern matchers:

class PriceNormalizer
{
    public function normalize(string $productName): string
    {
        return Str::of($productName)
            ->lower()
            ->replace(['m.f.', 'm.f', 'milk fat'], '')
            ->replace('/MILK|LAIT/i', 'milk')
            ->trim()
            ->toString();
    }
}

This is the most brittle stage, but after tuning for two weeks it catches about 85 percent of products. The remaining 15 percent land in a moderation queue in Filament where I assign the canonical name once. Every manual assignment feeds back into the normalizer rules.

Stage 3: Upsert

Prices batch-insert via upsert with a unique composite key on (canonical_item, store_id, week_start):

PriceHistory::upsert(
    $records,
    uniqueBy: ['canonical_item', 'store_id', 'week_start'],
    update: ['price_cents', 'unit_price_cents']
);

Duplicate protection is critical. The same circular sometimes appears twice from a scraper rerun or PDF reprocess. Without upsert you get phantom price swings.

Stage 4: Recompute Stats

After import, a dispatched job recomputes the price_stats table:

INSERT OR REPLACE INTO price_stats
SELECT
    canonical_item,
    MIN(price_cents),
    MAX(price_cents),
    AVG(price_cents),
    MEDIAN(price_cents),
    STDDEV(price_cents),
    COUNT(*),
    CURRENT_TIMESTAMP
FROM price_history
GROUP BY canonical_item;

SQLite has MEDIAN and STDDEV as aggregate functions. This keeps the query refreshingly simple.

Price Prediction (It Is Cheap Enough)

The dashboard includes a simple price predictor: given 12 weeks of history for an item, fit a linear regression price = a * week_index + b and project the expected price for next week. If the current price is significantly below the projected price, the item is trending down and worth buying now.

The implementation is a single SQL query with window functions and aggregate math. SQLite computes it in under 2 milliseconds on 5,000 rows. That is the right level of sophistication for a grocery alert system.

Telegram Alert Integration

When price_stats shows a current price within threshold_percent of the historical min, and last_triggered_at is more than 7 days ago to avoid spam, a Laravel notification fires:

class StockUpAlert extends Notification
{
    public function via($notifiable): array
    {
        return [TelegramChannel::class];
    }

    public function toTelegram($notifiable): TelegramMessage
    {
        return TelegramMessage::create()
            ->to(config('services.telegram.chat_id'))
            ->content('Stock up alert: ' . $this->item . ' at ' . $this->formatPrice());
    }
}

The notification includes the buy window estimate based on how long prices historically stayed at the floor before bouncing.

What the Data Reveals After Two Months

Running this for eight weeks uncovered patterns I suspected but never quantified:

Chicken breast cycles every 3-4 weeks. Price hits $11/kg, stays low for 5-7 days, then bounces to $15+. The buy window is narrow. Miss it and you wait a month.

Milk is bimodal. Two price tiers with almost nothing in between. The cheaper tier appears at different stores on different weeks. Buy when any store in your area hits the low tier.

Store-brand staples are surprisingly flat. Pasta, canned tomatoes, sugar. Within 10 percent variance year round. No point watching them.

Produce is too volatile to predict. Lettuce can be $1.49 one week and $4.99 the next due to weather. The model flags these as anomalies rather than buy signals. The system suppresses produce warnings automatically.

One SQLite Tradeoff Worth Mentioning

Computing STDDEV and MEDIAN over millions of rows would be slow. But this dataset is bounded: at most 52 weeks times 200 items times 5 stores equals about 52,000 rows per year. SQLite handles that aggregate in single-digit milliseconds with no indexes beyond the primary key. The bounded dataset is what makes the simple approach viable.

If this scaled to 10 million rows, I would move the precomputation to a nightly materialized view or switch to PostgreSQL. But it will not reach 10 million rows in this application, and optimizing for a threshold it will never cross is engineering theatre.

Closing

A price intelligence pipeline does not need a data lake, a streaming platform, or a vector database. It needs three SQLite tables, a scheduled console command, a Telegram bot token, and a willingness to spend an afternoon writing pattern matchers for product names. The architecture is boring on purpose. The value is in the data it collects, not the infrastructure it runs on.

The full source is in the health-dashboard repository. The price predictor is 40 lines of SQL. The Telegram integration is one notification class. The hardest part was mapping Lait 2% M.F. 4L to 2% Milk 4L. That is a fine summary of most useful software: the architecture is the easy part; the domain model is the work.