Building a Weekly Meal Pipeline: Scraper, LLM Vision, and Automated Meal Plans

How I built an automated pipeline that scrapes 3 grocery circulars, analyzes 44 images with Gemini, generates Ricardo-style weekly menus, and pushes everything to a web dashboard, saving about an hour every week.

Martin Fournier
· May 31, 2026 · 7 MIN READ

Every week, a scheduled task kicks off on a personal server. Within minutes, it produces a complete weekly menu: seven dinners, seven healthy desserts, a grocery list organized by store, and price comparisons showing exactly how much can be saved on sale items.

This is the story of how I built that pipeline — and the lessons learned along the way.

The Problem

Grocery shopping for a family of six is expensive and chaotic. Three of Quebec's major grocery chains (Maxi, Super C, IGA) each publish weekly circulars in their own proprietary format. Manually browsing 40+ pages of flyers every week, deciding what to buy, planning meals around sale items, and writing a grocery list takes about 45-60 minutes.

The goal: automate the entire workflow.

The Architecture

The project is called panier-fute (French for "smart basket"): about 3,900 lines of Python across 20 modules. It integrates with hermes-web, a Laravel dashboard, for display and meal planning. The core modules:

panier-fute/
├── scraper.py          # httpx + BeautifulSoup → 14-16 pages/store
├── image_pipeline.py   # Download and cache circular images
├── gemini_vision.py    # Gemini 2.5 Flash → structured JSON
├── menu_generator.py   # 485 lines of meal plan generation logic
├── delivery.py         # Discord + hermes-web formatting
├── models.py           # SQLite models (items, price_history, etc.)
├── stores.py           # Store configs (URLs, selectors)
├── config.py           # Cooking profile, API keys, DB paths
├── cli.py              # CLI entry point (565 lines)
└── main.py             # Pipeline orchestrator

Plus hermes-web on the other end, with:

  • Meal plan management (weekly slots per day)
  • Public grocery list with swipe gestures, store organization, shareable tokens
  • Price history tracking with sale/regular price comparison
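The article describes main.py only as the pipeline orchestrator. A minimal, synchronous sketch of how such an orchestrator might chain the four stages; run_weekly_pipeline, PipelineResult, and the stage signatures are hypothetical, and the real scraper is async.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class PipelineResult:
    """Hypothetical container for one weekly run's outputs."""
    pages: list[Any] = field(default_factory=list)
    items: list[dict] = field(default_factory=list)
    menu: dict = field(default_factory=dict)
    delivered: bool = False


def run_weekly_pipeline(
    stores: list[str],
    scrape: Callable[[str], list[Any]],
    extract: Callable[[Any], list[dict]],
    plan_menu: Callable[[list[dict]], dict],
    deliver: Callable[[dict], bool],
) -> PipelineResult:
    """Chain scrape -> vision -> menu -> delivery, one stage at a time."""
    result = PipelineResult()
    for store in stores:
        pages = scrape(store)                      # role of scraper.py
        result.pages.extend(pages)
        for page in pages:
            result.items.extend(extract(page))     # role of gemini_vision.py
    result.menu = plan_menu(result.items)          # role of menu_generator.py
    result.delivered = deliver(result.menu)        # role of delivery.py
    return result
```

Passing the stages in as callables keeps each module swappable (a scraper rewrite would only change `scrape`) and lets the whole chain be exercised with stubs.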

Step 1: Scraping — httpx + BeautifulSoup

The scraper has been through two versions. The first used Playwright (a headless browser) to render circular pages; it crashed constantly with EPIPE errors and took 15+ minutes per store.

The second version uses httpx + BeautifulSoup — plain HTTP requests with HTML parsing. No JavaScript required, because circular image URLs are embedded directly in the HTML.

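from typing import List
from urllib.parse import urljoin

import httpx
from bs4 import BeautifulSoup

# Method on the scraper class. StoreConfig, FlyerPage and download_image are
# project types/helpers; the FlyerPage fields used here are illustrative.
async def scrape_store(self, store: StoreConfig) -> List[FlyerPage]:
    pages: List[FlyerPage] = []
    async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
        # Fetch the main circular page
        resp = await client.get(store.url)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")

        # Extract all flyer page links (14-16 pages per store), resolving relative hrefs
        flyer_links = [urljoin(store.url, a["href"]) for a in soup.select("a[href*=flyer]")]

        # Download each page's image
        for link in flyer_links:
            page_resp = await client.get(link)
            page_soup = BeautifulSoup(page_resp.text, "html.parser")
            img = page_soup.select_one("img.flyer-page")
            if img is None:  # skip pages whose markup doesn't match the selector
                continue
            img_url = urljoin(link, img["src"])
            local_path = await download_image(img_url)
            pages.append(FlyerPage(url=img_url, path=local_path))
    return pages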

Result: zero crashes, and a full run across all three stores now takes 4-5 minutes, versus 15+ minutes per store with Playwright.
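Because the image URLs sit in the raw HTML, the parsing step can be checked offline against a saved page. Here is a dependency-free sketch that uses the standard library's html.parser instead of BeautifulSoup and mirrors the two selectors in scrape_store (a[href*=flyer] and img.flyer-page). The class and function names are hypothetical, and the selectors are assumptions about each store's markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FlyerHTMLParser(HTMLParser):
    """Collect flyer-page links and flyer images from static HTML."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.flyer_links: list[str] = []
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # attribute values may be None for bare attributes
        # Equivalent of a[href*=flyer]
        if tag == "a" and "flyer" in (attr.get("href") or ""):
            self.flyer_links.append(urljoin(self.base_url, attr["href"]))
        # Equivalent of img.flyer-page (class list may contain other classes)
        elif tag == "img" and "flyer-page" in (attr.get("class") or "").split():
            if attr.get("src"):
                self.image_urls.append(urljoin(self.base_url, attr["src"]))


def parse_flyer_html(html: str, base_url: str) -> tuple[list[str], list[str]]:
    """Return (flyer page links, flyer image URLs), both made absolute."""
    parser = FlyerHTMLParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.flyer_links, parser.image_urls
```

If a store's page only produced these URLs after running JavaScript, a plain parser like this would come back empty, which makes it a cheap check that no headless browser is needed.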

Step 2: Vision — Gemini 2.5 Flash on 44 Images

Each store publishes 14-16 circular pages, totaling ~44 images per week. Each image contains 15-30 grocery items with prices, promotions, and unit pricing.

The extraction prompt looks like this:

PROMPT = """
Extract grocery items from this Quebec grocery circular image.
For each item, return:
- name (French, exact from the image)
- category (meat|fish|fruits|vegetables|dairy|pantry|frozen)
- sale_price, regular_price
- unit_price, unit_type (kg, lb, barquette, ml, sac, botte, etc.)
- weight_or_volume if visible
- is_member_price (is this a loyalty-card-only price?)
- store_brand if it's a store brand

Rules:
- Prices are in CAD
- TPS=5%, TVQ=9.975% for reference
- Use natural packaging units, NOT always $/kg
- e.g. yogourt = $/barquette, lait = $/sac or $/L
"""

Getting the prompt right took several iterations. Early versions used vague instructions that led to inconsistent results: items were missed and units came back in the wrong format. The key improvements:

  1. Category filtering: Only extract fruits, vegetables, meat/fish, and dairy (the categories we actually cook with)
  2. Dual price extraction: Ask for both regular_price and sale_price explicitly — flyers almost always show both
  3. Natural units: No "forcing everything to $/kg" — 473ml stays 473ml
  4. Weight extraction: Parse weight/volume from item names for unit price calculation
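Improvements 1 and 2 can also be enforced after the model replies, since LLM output occasionally drifts from the prompt. A minimal sketch, assuming the model returns a JSON array whose keys match the prompt's field names (name, category, sale_price, regular_price); FlyerItem, parse_items, and the drop-when-in-doubt rules are hypothetical, not the project's actual code.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Categories kept after filtering (improvement 1); "meat/fish" is two categories in the prompt.
KEPT_CATEGORIES = {"meat", "fish", "fruits", "vegetables", "dairy"}


@dataclass
class FlyerItem:
    name: str
    category: str
    sale_price: float
    regular_price: Optional[float]  # None when the flyer shows no regular price


def parse_items(raw: str) -> list[FlyerItem]:
    """Parse the model's JSON reply, dropping malformed or out-of-scope items."""
    data = json.loads(raw)
    if isinstance(data, dict):  # tolerate a wrapper like {"items": [...]}
        data = data.get("items", [])
    items: list[FlyerItem] = []
    for row in data:
        if not isinstance(row, dict):
            continue
        category = str(row.get("category", "")).lower()
        if category not in KEPT_CATEGORIES:
            continue
        try:
            sale = float(row["sale_price"])
        except (KeyError, TypeError, ValueError):
            continue  # no usable sale price: skip rather than guess
        regular = row.get("regular_price")
        try:
            regular = float(regular) if regular is not None else None
        except (TypeError, ValueError):
            regular = None
        if regular is not None and regular < sale:
            regular = None  # implausible pair, most likely a misread
        items.append(FlyerItem(str(row.get("name", "")).strip(), category, sale, regular))
    return items
```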

Each store run consumes roughly 6,000 input tokens and 20,000 output tokens on Gemini 2.5 Flash, which comes to about $0.03 per week.
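The weekly cost is straightforward token arithmetic. A small helper, with per-million-token prices left as parameters because Gemini pricing changes over time; the function name is hypothetical, and the current price sheet should be checked before relying on any estimate.

```python
def weekly_gemini_cost(
    stores: int,
    input_tokens_per_store: int,
    output_tokens_per_store: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Estimated USD for one weekly run; prices are USD per million tokens."""
    total_in = stores * input_tokens_per_store
    total_out = stores * output_tokens_per_store
    return (total_in * usd_per_m_input + total_out * usd_per_m_output) / 1_000_000


def annual_cost(weekly_usd: float) -> float:
    """Scale a weekly estimate to a 52-week year."""
    return weekly_usd * 52
```

With the article's numbers (3 stores, 18,000 input and 60,000 output tokens per week), the weekly cost works out to 0.018 × input price + 0.06 × output price, so output tokens dominate the bill. At $0.03 per week, the annual figure is $1.56.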

Step 3: Menu Generation — Ricardo × Weissman

The menu generator is the most complex module (485 lines). It takes sale items, a cooking profile, and constraints, and generates seven days of dinners.

The cooking profile defines the culinary identity:

CULINARY STYLE: Ricardo × Joshua Weissman
Ricardo (approachable, family-friendly, structured, Quebecois pantry)
meets Joshua Weissman (high-energy, technique-driven, "better than takeout").

Chef inspirations:
  — Sam the Cooking Guy: casual restaurant-quality comfort food
  — Jean-Francois Plante: Quebecois bistro, sauces and braises
  — J. Kenji Lopez-Alt: culinary efficiency, 30 min, practical techniques
  — Genevieve O'Gleman / Savourer: structured, healthy, ultra-efficient
  — Brian Lagerstrom: clean, precise, weeknight elite under 45 min
  — Nagi Maehashi / RecipeTin Eats: foolproof, asian/mex/tray-bakes

Keywords: weeknight-efficient, high-flavor, elevated comfort food,
          sale-item driven, minimum cleanup, meal-prep friendly

Family: 2 adults + 4 teenagers + leftovers for lunches

Each recipe includes:

  • Ingredients with exact quantities
  • Step-by-step instructions with prep/cook times
  • Pricing: total cost + per-portion cost
  • Nutrition: calories, protein, carbs, fat
  • Store mapping: which items come from which store
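The pricing fields reduce to simple arithmetic once each ingredient has a quantity and a unit price. Here is a sketch that uses Decimal to avoid float rounding on money. The recipe_cost name and the (name, quantity, price_per_unit) tuple shape are assumptions, and the portion count is left to the caller (for this family, six dinner portions plus whatever is set aside for lunches).

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def recipe_cost(
    ingredients: list[tuple[str, Decimal, Decimal]], portions: int
) -> tuple[Decimal, Decimal]:
    """Return (total, per_portion) in dollars, rounded to the cent.

    Each ingredient is (name, quantity, price_per_unit), with the quantity
    expressed in the unit the price is quoted in (kg, barquette, sac, ...).
    """
    if portions <= 0:
        raise ValueError("portions must be positive")
    total = sum((qty * price for _, qty, price in ingredients), Decimal("0"))
    per_portion = total / portions
    return total.quantize(CENT, ROUND_HALF_UP), per_portion.quantize(CENT, ROUND_HALF_UP)
```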

Dinners are designed to share ingredients across the week to minimize waste. Leftovers are deliberately oversized for next-day lunches.
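One way to check that a generated week actually shares ingredients is to count how many dinners use each one. This is a hypothetical helper, assuming each day maps to a collection of ingredient names.

```python
from collections import Counter
from typing import Iterable, Mapping


def shared_ingredients(week: Mapping[str, Iterable[str]], min_uses: int = 2) -> dict[str, int]:
    """Ingredients used in at least `min_uses` dinners, most-used first."""
    # set() so an ingredient listed twice in one recipe counts once for that day
    counts = Counter(ing for ingredients in week.values() for ing in set(ingredients))
    return {ing: n for ing, n in counts.most_common() if n >= min_uses}
```

A week where most sale items show up in only one dinner is a hint that leftovers of those items will go to waste.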

Step 4: Price Intelligence — Sale vs Regular

Most flyer items show two prices: the regular price (often struck through or in smaller text) and the sale price (large, highlighted). The Gemini prompt asks for both explicitly.

# Expected output format, as shown to Gemini in the extraction prompt (simplified)
EXAMPLE_OUTPUT = [
    {"produit": "Poitrine de poulet", "prix_regulier": 12.99, "prix_special": 8.99},
    {"produit": "Yogourt 4%",         "prix_regulier": 6.49,  "prix_special": 3.99},
]

Both prices are extracted directly from the flyer image and stored in SQLite:

items table:
  produit          | prix_regulier | prix_special
  -----------------|---------------|-------------
  Poitrine poulet  | 12.99         | 8.99
  Yogourt 4%       | 6.49          | 3.99
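Here is a minimal sketch of that table using Python's built-in sqlite3 module. The schema, the choice of produit as the primary key (the real models.py likely keys on store and week too), and the helper names are assumptions. Setting row_factory = sqlite3.Row makes rows indexable by column name, which matches the db_item["prix_regulier"] access pattern used in the grocery-list code.

```python
import sqlite3
from typing import Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows support row["column_name"]
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               produit       TEXT PRIMARY KEY,
               prix_regulier REAL,          -- NULL when the flyer shows no regular price
               prix_special  REAL NOT NULL
           )"""
    )
    return conn


def upsert_item(
    conn: sqlite3.Connection, produit: str, prix_regulier: Optional[float], prix_special: float
) -> None:
    """Insert an item, or overwrite its prices if it already exists (SQLite >= 3.24)."""
    conn.execute(
        "INSERT INTO items (produit, prix_regulier, prix_special) VALUES (?, ?, ?) "
        "ON CONFLICT(produit) DO UPDATE SET prix_regulier = excluded.prix_regulier, "
        "prix_special = excluded.prix_special",
        (produit, prix_regulier, prix_special),
    )


def lookup(conn: sqlite3.Connection, produit: str) -> Optional[sqlite3.Row]:
    return conn.execute(
        "SELECT produit, prix_regulier, prix_special FROM items WHERE produit = ?",
        (produit,),
    ).fetchone()
```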

When the grocery list is generated, _inject_regular_prices() cross-references each item against the DB to ensure both prices carry through to the final output:

def _inject_regular_prices(grocery_items, db):
    """
    Cross-reference each grocery item against the DB prices
    from the Gemini flyer extraction. Both regular and sale
    prices come from the flyer — this ensures they're carried
    through to the final grocery list.
    """
    for item in grocery_items:
        db_item = db.lookup(item["name"])
        if db_item:
            item["regular_price"] = db_item["prix_regulier"]  # from flyer
            item["sale_price"]    = db_item["prix_special"]    # from flyer
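Once both prices are on each item, the savings shown on the grocery list come down to a subtraction. Here is a hypothetical add_savings helper. It assumes the regular_price and sale_price keys set by _inject_regular_prices(), and it reports savings per unit, without multiplying by the quantity purchased.

```python
def add_savings(grocery_items: list[dict]) -> float:
    """Annotate each item with savings and savings_pct; return total saved in dollars.

    Items with no regular price, or where regular <= sale, get zero savings.
    """
    total = 0.0
    for item in grocery_items:
        regular = item.get("regular_price")
        sale = item.get("sale_price")
        if regular is None or sale is None or regular <= sale:
            item["savings"] = 0.0
            item["savings_pct"] = 0.0
            continue
        item["savings"] = round(regular - sale, 2)
        item["savings_pct"] = round(100 * (regular - sale) / regular, 1)
        total += item["savings"]
    return round(total, 2)
```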

Unit prices use natural packaging units directly from the flyer:

  • Yoghurt: $/barquette (not $/kg)
  • Milk: $/sac or $/L
  • Meat: $/kg or $/lb (whichever is on the package)
  • Produce: $/botte, $/sac, or $/unite
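The weight extraction from Step 2 (improvement 4) lets items be compared across package sizes even while the list displays natural units. This is a regex-based sketch. The pattern, the supported units, and the per-100 g/ml normalization are assumptions, not the project's parser.

```python
import re
from typing import Optional

# Matches "473ml", "1,5 kg", "650 g", "2 L"; French decimal commas are accepted.
_QTY_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|g|ml|l|lb)\b", re.IGNORECASE)

# Conversion to grams or millilitres
_TO_BASE = {"g": 1.0, "kg": 1000.0, "ml": 1.0, "l": 1000.0, "lb": 453.59237}


def parse_quantity(name: str) -> Optional[tuple[float, str]]:
    """Pull the first weight/volume out of an item name, e.g. 'Yogourt 4% 650 g'."""
    match = _QTY_RE.search(name)
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))
    return value, match.group(2).lower()


def price_per_100(price: float, name: str) -> Optional[float]:
    """$/100 g or $/100 ml for comparing package sizes; None if no quantity found."""
    parsed = parse_quantity(name)
    if parsed is None:
        return None
    value, unit = parsed
    base = value * _TO_BASE[unit]
    return round(price / base * 100, 2) if base > 0 else None
```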

Step 5: Delivery — Push to Web Dashboard

Once the menu and grocery list are generated, they're pushed to the hermes-web dashboard through its REST API.
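The article doesn't show the API call itself. Here is a standard-library sketch of what the push could look like. The /api/meal-plans path, the payload keys, and the Bearer token are hypothetical placeholders, not hermes-web's documented API.

```python
import json
import urllib.request


def build_push_request(
    base_url: str, token: str, week_of: str, meals: list[dict], grocery: list[dict]
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for the dashboard."""
    payload = {"week_of": week_of, "meals": meals, "grocery_items": grocery}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/meal-plans",  # hypothetical endpoint
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
        },
        method="POST",
    )


def push(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Building the request separately from sending it keeps the payload testable without a network, and ensure_ascii=False keeps French item names readable in the JSON body.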

The web dashboard then displays everything:

  • Meals tab: Week view with dinner + dessert per day
  • Grocery tab: Items organized by store (color-coded), with swipe-to-toggle and swipe-to-delete
  • Public grocery list: Shareable via secret token URL — no login required, mobile-first

The Cron Job

Everything runs automatically on a weekly schedule — no manual intervention needed.

The pipeline uses two flags: --first-last, which processes only the first and last pages of each circular (enough to catch the best deals without analyzing all 44 images), and --categories, which filters items to produce, meat, fish, and dairy.
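Here is a sketch of how those two flags could be parsed and applied with argparse. The comma-separated format for --categories and the helper names are assumptions, because the article doesn't show cli.py's actual definitions.

```python
import argparse
from typing import Sequence, TypeVar

T = TypeVar("T")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="panier-fute")
    parser.add_argument("--first-last", action="store_true",
                        help="only analyze the first and last page of each circular")
    parser.add_argument("--categories", default="",
                        help="comma-separated categories to keep (format assumed)")
    return parser


def select_pages(pages: Sequence[T], first_last: bool) -> list[T]:
    """All pages, or just the first and last when --first-last is set."""
    if not first_last or len(pages) <= 2:
        return list(pages)
    return [pages[0], pages[-1]]


def parse_categories(raw: str) -> set[str]:
    """'fruits, Dairy' -> {'fruits', 'dairy'}; empty string means no filter."""
    return {c.strip().lower() for c in raw.split(",") if c.strip()}
```

With --first-last, three stores contribute 3 × 2 = 6 images per run instead of roughly 44.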

Cost Breakdown

  Component            | Cost          | Annual
  ---------------------|---------------|--------
  Gemini 2.5 Flash API | ~$0.03/week   | ~$1.56
  Scraper (httpx)      | $0.00         | $0.00
  SQLite (local)       | $0.00         | $0.00
  hermes-web hosting   | ~$5/month     | ~$60
  Total                |               | ~$62

Compare this to the 45-60 minutes per week I used to spend: that's 39-52 hours saved per year. At ~$62/year, the pipeline pays for itself if my time is worth more than about $1.60 an hour, so at any consulting rate the ROI is absurd.

What I'd Do Differently

  1. Skip Playwright from day one. The circular pages don't need JavaScript rendering. Plain HTTP + HTML parsing is faster, simpler, and hasn't crashed since the switch.

  2. Refine the Gemini prompt earlier. The first few weeks had inconsistent extraction — items being missed, units in wrong formats. Getting the prompt right from the start would have saved debugging time.

  3. Batch Gemini calls. Each image is processed independently. Batching 3-4 pages per call would cut costs and latency.

  4. Proper image preprocessing. Some circular images have poor contrast or skewed angles. A simple OpenCV pipeline (deskew, contrast stretch) would improve extraction accuracy.
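Idea 3 is mostly bookkeeping. Here is a sketch that groups pages and puts a page label before each image, so the model can report which page an item came from. The helper names and label format are hypothetical, and converting raw bytes into the SDK's image part type is left out.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched_pages(pages: Sequence[T], size: int = 4) -> Iterator[list[T]]:
    """Yield consecutive groups of at most `size` pages."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(pages), size):
        yield list(pages[start:start + size])


def batch_contents(prompt: str, images: Sequence[bytes], first_page: int) -> list[object]:
    """Interleave page labels with images so extracted items can be traced to a page."""
    contents: list[object] = [prompt]
    for offset, image in enumerate(images):
        contents.append(f"Page {first_page + offset}:")
        contents.append(image)
    return contents
```

At 4 pages per call, 44 pages become 11 requests instead of 44. The prompt would also need to ask for a page number on each extracted item.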

What's Next

  • Recipe image generation: Using Imagen 4.0 through the Gemini API to auto-generate food photos for each recipe
  • Cross-week optimization: Track what's in the pantry week-over-week to reduce waste
  • Nutrition targets: Adjust menus based on family health goals (calories, protein, etc.)
  • Multi-region support: The pipeline is Quebec-specific for now (stores, French item names, TPS/TVQ), but the architecture generalizes to any region with weekly circular flyers


Built with Python, Gemini, and way too many grocery flyers.