Building a Weekly Meal Pipeline

How I built an automated pipeline that scrapes 3 grocery circulars, analyzes 44 images with Gemini, generates weekly menus, and pushes to a web dashboard.

Martin Fournier
· May 25, 2026 · 7 MIN READ

Building a Weekly Meal Pipeline: Scraper, LLM Vision, and Automated Meal Plans

Every week, a scheduled task kicks off on a personal server. Within minutes, it produces a complete weekly menu: seven dinners, seven healthy desserts, a grocery list organized by store, and price comparisons showing exactly how much can be saved on sale items.

This is the story of how I built that pipeline — and the lessons learned along the way.

The Problem

Grocery shopping for a family of six is expensive and chaotic. Quebec has three major grocery chains (Maxi, Super C, IGA), each publishing weekly circulars in their own proprietary format. Manually browsing 40+ pages of flyers every week, deciding what to buy, planning meals around sale items, and writing a grocery list takes about 45-60 minutes.

The goal: automate the entire workflow.

The Architecture

The project is called panier-fute (French for "smart basket"): roughly 3,900 lines of Python across 20 modules, the main ones shown below. It integrates with hermes-web (a Laravel dashboard) for display and meal planning.

panier-fute/
├── scraper.py          # httpx + BeautifulSoup → 14-16 pages/store
├── image_pipeline.py   # Download and cache circular images
├── gemini_vision.py    # Gemini 2.5 Flash → structured JSON
├── menu_generator.py   # 485 lines of meal plan generation logic
├── delivery.py         # Discord + hermes-web formatting
├── models.py           # SQLite models (items, price_history, etc.)
├── stores.py           # Store configs (URLs, selectors)
├── config.py           # Cooking profile, API keys, DB paths
├── cli.py              # CLI entry point (565 lines)
└── main.py             # Pipeline orchestrator

Plus hermes-web on the other end, with:

  • Meal plan management (weekly slots per day)
  • Public grocery list with swipe gestures, store organization, shareable tokens
  • Price history tracking with sale/regular price comparison

Step 1: Scraping — httpx + BeautifulSoup

The scraper went through two versions. The first used Playwright (a headless browser) to render circular pages. It crashed constantly with EPIPE errors and took 15+ minutes per store.

The second version uses httpx + BeautifulSoup — plain HTTP requests with HTML parsing. No JavaScript required, because circular image URLs are embedded directly in the HTML.

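import httpx
from typing import List
from urllib.parse import urljoin
from bs4 import BeautifulSoup

async def scrape_store(self, store: StoreConfig) -> List[FlyerPage]:
    pages: List[FlyerPage] = []
    async with httpx.AsyncClient(follow_redirects=True, timeout=30) as client:
        # Fetch the main circular page
        resp = await client.get(store.url)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")

        # Extract all flyer page links (14-16 pages per store),
        # resolving relative hrefs against the store URL
        flyer_links = [
            urljoin(store.url, a["href"]) for a in soup.select("a[href*=flyer]")
        ]

        # Download each page's image
        for link in flyer_links:
            page_resp = await client.get(link)
            page_soup = BeautifulSoup(page_resp.text, "html.parser")
            img = page_soup.select_one("img.flyer-page")
            if img is None:
                continue  # no flyer image on this page, skip it
            # Assumes download_image returns a FlyerPage
            pages.append(await download_image(urljoin(link, img["src"])))
    return pages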

Result: zero crashes, and run time dropped from 15+ minutes per store to 4-5 minutes for all three stores combined, at least 3x faster.

Step 2: Vision — Gemini 2.5 Flash on 44 Images

Each store publishes 14-16 circular pages, totaling ~44 images per week. Each image contains 15-30 grocery items with prices, promotions, and unit pricing.

The prompt engineering evolved significantly:

PROMPT = """
Extract grocery items from this Quebec grocery circular image.
For each item, return:
- name (French, exact from the image)
- category (meat|fish|fruits|vegetables|dairy|pantry|frozen)
- sale_price, regular_price
- unit_price, unit_type (kg, lb, barquette, ml, sac, botte, etc.)
- weight_or_volume if visible
- is_member_price (is this a loyalty-card-only price?)
- store_brand if it's a store brand

Rules:
- Prices are in CAD
- TPS=5%, TVQ=9.975% for reference
- Use natural packaging units, NOT always $/kg
- e.g. yogourt = $/barquette, lait = $/sac or $/L
"""
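Since hallucinated or malformed entries are a known risk, it helps to validate the model's JSON before anything reaches the database. The following is a minimal sketch, not the project's actual code: `FlyerItem` and `parse_items` are hypothetical names, and it assumes the model returns a JSON array of objects using the field names from the prompt above.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Categories taken from the prompt above.
VALID_CATEGORIES = {"meat", "fish", "fruits", "vegetables", "dairy", "pantry", "frozen"}


@dataclass
class FlyerItem:
    """Hypothetical container for one extracted circular item."""
    name: str
    category: str
    sale_price: float
    regular_price: Optional[float] = None
    unit_type: Optional[str] = None
    is_member_price: bool = False


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_items(raw_json: str) -> List[FlyerItem]:
    """Parse a model response, dropping entries that fail basic checks."""
    items: List[FlyerItem] = []
    for entry in json.loads(raw_json):
        name = str(entry.get("name", "")).strip()
        category = entry.get("category")
        price = entry.get("sale_price")
        # Skip anything without a name, a known category, or a positive price.
        if not name or category not in VALID_CATEGORIES:
            continue
        if not _is_number(price) or price <= 0:
            continue
        regular = entry.get("regular_price")
        items.append(FlyerItem(
            name=name,
            category=category,
            sale_price=float(price),
            regular_price=float(regular) if _is_number(regular) else None,
            unit_type=entry.get("unit_type"),
            is_member_price=bool(entry.get("is_member_price", False)),
        ))
    return items
```

Rejecting entries early keeps obviously broken rows out of the price history, at the cost of silently losing items when the model misnames a category.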

The first attempt used generic prompts and got hallucinated prices — items that don't exist, prices that don't match. The key improvements:

  1. Category filtering: Only extract fruits, vegetables, meat/fish, and dairy (the categories we actually cook with)
  2. Real price cross-referencing: Prices from Gemini are marked as price_is_estimated=true and overwritten with actual DB prices when available
  3. Natural units: No "forcing everything to $/kg" — 473ml stays 473ml
  4. Weight extraction: Parse weight/volume from item names for unit price calculation
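Points 3 and 4 can be illustrated with a small regex-based helper. This is a sketch under assumptions: the function names and the list of recognized units are illustrative, and real circular names will likely need more patterns (multipacks such as "6 x 355 ml", for instance, are not handled).

```python
import re
from typing import Optional, Tuple

# Matches a number followed by a unit, e.g. "473 ml", "1,5 L", "650g".
# The comma is accepted as a decimal separator, as is common in Quebec.
_WEIGHT_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|g|ml|l|lb)\b", re.IGNORECASE)


def extract_weight(name: str) -> Optional[Tuple[float, str]]:
    """Return (quantity, unit) parsed from an item name, or None if absent."""
    match = _WEIGHT_RE.search(name)
    if match is None:
        return None
    quantity = float(match.group(1).replace(",", "."))
    return quantity, match.group(2).lower()


def unit_price(price: float, name: str) -> Optional[Tuple[float, str]]:
    """Return (price per unit, unit) in the package's own unit, unconverted."""
    parsed = extract_weight(name)
    if parsed is None:
        return None
    quantity, unit = parsed
    return round(price / quantity, 4), unit


print(extract_weight("Yogourt grec 650 g"))   # (650.0, 'g')
print(unit_price(5.00, "Boeuf haché 2 kg"))   # (2.5, 'kg')
print(extract_weight("Bananes"))              # None
```

The helper deliberately keeps whatever unit appears in the name instead of normalizing to $/kg, which matches the "natural units" rule.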

Each store run consumes ~6,000 tokens in, ~20,000 tokens out via Gemini 2.5 Flash — costing about $0.03 per week.

Step 3: Menu Generation — Ricardo × Weissman

The menu generator is the most complex module (485 lines). It takes sale items, a cooking profile, and constraints, and generates seven days of dinners.

The cooking profile defines the culinary identity:

STYLE CULINAIRE : Ricardo × Joshua Weissman
Ricardo (approachable, family-friendly, structured, Quebecois pantry)
meets Joshua Weissman (high-energy, technique-driven, "better than takeout").

Chef inspirations:
  — Sam the Cooking Guy: casual restaurant-quality comfort food
  — Jean-Francois Plante: bistro quebecois, sauces et braises
  — J. Kenji Lopez-Alt: culinary efficiency, 30 min, practical techniques
  — Genevieve O'Gleman / Savourer: structured, healthy, ultra-efficient
  — Brian Lagerstrom: clean, precise, weeknight elite under 45 min
  — Nagi Maehashi / RecipeTin Eats: foolproof, asian/mex/tray-bakes

Keywords: weeknight-efficient, high-flavor, elevated comfort food,
          sale-item driven, minimum cleanup, meal-prep friendly

Family: 2 adults + 4 teenagers + leftovers for lunches

Each recipe includes:

  • Ingredients with exact quantities
  • Step-by-step instructions with prep/cook times
  • Pricing: total cost + per-portion cost
  • Nutrition: calories, protein, carbs, fat
  • Store mapping: which items come from which store

Dinners are designed to share ingredients across the week to minimize waste. Portions are deliberately oversized so that leftovers cover next-day lunches.
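The pricing and store-mapping fields come down to simple aggregation. A minimal sketch follows, assuming each ingredient already carries the cost of the quantity used; the `Ingredient` dataclass and `recipe_cost` function are hypothetical names, not the generator's real data model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Ingredient:
    """Hypothetical ingredient line: cost is for the quantity the recipe uses."""
    name: str
    store: str
    cost: float


def recipe_cost(ingredients: List[Ingredient], portions: int) -> Dict[str, object]:
    """Compute total cost, per-portion cost, and a per-store subtotal."""
    if portions < 1:
        raise ValueError("portions must be at least 1")
    total = round(sum(item.cost for item in ingredients), 2)
    by_store: Dict[str, float] = {}
    for item in ingredients:
        by_store[item.store] = round(by_store.get(item.store, 0.0) + item.cost, 2)
    return {
        "total": total,
        "per_portion": round(total / portions, 2),
        "by_store": by_store,
    }


dinner = [
    Ingredient("Poulet entier", "Maxi", 8.00),
    Ingredient("Brocoli", "IGA", 3.50),
    Ingredient("Riz", "Maxi", 2.50),
]
print(recipe_cost(dinner, portions=8))
# {'total': 14.0, 'per_portion': 1.75, 'by_store': {'Maxi': 10.5, 'IGA': 3.5}}
```

Eight portions here stands in for six people plus two lunch leftovers; the real portion count is whatever the cooking profile specifies.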

Step 4: Price Intelligence — Sale vs Regular

This was the hardest part. Gemini often hallucinates sale prices ("$3.99" when the real price is "$5.49"). The solution: a two-tier price system.

def _inject_regular_prices(grocery_items, db):
    """
    Cross-reference each item against the SQLite price_history table.
    If the item has a recorded price, use the historical regular price
    and mark the item as verified. Otherwise keep the price Gemini
    returned and flag it as estimated.
    """
    for item in grocery_items:
        historical = db.get_price(item["name"], item["store"])
        if historical:
            item["regular_price"] = historical.regular_price
            # sale_price is left as read from the circular
            item["price_is_estimated"] = False
        else:
            item["price_is_estimated"] = True  # Gemini gave us this
    return grocery_items

Items are displayed with a badge: verified prices vs. estimated prices. Over multiple weeks, the database builds up real price data and estimates become rarer.
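For readers who want to try the lookup side, here is a self-contained sketch of a `get_price(name, store)` backed by SQLite. Only the method name and the `regular_price` attribute come from the function above; the `PriceDB` class, the `record_price` helper, and the table columns are assumptions made for illustration.

```python
import sqlite3
from collections import namedtuple
from typing import Optional

PriceRecord = namedtuple("PriceRecord", ["regular_price", "seen_on"])


class PriceDB:
    """Hypothetical wrapper around a price_history table (schema assumed)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS price_history (
                   name TEXT NOT NULL,
                   store TEXT NOT NULL,
                   regular_price REAL NOT NULL,
                   seen_on TEXT NOT NULL  -- ISO date, so text order = date order
               )"""
        )

    def record_price(self, name: str, store: str,
                     regular_price: float, seen_on: str) -> None:
        """Append one observed regular price for an item at a store."""
        self.conn.execute(
            "INSERT INTO price_history VALUES (?, ?, ?, ?)",
            (name, store, regular_price, seen_on),
        )
        self.conn.commit()

    def get_price(self, name: str, store: str) -> Optional[PriceRecord]:
        """Return the most recently recorded price for an item, or None."""
        row = self.conn.execute(
            "SELECT regular_price, seen_on FROM price_history "
            "WHERE name = ? AND store = ? ORDER BY seen_on DESC LIMIT 1",
            (name, store),
        ).fetchone()
        return PriceRecord(*row) if row else None


db = PriceDB()
db.record_price("Poulet entier", "Maxi", 4.49, "2026-05-11")
db.record_price("Poulet entier", "Maxi", 4.99, "2026-05-18")
print(db.get_price("Poulet entier", "Maxi").regular_price)  # 4.99
print(db.get_price("Poulet entier", "IGA"))                 # None
```

Exact name matching is the weak point of this approach: the same product can be spelled differently from one week's circular to the next, so a real lookup probably needs some normalization.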

Unit prices use natural packaging units:

  • Yoghurt: $/barquette (not $/kg)
  • Milk: $/sac or $/L
  • Meat: $/kg or $/lb (whichever is on the package)
  • Produce: $/botte, $/sac, or $/unite

Step 5: Delivery — Push to Web Dashboard

Once the menu and grocery list are generated, they're pushed to a web dashboard via a REST API.
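A push of this kind might look like the sketch below. It uses the standard library's `urllib` to stay dependency-free, whereas the project itself relies on httpx. The `/api/meal-plans` path, the bearer-token header, and the payload shape are all assumptions, not the actual hermes-web API.

```python
import json
import urllib.request
from typing import Any, Dict


def build_push_request(base_url: str, token: str,
                       payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the weekly plan as JSON."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/meal-plans",  # hypothetical endpoint
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
        },
    )


def push_plan(base_url: str, token: str, payload: Dict[str, Any]) -> int:
    """Send the request and return the HTTP status code."""
    request = build_push_request(base_url, token, payload)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Separating request construction from sending makes the payload easy to inspect in tests without touching the network.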

The web dashboard then displays everything:

  • Meals tab: Week view with dinner + dessert per day
  • Grocery tab: Items organized by store (color-coded), with swipe-to-toggle and swipe-to-delete
  • Public grocery list: Shareable via secret token URL — no login required, mobile-first

The Cron Job

Everything runs automatically on a weekly schedule — no manual intervention needed.

The pipeline uses two flags: --first-last to only process the first and last pages of each circular (enough to see the best deals without processing all 44 images), and --categories to filter to produce, meat, fish, and dairy only.
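To make the two flags concrete, here is one way they could be wired with `argparse`. The flag names come from the text above; everything else is an assumption, in particular that `--categories` takes a comma-separated value and that the helper functions are named `select_pages` and `parse_categories`.

```python
import argparse
from typing import List, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two flags described above."""
    parser = argparse.ArgumentParser(prog="panier-fute")
    parser.add_argument(
        "--first-last", action="store_true",
        help="process only the first and last page of each circular",
    )
    parser.add_argument(
        "--categories", default="",
        help="comma-separated categories to keep (value format is assumed)",
    )
    return parser


def select_pages(pages: Sequence[str], first_last: bool) -> List[str]:
    """Keep every page, or only the first and last when first_last is set."""
    if not first_last or len(pages) <= 2:
        return list(pages)
    return [pages[0], pages[-1]]


def parse_categories(value: str) -> List[str]:
    """Split a comma-separated category string, ignoring blanks."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


args = build_parser().parse_args(
    ["--first-last", "--categories", "fruits,vegetables,meat,fish,dairy"]
)
print(select_pages(["p01", "p02", "p03", "p04"], args.first_last))  # ['p01', 'p04']
print(parse_categories(args.categories))
# ['fruits', 'vegetables', 'meat', 'fish', 'dairy']
```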

Cost Breakdown

| Component | Cost per week | Annual |
|-----------|--------------|--------|
| Gemini 2.5 Flash API | ~$0.03 | ~$1.56 |
| Scraper (httpx) | $0.00 | $0.00 |
| SQLite (local) | $0.00 | $0.00 |
| hermes-web hosting | ~$5/mo | ~$60 |
| Total | | ~$62/year |

Compare this to the 45-60 minutes per week I used to spend — that's 39-52 hours saved per year. At any consulting rate, the ROI is absurd.
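The arithmetic behind those figures is easy to check. The snippet below only restates the numbers already given, with variable names chosen for illustration.

```python
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12

gemini_per_week = 0.03    # approximate API cost per weekly run
hosting_per_month = 5.00  # approximate hermes-web hosting cost

gemini_annual = gemini_per_week * WEEKS_PER_YEAR
hosting_annual = hosting_per_month * MONTHS_PER_YEAR
total_annual = gemini_annual + hosting_annual

# Manual planning used to take 45-60 minutes per week.
hours_saved_low = 45 * WEEKS_PER_YEAR / 60
hours_saved_high = 60 * WEEKS_PER_YEAR / 60

print(f"Gemini:  ${gemini_annual:.2f}/year")   # Gemini:  $1.56/year
print(f"Hosting: ${hosting_annual:.2f}/year")  # Hosting: $60.00/year
print(f"Total:   ${total_annual:.2f}/year")    # Total:   $61.56/year
print(f"Saved:   {hours_saved_low:.0f}-{hours_saved_high:.0f} hours/year")
# Saved:   39-52 hours/year
```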

What I'd Do Differently

  1. Skip Playwright from day one. The circular pages don't need JavaScript rendering. Plain HTTP + HTML parsing is faster, simpler, and crash-proof.

  2. Add price history earlier. The first few weeks had completely hallucinated prices. Real DB cross-referencing should be the default, not an afterthought.

  3. Batch Gemini calls. Each image is processed independently. Batching 3-4 pages per call would cut costs and latency.

  4. Proper image preprocessing. Some circular images have poor contrast or skewed angles. A simple OpenCV pipeline (deskew, contrast stretch) would improve extraction accuracy.
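For point 3, the grouping step itself is small. This sketch only shows how pages could be split into batches; the `batched` helper is a hypothetical name, and how several images are attached to a single Gemini request is left out.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


pages = [f"page_{i:02d}.jpg" for i in range(44)]
batches = list(batched(pages, 4))
print(len(batches))      # 11
print(batches[0])        # ['page_00.jpg', 'page_01.jpg', 'page_02.jpg', 'page_03.jpg']
```

With 44 images and four pages per call, the weekly run would make 11 requests instead of 44. In practice, batches would probably be built per store so that items are never attributed to the wrong chain.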

What's Next

  • Recipe image generation: Using Gemini Imagen 4.0 to auto-generate food photos for each recipe
  • Cross-week optimization: Track what's in the pantry week-over-week to reduce waste
  • Nutrition targets: Adjust menus based on family health goals (calories, protein, etc.)
  • Multi-language support: The pipeline is Quebec-specific now, but the architecture generalizes to any region with circular flyers


Built with Python, Gemini, and way too many grocery flyers.