Building a Weekly Meal Pipeline
How I built an automated pipeline that scrapes 3 grocery circulars, analyzes 44 images with Gemini, generates weekly menus, and pushes to a web dashboard.
Building a Weekly Meal Pipeline: Scraper, LLM Vision, and Automated Meal Plans
Every week, a scheduled task kicks off on a personal server. Within minutes, it produces a complete weekly menu: seven dinners, seven healthy desserts, a grocery list organized by store, and price comparisons showing exactly how much can be saved on sale items.
This is the story of how I built that pipeline — and the lessons learned along the way.
The Problem
Grocery shopping for a family of six is expensive and chaotic. Quebec has three major grocery chains (Maxi, Super C, IGA), each publishing weekly circulars in their own proprietary format. Manually browsing 40+ pages of flyers every week, deciding what to buy, planning meals around sale items, and writing a grocery list takes about 45-60 minutes.
The goal: automate the entire workflow.
The Architecture
The project is called panier-fute (French for "smart basket"): roughly 3,900 lines of Python across 20 modules. It integrates with hermes-web (a Laravel dashboard) for display and meal planning.
panier-fute/
├── scraper.py # httpx + BeautifulSoup → 14-16 pages/store
├── image_pipeline.py # Download and cache circular images
├── gemini_vision.py # Gemini 2.5 Flash → structured JSON
├── menu_generator.py # 485 lines of meal plan generation logic
├── delivery.py # Discord + hermes-web formatting
├── models.py # SQLite models (items, price_history, etc.)
├── stores.py # Store configs (URLs, selectors)
├── config.py # Cooking profile, API keys, DB paths
├── cli.py # CLI entry point (565 lines)
└── main.py # Pipeline orchestrator
Plus hermes-web on the other end, with:
- Meal plan management (weekly slots per day)
- Public grocery list with swipe gestures, store organization, shareable tokens
- Price history tracking with sale/regular price comparison
Step 1: Scraping — httpx + BeautifulSoup
The scraper went through two versions. The first used Playwright to drive a headless browser and render the circular pages. It crashed constantly with EPIPE errors and took 15+ minutes per store.
The second version uses httpx + BeautifulSoup — plain HTTP requests with HTML parsing. No JavaScript required, because circular image URLs are embedded directly in the HTML.
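from typing import List
from urllib.parse import urljoin

import httpx
from bs4 import BeautifulSoup

async def scrape_store(self, store: StoreConfig) -> List[FlyerPage]:
    # `client` is assumed to be a shared httpx.AsyncClient created outside this excerpt
    # Fetch the main circular page
    resp = await client.get(store.url)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Extract all flyer page links (14-16 pages per store), resolving relative hrefs
    flyer_links = [
        urljoin(store.url, a["href"]) for a in soup.select("a[href*=flyer]")
    ]
    # Download each page's image
    for link in flyer_links:
        page_resp = await client.get(link)
        page_soup = BeautifulSoup(page_resp.text, "html.parser")
        img = page_soup.select_one("img.flyer-page")
        if img is None:
            continue  # no flyer image on this page, skip it
        await download_image(urljoin(link, img["src"]))
    # (excerpt: building and returning the FlyerPage list is not shown)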
Result: 3x faster, zero crashes, from 15 minutes to 4-5 minutes for all three stores.
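The download_image helper is not shown in this post. As a rough sketch of how the "download and cache" step could work, the snippet below derives a stable cache filename from the image URL and only fetches on a cache miss. The names cache_path_for and download_cached, the hash-based filename, and the injected fetch callable are all assumptions made for illustration, not the project's actual code.

```python
import hashlib
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse


def cache_path_for(img_url: str, cache_dir: Path) -> Path:
    """Map an image URL to a stable file path inside cache_dir."""
    digest = hashlib.sha256(img_url.encode("utf-8")).hexdigest()[:16]
    # Keep the URL's file extension when it has one, default to .jpg otherwise.
    suffix = Path(urlparse(img_url).path).suffix or ".jpg"
    return cache_dir / f"{digest}{suffix}"


def download_cached(img_url: str, cache_dir: Path,
                    fetch: Callable[[str], bytes]) -> Path:
    """Return the cached file for img_url, calling fetch only on a cache miss."""
    path = cache_path_for(img_url, cache_dir)
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(img_url))
    return path
```

Passing the fetch function in keeps the caching logic independent of the HTTP client, so re-running the pipeline in the same week would not re-download pages it already has.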
Step 2: Vision — Gemini 2.5 Flash on 44 Images
Each store publishes 14-16 circular pages, totaling ~44 images per week. Each image contains 15-30 grocery items with prices, promotions, and unit pricing.
The prompt engineering evolved significantly:
PROMPT = """
Extract grocery items from this Quebec grocery circular image.
For each item, return:
- name (French, exact from the image)
- category (meat|fish|fruits|vegetables|dairy|pantry|frozen)
- sale_price, regular_price
- unit_price, unit_type (kg, lb, barquette, ml, sac, botte, etc.)
- weight_or_volume if visible
- is_member_price (is this a loyalty-card-only price?)
- store_brand if it's a store brand
Rules:
- Prices are in CAD
- TPS=5%, TVQ=9.975% for reference
- Use natural packaging units, NOT always $/kg
- e.g. yogourt = $/barquette, lait = $/sac or $/L
"""
The first attempt used generic prompts and got hallucinated prices — items that don't exist, prices that don't match. The key improvements:
- Category filtering: Only extract fruits, vegetables, meat/fish, and dairy (the categories we actually cook with)
- Real price cross-referencing: Prices from Gemini are marked as price_is_estimated=true and overwritten with actual DB prices when available
- Natural units: No "forcing everything to $/kg"; 473ml stays 473ml
- Weight extraction: Parse weight/volume from item names for unit price calculation
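The weight extraction step can be sketched with a small regular expression. This is a minimal illustration, not the project's parser: the function names, the set of recognized units, and the choice of a per-100 g / per-100 ml comparison price (computed alongside the natural packaging unit, not instead of it) are assumptions.

```python
import re
from typing import Optional, Tuple

# Matches "650 g", "1,5 kg", "473ml", "2 L" (French decimal commas allowed).
_QUANTITY_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|g|lb|ml|l)\b", re.IGNORECASE)

# Conversion factor to a base unit: grams for weight, millilitres for volume.
_TO_BASE = {
    "g": (1.0, "g"), "kg": (1000.0, "g"), "lb": (453.592, "g"),
    "ml": (1.0, "ml"), "l": (1000.0, "ml"),
}


def parse_quantity(name: str) -> Optional[Tuple[float, str]]:
    """Return (amount, base_unit) found in an item name, or None."""
    match = _QUANTITY_RE.search(name)
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))
    factor, base_unit = _TO_BASE[match.group(2).lower()]
    return value * factor, base_unit


def price_per_100(price: float, name: str) -> Optional[Tuple[float, str]]:
    """Return (price per 100 g or 100 ml, label), or None if no size was found."""
    parsed = parse_quantity(name)
    if parsed is None or parsed[0] <= 0:
        return None
    amount, base_unit = parsed
    return round(price / amount * 100, 2), f"$/100{base_unit}"
```

Items without a parsable size (a bunch of broccoli, for example) simply return None and keep their natural unit.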
Each store run consumes ~6,000 tokens in, ~20,000 tokens out via Gemini 2.5 Flash — costing about $0.03 per week.
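Since the prompt asks for structured JSON, the response still has to be parsed and sanity-checked before anything reaches the database. The sketch below assumes the model returns a JSON array of objects using the field names listed in the prompt; CircularItem and parse_items are hypothetical names, and the validation rules (known category, positive numeric sale price) are one possible choice rather than the project's actual checks.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_CATEGORIES = {"meat", "fish", "fruits", "vegetables", "dairy", "pantry", "frozen"}


@dataclass
class CircularItem:
    name: str
    category: str
    sale_price: float
    regular_price: Optional[float] = None
    unit_type: Optional[str] = None
    is_member_price: bool = False


def _as_price(value) -> Optional[float]:
    """Accept only real numbers (not strings or booleans) greater than zero."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value) if value > 0 else None


def parse_items(raw_json: str) -> List[CircularItem]:
    """Parse the model's JSON output, skipping entries that fail basic checks."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    items: List[CircularItem] = []
    for entry in data:
        if not isinstance(entry, dict):
            continue
        name, category = entry.get("name"), entry.get("category")
        sale_price = _as_price(entry.get("sale_price"))
        if not isinstance(name, str) or not name.strip():
            continue
        if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
            continue
        if sale_price is None:
            continue
        items.append(CircularItem(
            name=name.strip(),
            category=category,
            sale_price=sale_price,
            regular_price=_as_price(entry.get("regular_price")),
            unit_type=entry.get("unit_type"),
            is_member_price=bool(entry.get("is_member_price", False)),
        ))
    return items
```

Dropping malformed entries instead of raising keeps one bad page from aborting the whole weekly run.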
Step 3: Menu Generation — Ricardo × Weissman
The menu generator is the most complex module (485 lines). It takes sale items, a cooking profile, and constraints, and generates seven days of dinners.
The cooking profile defines the culinary identity:
STYLE CULINAIRE : Ricardo × Joshua Weissman
Ricardo (approachable, family-friendly, structured, Quebecois pantry)
meets Joshua Weissman (high-energy, technique-driven, "better than takeout").
Chef inspirations:
— Sam the Cooking Guy: casual restaurant-quality comfort food
— Jean-Francois Plante: bistro quebecois, sauces et braises
— J. Kenji Lopez-Alt: culinary efficiency, 30 min, practical techniques
— Genevieve O'Gleman / Savourer: structured, healthy, ultra-efficient
— Brian Lagerstrom: clean, precise, weeknight elite under 45 min
— Nagi Maehashi / RecipeTin Eats: foolproof, asian/mex/tray-bakes
Keywords: weeknight-efficient, high-flavor, elevated comfort food,
sale-item driven, minimum cleanup, meal-prep friendly
Family: 2 adults + 4 teenagers + leftovers for lunches
Each recipe includes:
- Ingredients with exact quantities
- Step-by-step instructions with prep/cook times
- Pricing: total cost + per-portion cost
- Nutrition: calories, protein, carbs, fat
- Store mapping: which items come from which store
Dinners are designed to share ingredients across the week to minimize waste. Leftovers are deliberately oversized for next-day lunches.
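To illustrate how shared ingredients and the store mapping feed the grocery list, here is a minimal sketch that merges ingredients across the week's recipes and groups them by store, plus the per-portion cost calculation. The tuple shape (name, quantity, unit, store) and both function names are assumptions for the example; the real generator's data model is not shown in this post.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical ingredient shape: (name, quantity, unit, store)
Ingredient = Tuple[str, float, str, str]
StoreList = Dict[Tuple[str, str], float]


def build_grocery_list(recipes: Iterable[List[Ingredient]]) -> Dict[str, StoreList]:
    """Sum quantities of identical (name, unit) pairs and group them by store."""
    by_store: Dict[str, StoreList] = defaultdict(lambda: defaultdict(float))
    for ingredients in recipes:
        for name, quantity, unit, store in ingredients:
            by_store[store][(name.strip().lower(), unit)] += quantity
    # Convert nested defaultdicts to plain dicts for the caller.
    return {store: dict(items) for store, items in by_store.items()}


def cost_per_portion(total_cost: float, portions: int) -> float:
    """Per-portion cost, rounded to cents."""
    if portions <= 0:
        raise ValueError("portions must be positive")
    return round(total_cost / portions, 2)
```

An ingredient that appears in two dinners shows up once on the list with the combined quantity, which is what makes cross-week ingredient sharing visible at the store.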
Step 4: Price Intelligence — Sale vs Regular
This was the hardest part. Gemini often hallucinates sale prices ("$3.99" when the real price is "$5.49"). The solution: a two-tier price system.
def _inject_regular_prices(grocery_items, db):
    """
    Cross-reference each item against the SQLite price_history table.
    If the item has a recorded price for this store, use that historical
    regular price and mark the item as verified. Otherwise keep the value
    Gemini returned and flag it as estimated.
    """
    for item in grocery_items:
        historical = db.get_price(item["name"], item["store"])
        if historical:
            item["regular_price"] = historical.regular_price
            # sale_price is left untouched: it comes from the circular
            item["price_is_estimated"] = False
        else:
            item["price_is_estimated"] = True  # only Gemini's value is available
    return grocery_items
Items are displayed with a badge: verified prices vs. estimated prices. Over multiple weeks, the database builds up real price data and estimates become rarer.
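The lookup behind db.get_price is not shown above. A minimal sketch with the standard sqlite3 module could look like the following; the table columns, the PriceRecord type, and the "most recent observation wins" rule are assumptions for illustration, and the real price_history schema may differ.

```python
import sqlite3
from typing import NamedTuple, Optional


class PriceRecord(NamedTuple):
    regular_price: float
    seen_on: str  # ISO date string, so lexicographic order is chronological


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS price_history (
               name TEXT NOT NULL,
               store TEXT NOT NULL,
               regular_price REAL NOT NULL,
               seen_on TEXT NOT NULL
           )"""
    )


def get_price(conn: sqlite3.Connection, name: str, store: str) -> Optional[PriceRecord]:
    """Return the most recently recorded regular price, or None if never seen."""
    row = conn.execute(
        "SELECT regular_price, seen_on FROM price_history "
        "WHERE name = ? AND store = ? ORDER BY seen_on DESC LIMIT 1",
        (name, store),
    ).fetchone()
    return PriceRecord(*row) if row else None


def savings(sale_price: float, regular_price: float) -> float:
    """Amount saved on a sale item, never negative, rounded to cents."""
    return round(max(regular_price - sale_price, 0.0), 2)
```

With the example from this section, a $3.99 sale price against a recorded $5.49 regular price gives savings of $1.50, which is the figure a "verified" badge could show.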
Unit prices use natural packaging units:
- Yoghurt: $/barquette (not $/kg)
- Milk: $/sac or $/L
- Meat: $/kg or $/lb (whichever is on the package)
- Produce: $/botte, $/sac, or $/unite
Step 5: Delivery — Push to Web Dashboard
Once the menu and grocery list are generated, they're pushed to the web dashboard via a REST API.
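The actual request is not reproduced here. As a hedged sketch of what such a push could look like, the snippet below builds a JSON POST with the standard library only. The /api/meal-plans endpoint, the payload keys, and the bearer-token header are all hypothetical; the real hermes-web API may be shaped quite differently, and the pipeline itself may well use httpx for this call.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_push_request(base_url: str, token: str, week_start: str,
                       meals: List[Dict[str, Any]],
                       grocery_items: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build (but do not send) the POST request carrying one week's plan."""
    payload = {
        "week_start": week_start,
        "meals": meals,
        "grocery_items": grocery_items,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/meal-plans",  # hypothetical endpoint
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
        },
        method="POST",
    )


def push(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Separating request construction from sending makes the payload easy to inspect or log before anything touches the network.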
The web dashboard then displays everything:
- Meals tab: Week view with dinner + dessert per day
- Grocery tab: Items organized by store (color-coded), with swipe-to-toggle and swipe-to-delete
- Public grocery list: Shareable via secret token URL — no login required, mobile-first
The Cron Job
Everything runs automatically on a weekly schedule — no manual intervention needed.
The pipeline uses two flags: --first-last to only process the first and last pages of each circular (enough to see the best deals without processing all 44 images), and --categories to filter to produce, meat, fish, and dairy only.
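A minimal sketch of how these two flags could be wired up with argparse. Only the flag names come from the pipeline; the program name, the comma-separated value format for --categories, and the helper functions are assumptions.

```python
import argparse
from typing import List, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="panier-fute")
    parser.add_argument(
        "--first-last", action="store_true",
        help="process only the first and last page of each circular",
    )
    parser.add_argument(
        "--categories", default="",
        help="comma-separated categories to keep (assumed format)",
    )
    return parser


def select_pages(pages: Sequence[str], first_last: bool) -> List[str]:
    """Keep every page, or just the first and last when first_last is set."""
    if not first_last or len(pages) <= 2:
        return list(pages)
    return [pages[0], pages[-1]]


def parse_categories(value: str) -> List[str]:
    """Split a comma-separated list, ignoring blanks and surrounding spaces."""
    return [c.strip().lower() for c in value.split(",") if c.strip()]
```

With --first-last, a 15-page circular shrinks to two images, so a three-store run sends six images to the vision model instead of roughly 44.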
Cost Breakdown
| Component | Cost per week | Annual |
|-----------|---------------|--------|
| Gemini 2.5 Flash API | ~$0.03 | ~$1.56 |
| Scraper (httpx) | $0.00 | $0.00 |
| SQLite (local) | $0.00 | $0.00 |
| hermes-web hosting | ~$1.15 (~$5/mo) | ~$60 |
| Total | | ~$62/year |
Compare this to the 45-60 minutes per week I used to spend — that's 39-52 hours saved per year. At any consulting rate, the ROI is absurd.
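For anyone who wants to check the arithmetic, the figures work out as follows, using only the numbers quoted in this section (the function names are just for the example):

```python
WEEKS_PER_YEAR = 52


def annual_cost(gemini_per_week: float, hosting_per_month: float) -> float:
    """Yearly running cost in dollars, rounded to cents."""
    return round(gemini_per_week * WEEKS_PER_YEAR + hosting_per_month * 12, 2)


def hours_saved_per_year(minutes_per_week: float) -> float:
    """Convert weekly minutes of manual planning into hours per year."""
    return minutes_per_week * WEEKS_PER_YEAR / 60


total = annual_cost(0.03, 5)            # 1.56 + 60 = 61.56, i.e. about $62
low = hours_saved_per_year(45)          # 39.0 hours
high = hours_saved_per_year(60)         # 52.0 hours
```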
What I'd Do Differently
- Skip Playwright from day one. The circular pages don't need JavaScript rendering. Plain HTTP + HTML parsing is faster, simpler, and crash-proof.
- Add price history earlier. The first few weeks had completely hallucinated prices. Real DB cross-referencing should be the default, not an afterthought.
- Batch Gemini calls. Each image is processed independently. Batching 3-4 pages per call would cut costs and latency.
- Proper image preprocessing. Some circular images have poor contrast or skewed angles. A simple OpenCV pipeline (deskew, contrast stretch) would improve extraction accuracy.
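The batching idea is easy to sketch independently of the vision API. The helper below only groups page images into chunks of up to four; how several images are attached to a single model request depends on the client library and is deliberately left out. The function name and default batch size are assumptions.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(pages: Sequence[T], batch_size: int = 4) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size pages, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield list(pages[start:start + batch_size])
```

For a 44-image week with a batch size of four, this yields 11 groups, so 11 model calls instead of 44.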
What's Next
- Recipe image generation: Using Gemini Imagen 4.0 to auto-generate food photos for each recipe
- Cross-week optimization: Track what's in the pantry week-over-week to reduce waste
- Nutrition targets: Adjust menus based on family health goals (calories, protein, etc.)
- Multi-language support: The pipeline is Quebec-specific now, but the architecture generalizes to any region with circular flyers
Built with Python, Gemini, and way too many grocery flyers.