If you want to terrify a college student, show them the enterprise pricing page for a cloud data warehouse. Snowflake is brilliant, but it is definitely not built with a student budget in mind. Luckily, they offer a massive saving grace: a 30-day free trial packed with $400 in credits. That is plenty of runway to spin up virtual warehouses, run complex queries, and learn enterprise-grade data tools without having to live on instant noodles for the rest of the semester.
Naturally, I couldn't just let those free credits sit there. I needed a dataset chaotic enough to put Snowflake through its paces.
Enter the Toronto Transit Commission (TTC). Anyone who has ever waited for a bus in Toronto knows that vehicle arrival times feel more like a polite suggestion than a strict schedule. So, I built the TTC Transit Reliability Monitor, an end-to-end data pipeline designed to track how reliably vehicles across the city report their live positions.
The Pipeline Architecture
I wanted to build a real production-style workflow, so I avoided manual uploads completely. The pipeline runs automatically through a clean, multi-layered stack:
Ingestion: Every 15 minutes, an Apache Airflow DAG running locally via Docker Compose polls the public TTC live location feed. It grabs the raw JSON payloads for more than 218 routes and loads them straight into Snowflake.
Transformation: Once the raw data lands in Snowflake, dbt takes the wheel. It cleans up the types, deduplicates records, and processes everything through staging, intermediate, and mart layers.
Data Quality: To make sure the data wasn't completely hallucinating, I wrote 44 automated tests using pytest and built-in dbt checks. If a route ID goes missing or timestamps look weird, the system catches it immediately.
The Dashboard: Finally, a public Streamlit dashboard reads from the optimized dbt data marts, serving up live heatmaps of report delays and a leaderboard of the most reliable routes.
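To make the middle of that stack a little more concrete, here is a minimal Python sketch of the flatten, deduplicate, and check logic described above. It is not the project's actual code: in the real pipeline dbt does the cleaning and deduplication in SQL inside Snowflake, and the payload shape, field names (vehicle_id, route_id, reported_at), and function names below are all hypothetical stand-ins for whatever the TTC feed really returns.

```python
from datetime import datetime, timezone

# Hypothetical payload shape; the real TTC feed's field names may differ.
SAMPLE_PAYLOAD = {
    "vehicles": [
        {"vehicle_id": "8001", "route_id": "501", "reported_at": "2024-01-01T12:00:00+00:00"},
        # Exact repeat of the record above, as can happen when a feed is polled repeatedly.
        {"vehicle_id": "8001", "route_id": "501", "reported_at": "2024-01-01T12:00:00+00:00"},
        {"vehicle_id": "8002", "route_id": None, "reported_at": "2024-01-01T12:01:00+00:00"},
        {"vehicle_id": "8003", "route_id": "29", "reported_at": "2024-01-01T12:30:00+00:00"},
    ]
}

# Assumed time at which the payload above was fetched.
FETCHED_AT = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)


def flatten(payload, fetched_at):
    """Turn one raw JSON payload into flat row dicts (roughly a staging step)."""
    rows = []
    for vehicle in payload.get("vehicles", []):
        rows.append({
            "vehicle_id": vehicle.get("vehicle_id"),
            "route_id": vehicle.get("route_id"),
            "reported_at": vehicle.get("reported_at"),
            "fetched_at": fetched_at.isoformat(),
        })
    return rows


def deduplicate(rows):
    """Keep the first row seen for each (vehicle_id, reported_at) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["vehicle_id"], row["reported_at"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def find_problems(rows, fetched_at):
    """Flag rows with a missing route ID or a report time later than the fetch time."""
    problems = []
    for row in rows:
        if not row["route_id"]:
            problems.append(f"vehicle {row['vehicle_id']}: missing route_id")
        if datetime.fromisoformat(row["reported_at"]) > fetched_at:
            problems.append(f"vehicle {row['vehicle_id']}: reported_at is after fetch time")
    return problems


if __name__ == "__main__":
    rows = deduplicate(flatten(SAMPLE_PAYLOAD, FETCHED_AT))
    for problem in find_problems(rows, FETCHED_AT):
        print(problem)
```

Checks like find_problems are easy to wrap in pytest assertions, while the same rules can also be expressed as not-null and uniqueness tests on the dbt side. Keeping each step a pure function over plain dicts means it can be exercised without Airflow or a Snowflake connection.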
The Financial Verdict
After crunching over 73,000 distinct vehicle observations across thousands of Toronto transit vehicles, my Snowflake compute bill is still just a tiny drop in the bucket of that free $400 credit.
The big takeaway? Learning enterprise data engineering doesn't require a corporate budget or a massive credit card limit. You just need a solid free trial, a messy public API, and a bit of curiosity to see what you can build before the credits run out.
If you want to see the system in action or check how delayed your favorite Toronto bus route is right now, you can explore the Live Dashboard.