DS-551 · Data Engineering at Scale
Build and operate the systems that move data at production scale. A project-based course organized around the Epidemic Engine — pipelines, streams, containers, and the trade-offs that come with real infrastructure.
hi, welcome in —
I'm Langdon. I teach data science and computing at Boston University. This site is the durable home for syllabi, course datasets, policies, and the practical stuff — how to ask for a reference letter, what to do when life gets in the way, and where to find help.
Looking for an assignment deadline? Those live in Blackboard — always.



The data catalog
Real data is the heart of these courses — Bluebikes trips, Boston 311 requests, baby names, billionaires. The catalog tells you what each dataset is, where it came from, what license it carries, and which course uses it. Self-hosted files download directly; external ones are clearly badged before you click.
Browse 20 datasets
why a data catalog?
Because "where's the CSV from lecture?" shouldn't be a scavenger hunt. Course sites almost never have a catalog; data platforms like Kaggle and Analyze Boston do. So this site works like those platforms: browse, filter, download, cite.
Reference letters, incomplete grades, accommodations, the AI-use policy, and answers to the questions students actually ask — the Students page has all of it. No hunting through old emails required.