DS-551 · Data Engineering at Scale
Build and operate the systems that move data at production scale. A project-based course organized around the Epidemic Engine — pipelines, streams, containers, and the trade-offs that come with real infrastructure.
Catalog
The live course hub: what I am teaching next, what I oversee, what I am developing, and the older courses students still look for. Past course pages stay stable; assignments and weekly deadlines still live in Blackboard.
Active / upcoming
Courses with a current or upcoming offering. Teaching means I am the instructor; overseeing means I support the course without being the primary instructor.
Build and operate the systems that move data at production scale. A project-based course organized around the Epidemic Engine — pipelines, streams, containers, and the trade-offs that come with real infrastructure.
An introduction to reasoning with data — Python, statistics, and visualization used to make arguments about the real world that are harder to dismiss than words. No prior programming experience expected.
I support this course as part of the Spark/CDS teaching ecosystem rather than serving as the primary instructor.
A practical bridge from classroom computing to internships and early technical work: GitHub portfolios, technical communication, code review, agile habits, resumes, interviews, and workplace onboarding.
I support this course as part of the Spark/CDS teaching ecosystem rather than serving as the primary instructor.
The Spark software practicum: student teams build production-ready software for real partners, using agile process, GitHub workflows, code review, testing, demos, and clear stakeholder communication.
I support this course as part of the Spark/CDS teaching ecosystem rather than serving as the primary instructor.
Applied machine-learning practicum: student teams turn a partner problem into a dataset, a model, an evaluation plan, and a result they can explain without hiding behind the math.
I support this course as part of the Spark/CDS teaching ecosystem rather than serving as the primary instructor.
A Spark practicum in product and design work: user research, prototyping, critique, partner conversations, and the choices that make an interface usable by someone other than the team that built it.
I support this course as part of the Spark/CDS teaching ecosystem rather than serving as the primary instructor.
In development
These are authored course ideas with real materials or pilots behind them. Some have run as directed studies, where students helped shape the course while taking it.
A proposed advanced course on using language models in data-science work: prompt design, RAG, AI-assisted analysis, code generation, validation, attribution, and responsible workflow design.
Interested students can ask about directed-study versions, help shape the course, or tell me what would make it useful.
A future journalism-facing Python course focused on data acquisition, analysis, automation, and investigative workflows for reporting.
Interested students can ask about directed-study versions, help shape the course, or tell me what would make it useful.
A future offering that teaches core algorithmic ideas through journalism and public-interest investigation: data structures, complexity, computational reasoning, and evidence work.
Interested students can ask about directed-study versions, help shape the course, or tell me what would make it useful.
Archive
Courses I taught or co-taught directly and in which I no longer have a current role.
A Cross-College Challenge practicum where students work with journalists, advocates, and researchers on justice-focused data stories, investigations, visualizations, and tools.
A retired civic-tech practicum where students used data science and software engineering for public-interest projects with nonprofit, government, and academic partners.
Supported history
Historical course pages where my role was curriculum or program support rather than primary teaching.
An early software-engineering immersion course for students getting their hands on the parts of a modern app stack: React, FastAPI, containers, deployment, collaboration, and technical demos.
I supported this course as part of the Spark/CDS teaching ecosystem rather than serving as the primary instructor.