Course datasets

Every dataset my courses use, catalogued like a proper data portal: what it is, where it came from, what license it carries, and which course uses it. Self-hosted files download straight from this site's storage; external entries are badged so you know a click leaves the site.

Every Bluebikes station — name, coordinates, docks, and municipality. The join partner for the trips data.

CSV 53 KB Public — license TBD DS-100
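For readers new to the term, a rough sketch of what "join partner" means here: each trip row carries a station identifier, and the station file supplies the attributes that go with it. The column names (`station_id`, `start_station_id`, `municipality`) and the sample rows below are assumptions for illustration only, not the actual schema of either file.

```python
import csv
import io

# Hypothetical sample rows; the real files' column names may differ.
STATIONS_CSV = """station_id,name,municipality
A1,Example Station One,Boston
B2,Example Station Two,Cambridge
"""

TRIPS_CSV = """trip_id,start_station_id
1,A1
2,B2
3,Z9
"""


def left_join_trips_to_stations(trips_text, stations_text):
    """Attach each trip's start-station municipality; None when unmatched."""
    # Index stations by their identifier for constant-time lookup.
    stations = {
        row["station_id"]: row
        for row in csv.DictReader(io.StringIO(stations_text))
    }
    joined = []
    for trip in csv.DictReader(io.StringIO(trips_text)):
        station = stations.get(trip["start_station_id"])
        joined.append({
            "trip_id": trip["trip_id"],
            "start_station_id": trip["start_station_id"],
            # Keep the trip even if its station is missing (left join).
            "municipality": station["municipality"] if station else None,
        })
    return joined


joined_trips = left_join_trips_to_stations(TRIPS_CSV, STATIONS_CSV)
for row in joined_trips:
    print(row)
```

Keeping unmatched trips with a `None` value, rather than dropping them, makes missing or retired stations visible instead of silently shrinking the data.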

A year of Bostonians asking the city for help — every 311 service request from 2025, with type, neighborhood, and resolution timestamps.

CSV 50 MB Public — license TBD DS-100

Daily Temperatures

Self-hosted

Long-run daily temperature observations — seasonality, smoothing, and long-term trends.

CSV 390 KB Public domain (NOAA observations) DS-100
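One simple form of smoothing is a moving average over a fixed window of days. The sketch below is a minimal illustration, not necessarily the method used in the course; the function name and the sample readings are made up and do not come from the dataset.

```python
def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Maintain a running sum so each step is O(1).
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Made-up daily readings, not taken from the dataset.
sample_temps = [10.0, 12.0, 11.0, 15.0, 14.0]
smoothed = moving_average(sample_temps, window=3)
print(smoothed)
```

A wider window removes more day-to-day noise but also blurs short-lived events, so the window size is a judgment call.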

A synthetic stream of health-event records for the Epidemic Engine — the raw material for DS-551's ingestion and streaming pipelines.

CSV 1.3 MB Public — license TBD DS-551

Birth weight, gestation, and maternal health for 1,174 mother-baby pairs from the Child Health and Development Studies — the classic causality-vs-correlation dataset.

CSV 28 KB Public — license TBD DS-100

Player name, team, position, and salary for the 2015–16 NBA season — histograms with a long right tail.

CSV 18 KB Public — license TBD DS-100

Eruption durations and waiting times for the Old Faithful geyser — the classic two-cluster scatter plot.

CSV 3 KB Public — license TBD DS-100

RentSmart Boston

Self-hosted

Housing violations, complaints, and inspections for Boston rental properties — civic data with real housing-justice questions in it.

CSV 77 MB Public — license TBD DS-100

Average SAT scores and participation rates by US state — the textbook example of a lurking variable.

CSV 2 KB Public — license TBD DS-100

Highest-grossing films with unadjusted and inflation-adjusted gross — a lesson about units hiding inside a fun dataset.

CSV 11 KB Public — license TBD DS-100

US Skyscrapers

Self-hosted

Name, city, height, and completion year for notable US skyscrapers — heights, eras, and city skylines in one table.

CSV 12 KB Public — license TBD DS-100

A 2026 snapshot of the world's billionaires — name, net worth, industry, and citizenship. Great for rankings, group-bys, and skeptical questions about wealth data.

CSV 173 KB Public — license TBD DS-100

Annual world population estimates — the simplest possible time series for first plots and growth rates.

CSV 2 KB Public domain DS-100
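A growth rate for an annual series is the year-over-year change divided by the earlier year's value. The sketch below assumes the data has been loaded as (year, population) pairs sorted by year; that layout and the sample figures are illustrative assumptions, not values from the file.

```python
def annual_growth_rates(series):
    """Return (year, rate) pairs, where rate is the change from the previous entry.

    `series` is a list of (year, population) tuples sorted by year.
    """
    rates = []
    for (_, previous), (year, current) in zip(series, series[1:]):
        rates.append((year, (current - previous) / previous))
    return rates


# Made-up numbers chosen so the rates are easy to check by hand.
sample_series = [(2000, 100.0), (2001, 102.0), (2002, 105.06)]
growth = annual_growth_rates(sample_series)
for year, rate in growth:
    print(year, f"{rate:.2%}")
```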

Missing something from class? The catalog is curated — if a file you need isn't here, ask on Piazza and it'll get added.