Coming up · I oversee this course · Authored · Fall 2026

DS-100 · Data Speak Louder than Words

An introduction to reasoning with data — Python, statistics, and visualization used to make arguments about the real world that are harder to dismiss than words. No prior programming experience expected.

Part of a trail on langd0n.com: AI attribution

DS-100 is where a lot of people discover that they can do this. It’s an introductory data science course with no programming prerequisite, built around one conviction: data speak louder than words, and learning to listen to them — carefully, skeptically, honestly — is a skill worth having no matter what you study.

What the course is really about

We work with three kinds of thinking at once:

  • Critical thinking — is this claim actually supported by the data behind it?
  • Inferential thinking — what can (and can’t) we conclude from a sample?
  • Computational thinking — how do we make a computer do the tedious parts?

You’ll learn Python from scratch, work in Jupyter notebooks from day one, and practice on real, messy, socially relevant data — public health, housing, transportation, education. By the end you’ll be making and testing hypotheses, building visualizations that tell the truth, and reaching conclusions you can defend out loud.

How it runs

The course uses a flipped, GenAI-integrated model. Each week you explore the upcoming concepts with an AI learning partner (a structured GenAI Exploration, or GAIE), then class time goes to the hard parts: verification, application, and the questions the AI couldn’t answer well. Assessments are individual and mostly AI-free — the explorations are practice; the verification is yours alone.

If you’ve heard that this course “uses AI,” that’s true, but the point isn’t the AI. The point is that you leave able to do the work yourself and to say, precisely and honestly, where a machine helped.

Who it’s for

Anyone. DS-100 is a BU Hub course (Social Inquiry, Digital/Multimedia Expression, Research and Information Literacy) designed for students from any major. If you’ve never written a line of code, you’re exactly who it was written for.

Course materials

Reference documents for the course — read online or download. Assignments and weekly schedules live in the LMS, not here.

Datasets we use

Materials here are from Summer 2026, the most recently taught term.

  • Hollywood Actors — Box Office · Self-hosted

    Top 50 actors by total US box-office gross, with per-film averages and their biggest movie.

  • Maternal Smoking & Birth Weight · Self-hosted

    Birth weight, gestation, and maternal health for 1,174 mother-baby pairs from the Child Health and Development Studies — the classic causality-vs-correlation dataset.

  • World Billionaires (2026) · Self-hosted

    A 2026 snapshot of the world's billionaires — name, net worth, industry, and citizenship. Great for rankings, group-bys, and skeptical questions about wealth data.

  • Bluebikes Trips — Sample (Sept 2021) · Self-hosted

    A workable sample of Boston Bluebikes bike-share trips from September 2021 — start/end stations, timestamps, and rider type.

  • Bluebikes Stations (May 2026) · Self-hosted

    Every Bluebikes station — name, coordinates, docks, and municipality. The join partner for the trips data.

  • Boston 311 Service Requests (2025) · Self-hosted

    A year of Bostonians asking the city for help — every 311 service request from 2025, with type, neighborhood, and resolution timestamps.

  • Boston 311 Requests (live portal) · Analyze Boston

    The live, continuously updated 311 dataset on Analyze Boston — for when the 2025 snapshot isn't enough.

  • Boston Building Energy & Water Metrics (2025) · Self-hosted

    Energy and water use reported by Boston's large buildings under BERDO — sustainability data with policy teeth.

  • NBA Salaries (2015–16) · Self-hosted

    Player name, team, position, and salary for the 2015–16 NBA season — histograms with a long right tail.

  • Old Faithful Eruptions · Self-hosted

    Eruption durations and waiting times for the Old Faithful geyser — the classic two-cluster scatter plot.

  • US Presidential Birth Years · Self-hosted

    Birth data for US presidents — a tiny table for early table operations and date arithmetic.

  • RentSmart Boston · Self-hosted

    Housing violations, complaints, and inspections for Boston rental properties — civic data with real housing-justice questions in it.

  • State SAT Averages (2014) · Self-hosted

    Average SAT scores and participation rates by US state — the textbook example of a lurking variable.

  • San Francisco City Salaries (2015) · Self-hosted

    Compensation for every San Francisco city employee in 2015 — job titles, salaries, overtime, and benefits.

  • US Skyscrapers · Self-hosted

    Name, city, height, and completion year for notable US skyscrapers — heights, eras, and city skylines in one table.

  • Daily Temperatures · Self-hosted

    Long-run daily temperature observations — seasonality, smoothing, and long-term trends.

  • Top Grossing Movies (2017) · Self-hosted

    Highest-grossing films with unadjusted and inflation-adjusted gross — a lesson about units hiding inside a fun dataset.

  • United Flight Delays (Summer 2015) · Self-hosted

    Departure delays for United flights out of SFO — thousands of rows for sampling and the law of averages.

  • World Population by Year · Self-hosted

    Annual world population estimates — the simplest possible time series for first plots and growth rates.
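
To give a flavor of what "sampling and the law of averages" looks like in a notebook, here is a minimal sketch using only the Python standard library. It is an illustration, not course code: the delay values are synthetic (randomly generated, not taken from the United Flight Delays file), and the names `population`, `sample_mean`, and `errors` are hypothetical.

```python
import random
import statistics

# Synthetic stand-in for a column of departure delays, in minutes.
# These values are made up for illustration; they are NOT the course dataset.
rng = random.Random(100)
population = [rng.expovariate(1 / 15) for _ in range(10_000)]
population_mean = statistics.mean(population)


def sample_mean(values, n, rng):
    """Mean of a simple random sample of size n, drawn without replacement."""
    return statistics.mean(rng.sample(values, n))


# Larger samples tend to give means closer to the population mean.
errors = {}
for n in (10, 100, 1_000, 5_000):
    errors[n] = abs(sample_mean(population, n, rng) - population_mean)
    print(f"n={n:>5}  |sample mean - population mean| = {errors[n]:.2f}")
```

Any single small sample can land close to the truth by luck, so the gap does not have to shrink at every step. What the law of averages describes is the tendency across many samples, which is why rerunning the loop with different seeds is a useful exercise.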

From the classroom

[Photo: DS-100 students and course staff posing together at the end of the Spring 2022 semester. Caption: "DS-100, Spring 2022: we made it!"]
