This Fundamentals module was classroom-tested in Data 6, Fall 2025.
In this module, students build a foundation in Python programming and data science, progressing from Python syntax to skills like data cleaning and array manipulation. Students apply these computational tools to data from public health contexts and analyze population estimates and health disparities. They also examine the ethical implications and potential biases in data collection. Students learn to interrogate how data is measured and its broader impact on society.
Throughout the module, students learn the following topics:
Introduction to Computer Programming:
Python Syntax
Operators, built-in functions and print()
Variables and Name Assignments/Conventions
Data Types and Data Casting
Interpreting Error Messages
Call Expressions and Functions
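As a rough illustration of how several of these programming topics fit together, here is a minimal sketch. The values and variable names are hypothetical and are not taken from the course materials; the sketch only shows name assignment, data casting, operators, built-in functions, and reading an error message.

```python
# Hypothetical values for illustration only; not course data.
cases_text = "128"   # a count stored as text (str)
population = 5200    # an int

# Data casting: convert the string to an int before doing arithmetic.
cases = int(cases_text)

# Operators and a built-in function: cases per 1,000 people, rounded.
rate_per_1000 = round(cases / population * 1000, 1)

# Call expressions: print() is called with two arguments each time.
print("Cases:", cases)
print("Rate per 1,000:", rate_per_1000)

# Interpreting an error message: int() raises ValueError on non-numeric text.
try:
    int("n/a")
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```

Catching the exception here is only a way to display the message without stopping the program; in a notebook, reading the traceback directly serves the same purpose.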
Introduction to Data Science:
History of Data Science as a field
Features of data sets and table attributes
Arrays and array operations
NumPy functions
Data cleaning
Measures of central tendency (mean, median, mode) and spread (range)
Categorical and Quantitative Variables
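The array, NumPy, data cleaning, and summary-statistic topics above can be sketched together as follows. The ages are made up for illustration, and using np.nan to mark a missing value is an assumption about how the data might be encoded, not a description of the course datasets.

```python
import numpy as np

# Hypothetical ages; np.nan marks a missing entry (an assumed encoding).
ages = np.array([34, 29, np.nan, 41, 29, 52])

# Data cleaning: keep only the non-missing values.
clean_ages = ages[~np.isnan(ages)]

# Array operation: arithmetic applies to every element at once.
ages_in_months = clean_ages * 12

# Measures of central tendency.
mean_age = np.mean(clean_ages)
median_age = np.median(clean_ages)

# NumPy has no direct mode function, so count the unique values instead.
values, counts = np.unique(clean_ages, return_counts=True)
mode_age = values[np.argmax(counts)]

# Range is a measure of spread: largest value minus smallest value.
age_range = np.max(clean_ages) - np.min(clean_ages)

print(mean_age, median_age, mode_age, age_range)
```

Note that np.argmax returns the first most frequent value, so this sketch reports only one mode when several values tie.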
Data Sensemaking:
Asking questions about ethics and data science
Interrogating what is being measured, what was left out, who collected the data and potential sources of bias
Implications of data analysis on society through applied examples
Applications of Data Science to Public Health:
Age-standardization
Population Estimates
Incidence vs Crude Rates
Aggregation and Disaggregation of Variables
Disparities in Health Outcomes
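To make the difference between a crude rate and an age-standardized rate concrete, here is a minimal sketch using direct standardization. The choice of the direct method is an assumption, since the module may teach a different approach, and all counts, age groups, and the standard population below are hypothetical.

```python
import numpy as np

# Hypothetical data for three age groups (younger, middle, older).
cases = np.array([10, 40, 150])
population = np.array([50_000, 30_000, 20_000])

# Hypothetical standard population for the same three age groups.
standard_population = np.array([40_000, 35_000, 25_000])

# Crude rate: aggregate all cases and all people, ignoring age structure.
crude_rate = cases.sum() / population.sum() * 100_000

# Disaggregate: compute a separate rate for each age group.
age_specific_rates = cases / population

# Direct standardization: weight each age-specific rate by that group's
# share of the standard population, then add the weighted rates.
weights = standard_population / standard_population.sum()
age_standardized_rate = (age_specific_rates * weights).sum() * 100_000

print("Crude rate per 100,000:", round(crude_rate, 1))
print("Age-standardized rate per 100,000:", round(age_standardized_rate, 1))
```

In this made-up example the standardized rate is higher than the crude rate because the standard population has a larger share of older people than the observed population does, which shows why comparing crude rates across populations with different age structures can mislead.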