class: center, middle, inverse, title-slide

# Week 7: EDA 2: covariation
## PUBPOL 750 Data Analysis for Public Policy I
### Justin Savoie
### MPP-DS McMaster
### 2022-06-24

---

class: inverse, center, middle

# EDA 2: covariation

---

# A lot of quantitative social science consists of three things:

- Variation (distribution, mean, sd, skew of a variable, e.g. Age)
- Covariation (how does Age relate to Income)
- Covariation, considering some other variables (how does age relate to income, considering that the relation is different for different social groups)
- in addition (of course) to understanding your problem, and cleaning your data

---

# A lot of quantitative social science consists of three things:

- Variation (distribution, mean, sd, skew of a variable, e.g. Age)
- **Covariation (how does Age relate to Income)**
- Covariation, considering some other variables (how does age relate to income, considering that the relation is different for different social groups)
- in addition (of course) to understanding your problem, and cleaning your data

---

# Covariation

- A categorical and a continuous variable (e.g. ed. degree and income)
- Two categorical variables (e.g. ed. degree and vote intention)
- Two continuous variables (e.g. age and income)
- Ambiguous cases (e.g. schooling and life satisfaction: schooling could be degree or years in school; life satisfaction could be categorical, or could be a scale from 0 to 100)
- In covariation, generally, there's a dependent variable (the one we want to explain / understand / predict) and an independent variable (the one we use to explain / understand / predict)

---

# Example in The Conversation

---

<img src="images/image1.png" width="90%" />

---

<img src="images/image2.png" width="60%" />

---

# Covariation?

- Women who saw an “enhanced compassion” videotape rated the physician as warmer and more caring, sensitive, and compassionate than did women who watched the “standard” videotape.
- The first variable is categorical: those who got the “enhanced compassion” videotape and those who didn't
- The second variable is (probably) continuous, e.g. on a scale of 0-10, rate the physician's compassion
- You can then study the distribution of the second variable (its mean, its sd, its shape) by group
- You could also study "heterogeneity": e.g. is the treatment more effective among younger or older women, or among racialized or white women?

---

# Covariation: what's the objective?

- description (e.g. any poll you usually see in the news)
- explanation (causal) (e.g. the example above)
- prediction (e.g. some GDP forecasting / machine learning for spam)
- and, of course, the objective is not always clear
- **some add "exploration"**
- some add "prescription"

---

class: inverse, center, middle

# Code along for exercises
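---

# Code along: a categorical and a continuous variable

A minimal sketch of the "categorical and continuous" case from the videotape example, written in Python as an assumption (the deck does not name a language). The ratings are made up for illustration and are not data from the study; `summarize_by_group` is a hypothetical helper, not part of any course material. It groups the continuous variable by the categorical one and reports n, mean, and sd per group.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical, made-up compassion ratings (0-10 scale), for illustration only.
# Each row is (videotape group, rating); these are NOT results from the study.
ratings = [
    ("enhanced", 8), ("enhanced", 9), ("enhanced", 7), ("enhanced", 8),
    ("standard", 6), ("standard", 5), ("standard", 7), ("standard", 6),
]


def summarize_by_group(rows):
    """Return {group: (n, mean, sd)} for an iterable of (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    # stdev is the sample standard deviation (n - 1 in the denominator).
    return {g: (len(v), mean(v), stdev(v)) for g, v in groups.items()}


summary = summarize_by_group(ratings)
for group, (n, m, sd) in summary.items():
    print(f"{group}: n={n}, mean={m:.2f}, sd={sd:.2f}")

# Covariation here is the gap between the two group means.
difference = summary["enhanced"][1] - summary["standard"][1]
print(f"difference in means: {difference:.2f}")
```

To look at heterogeneity, the same helper could be applied to a grouping key that combines treatment with another variable, for example `(videotape, age_group)`, assuming such a column exists in your data.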