class: center, middle, inverse, title-slide

# Week 3: Data transformation
## PUBPOL 750 Data Analysis for Public Policy I
### Justin Savoie
### MPP-DS McMaster
### 2022-05-27

---
class: inverse, center, middle

# Offset in weeks

### Week 3 in content becomes week 4 in calendar
### Week 4 in content becomes week 5 in calendar, etc.
### HW1 due June 3; Project 1 due July 8
### Project 2: no change, due August 12

---
class: inverse, center, middle

# Summary Week 2 Data Viz

---

```r
library(ggplot2)  # provides ggplot() and the mpg dataset
ggplot(mpg) +
  geom_point(aes(x=displ,y=hwy, color=class))
```

---

```r
ggplot(mpg) +
  geom_point(aes(x=displ,y=hwy, color=factor(year))) +
  facet_wrap(~class) +
  coord_polar()
```

---

```r
library(maps)
world <- map_data("world")
world$type <- "No"
world$type <- ifelse(world$region %in% c("China","France","Russia","UK","USA"),"Perm.",world$type)
world$type <- ifelse(world$region %in% c("Albania","Brazil","Gabon","Ghana","India","Ireland","Kenya","Mexico","Norway","United Arab Emirates"),"Non.perm",world$type)
ggplot() +
  geom_polygon(data = world, aes(x=long, y = lat, group = group, fill=type)) +
  labs(fill="UN Security Council") +
  coord_fixed(1.3) +
  theme(legend.position = "top", plot.margin = margin(-5,1,1.5,1.2, "cm"))
```

<img src="Slides_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Basics

---

### Objects

```r
x <- c(1,2,3)
x
```

```
## [1] 1 2 3
```

```r
x <- "my name is justin"
x
```

```
## [1] "my name is justin"
```

```r
z <- sum(c(1,2,3))*2
z
```

```
## [1] 12
```

```r
x1 <- c(2,4,6)
x2 <- c(4,5,1)
x12 <- x1+x2
x12
```

```
## [1] 6 9 7
```

---

### Functions

```r
my_vector <- c(1,2,3,4,5)
mean(my_vector)
```

```
## [1] 3
```

```r
rnorm(10,mean=100,sd=25)
```

```
##  [1]  90.10275  96.39162 107.58930  98.40490  78.81469 143.16369 116.23539
##  [8] 103.00779 127.63424 124.21096
```

```r
seq(from=0,to=10,by=2)
```

```
## [1]  0  2  4  6  8 10
```

```r
seq(0,10,2)
```

```
## [1]  0  2  4  6  8 10
```

---
class: inverse, center, middle

# Data transformation

---

### Five key verbs

`filter()`

`arrange()`

`select()`

`mutate()`

`summarise()`

Each verb can be used with `group_by()`, which applies the verb "by group".

---

### Let's look at the flights data frame

```r
library(nycflights13)
flights
```

```
## # A tibble: 336,776 × 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      517            515         2      830            819
##  2  2013     1     1      533            529         4      850            830
##  3  2013     1     1      542            540         2      923            850
##  4  2013     1     1      544            545        -1     1004           1022
##  5  2013     1     1      554            600        -6      812            837
##  6  2013     1     1      554            558        -4      740            728
##  7  2013     1     1      555            600        -5      913            854
##  8  2013     1     1      557            600        -3      709            723
##  9  2013     1     1      557            600        -3      838            846
## 10  2013     1     1      558            600        -2      753            745
## # … with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
```

---

### filter()

```r
filter(flights,month==1,day==1)
```

```
## # A tibble: 842 × 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      517            515         2      830            819
##  2  2013     1     1      533            529         4      850            830
##  3  2013     1     1      542            540         2      923            850
##  4  2013     1     1      544            545        -1     1004           1022
##  5  2013     1     1      554            600        -6      812            837
##  6  2013     1     1      554            558        -4      740            728
##  7  2013     1     1      555            600        -5      913            854
##  8  2013     1     1      557            600        -3      709            723
##  9  2013     1     1      557            600        -3      838            846
## 10  2013     1     1      558            600        -2      753            745
## # … with 832 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
## #   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
```

---

### filter()

```r
filtered_flights <- filter(flights,month==1,day==1)
filtered_flights
```

```
## # A tibble: 842 × 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      517            515         2      830            819
##  2  2013     1     1      533            529         4      850            830
##  3  2013     1     1      542            540         2      923            850
##  4  2013     1     1      544            545        -1     1004           1022
##  5  2013     1     1      554            600        -6      812            837
##  6  2013     1     1      554            558        -4      740            728
##  7  2013     1     1      555            600        -5      913            854
##  8  2013     1     1      557            600        -3      709            723
##  9  2013     1     1      557            600        -3      838            846
## 10  2013     1     1      558            600        -2      753            745
## # … with 832 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
## #   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
```

---

### filter()

```r
filter(flights,arr_time>2330)
```

```
## # A tibble: 6,196 × 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1     1843           1835         8     2339           2346
##  2  2013     1     1     1846           1855        -9     2336           2355
##  3  2013     1     1     1952           1930        22     2358           2207
##  4  2013     1     1     1959           1930        29     2331           2306
##  5  2013     1     1     2021           2025        -4     2351           2341
##  6  2013     1     1     2025           2030        -5     2334           2348
##  7  2013     1     1     2025           2028        -3     2358           2351
##  8  2013     1     1     2030           2035        -5     2354           2342
##  9  2013     1     1     2031           2030         1     2344           2335
## 10  2013     1     1     2035           2030         5     2337              5
## # … with 6,186 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
```

---

### filter()

```r
filter(flights,arr_time %in% c(2200,2300,2400))
```

```
## # A tibble: 794 × 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1     1912           1915        -3     2200           2219
##  2  2013     1     1     2209           2155        14     2400           2337
##  3  2013     1     2     1918           1925        -7     2200           2205
##  4  2013     1     3     1857           1900        -3     2200           2301
##  5  2013     1     3     1941           1946        -5     2200           2154
##  6  2013     1     3     1946           1930        16     2300           2247
##  7  2013     1     4     1859           1900        -1     2200           2238
##  8  2013     1     5     2116           2130       -14     2400             18
##  9  2013     1     6     1950           2000       -10     2200           2145
## 10  2013     1     6     2048           2040         8     2200           2149
## # … with 784 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
## #   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
```

---

### filter()

```r
filter(flights,is.na(arr_time))
```

```
## # A tibble: 8,713 × 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1     2016           1930        46       NA           2220
##  2  2013     1     1       NA           1630        NA       NA           1815
##  3  2013     1     1       NA           1935        NA       NA           2240
##  4  2013     1     1       NA           1500        NA       NA           1825
##  5  2013     1     1       NA            600        NA       NA            901
##  6  2013     1     2     2041           2045        -4       NA           2359
##  7  2013     1     2     2145           2129        16       NA             33
##  8  2013     1     2       NA           1540        NA       NA           1747
##  9  2013     1     2       NA           1620        NA       NA           1746
## 10  2013     1     2       NA           1355        NA       NA           1459
## # … with 8,703 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
```

---

### arrange()

```r
head(arrange(flights,arr_time))
```

```
## # A tibble: 6 × 19
##    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
## 1  2013     1     2     2130           2130         0        1             18
## 2  2013     1    11     2157           2000       117        1           2208
## 3  2013     1    11     2253           2249         4        1           2357
## 4  2013     1    14     2122           2130        -8        1              2
## 5  2013     1    14     2246           2250        -4        1              7
## 6  2013     1    15     2304           2245        19        1           2357
## # … with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>
```

---

```r
head(arrange(flights,desc(arr_time)))
```

```
## # A tibble: 6 × 19
##    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
## 1  2013     1     1     2209           2155        14     2400           2337
## 2  2013     1     5     2116           2130       -14     2400             18
## 3  2013     1    13     2243           2129        74     2400           2224
## 4  2013     1    16     2138           2107        31     2400           2322
## 5  2013     1    17     2256           2249         7     2400           2357
## 6  2013     1    22     2212           2055        77     2400           2250
## # … with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>
```

---

```r
head(arrange(flights,month,desc(day)))
```

```
## # A tibble: 6 × 19
##    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
## 1  2013     1    31        1           2100       181      124           2225
## 2  2013     1    31        4           2359         5      455            444
## 3  2013     1    31        7           2359         8      453            437
## 4  2013     1    31       12           2250        82      132              7
## 5  2013     1    31       26           2154       152      328             50
## 6  2013     1    31       34           2159       155      135           2315
## # … with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>
```

---

### select()

```r
select(flights,month,day)
```

```
## # A tibble: 336,776 × 2
##    month   day
##    <int> <int>
##  1     1     1
##  2     1     1
##  3     1     1
##  4     1     1
##  5     1     1
##  6     1     1
##  7     1     1
##  8     1     1
##  9     1     1
## 10     1     1
## # … with 336,766 more rows
```

---

### select()

```r
select(flights,-month,-day)
```

```
## # A tibble: 336,776 × 17
##     year dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
##    <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
##  1  2013      517            515         2      830            819        11
##  2  2013      533            529         4      850            830        20
##  3  2013      542            540         2      923            850        33
##  4  2013      544            545        -1     1004           1022       -18
##  5  2013      554            600        -6      812            837       -25
##  6  2013      554            558        -4      740            728        12
##  7  2013      555            600        -5      913            854        19
##  8  2013      557            600        -3      709            723       -14
##  9  2013      557            600        -3      838            846        -8
## 10  2013      558            600        -2      753            745         8
## # … with 336,766 more rows, and 10 more variables: carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>
```

---

### select()

```r
select(flights,contains('time'))
```

```
## # A tibble: 336,776 × 6
##    dep_time sched_dep_time arr_time sched_arr_time air_time time_hour
##       <int>          <int>    <int>          <int>    <dbl> <dttm>
##  1      517            515      830            819      227 2013-01-01 05:00:00
##  2      533            529      850            830      227 2013-01-01 05:00:00
##  3      542            540      923            850      160 2013-01-01 05:00:00
##  4      544            545     1004           1022      183 2013-01-01 05:00:00
##  5      554            600      812            837      116 2013-01-01 06:00:00
##  6      554            558      740            728      150 2013-01-01 05:00:00
##  7      555            600      913            854      158 2013-01-01 06:00:00
##  8      557            600      709            723       53 2013-01-01 06:00:00
##  9      557            600      838            846      140 2013-01-01 06:00:00
## 10      558            600      753            745      138 2013-01-01 06:00:00
## # … with 336,766 more rows
```

---

### mutate()

```r
mutate(mtcars,one=1)
```

```
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb one
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4   1
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4   1
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1   1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1   1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2   1
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1   1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4   1
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2   1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2   1
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4   1
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4   1
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3   1
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3   1
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3   1
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4   1
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4   1
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4   1
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1   1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2   1
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1   1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1   1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2   1
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2   1
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4   1
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2   1
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1   1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2   1
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2   1
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4   1
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6   1
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8   1
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2   1
```

---

### mutate()

```r
mutate(mtcars,hp2=hp^2)
```

```
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb    hp2
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  12100
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  12100
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1   8649
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  12100
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  30625
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1  11025
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4  60025
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2   3844
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2   9025
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4  15129
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4  15129
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3  32400
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3  32400
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3  32400
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  42025
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4  46225
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4  52900
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1   4356
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2   2704
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1   4225
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1   9409
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2  22500
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2  22500
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4  60025
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2  30625
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1   4356
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2   8281
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2  12769
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4  69696
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6  30625
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8 112225
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2  11881
```

---

### mutate()

```r
mutate(mtcars,hpbycyl=hp/cyl)
```

```
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
##                      hpbycyl
## Mazda RX4           18.33333
## Mazda RX4 Wag       18.33333
## Datsun 710          23.25000
## Hornet 4 Drive      18.33333
## Hornet Sportabout   21.87500
## Valiant             17.50000
## Duster 360          30.62500
## Merc 240D           15.50000
## Merc 230            23.75000
## Merc 280            20.50000
## Merc 280C           20.50000
## Merc 450SE          22.50000
## Merc 450SL          22.50000
## Merc 450SLC         22.50000
## Cadillac Fleetwood  25.62500
## Lincoln Continental 26.87500
## Chrysler Imperial   28.75000
## Fiat 128            16.50000
## Honda Civic         13.00000
## Toyota Corolla      16.25000
## Toyota Corona       24.25000
## Dodge Challenger    18.75000
## AMC Javelin         18.75000
## Camaro Z28          30.62500
## Pontiac Firebird    21.87500
## Fiat X1-9           16.50000
## Porsche 914-2       22.75000
## Lotus Europa        28.25000
## Ford Pantera L      33.00000
## Ferrari Dino        29.16667
## Maserati Bora       41.87500
## Volvo 142E          27.25000
```

---

### group_by() with mutate()

```r
mtcars_grouped <- group_by(mtcars,cyl)
mutate(mtcars_grouped,avg_mpg_cyl=mean(mpg))
```

```
## # A tibble: 32 × 12
## # Groups:   cyl [3]
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb avg_mpg_cyl
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>       <dbl>
##  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4        19.7
##  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4        19.7
##  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1        26.7
##  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1        19.7
##  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2        15.1
##  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1        19.7
##  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4        15.1
##  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2        26.7
##  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2        26.7
## 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4        19.7
## # … with 22 more rows
```

---

### group_by() with summarise()

```r
mtcars_grouped <- group_by(mtcars,cyl)
summarise(mtcars_grouped,avg_mpg_cyl=mean(mpg))
```

```
## # A tibble: 3 × 2
##     cyl avg_mpg_cyl
##   <dbl>       <dbl>
## 1     4        26.7
## 2     6        19.7
## 3     8        15.1
```

---
class: inverse, center, middle

# Homework 1

---
class: inverse, center, middle

# Exercises

### 5.2.4 5.3.1 5.4.1 5.5.2 5.6.7 5.7.1
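
---

### Recap: chaining the verbs (sketch)

The verbs from this week can be applied one after another, saving each intermediate result. What follows is a minimal sketch, not part of the original slides: it assumes only the `dplyr` package and the built-in `mtcars` data, and the names `heavy`, `with_ratio`, `by_cyl`, `result`, `n_cars`, and `avg_hpbycyl` are made up for illustration.

```r
library(dplyr)

# filter(): keep cars heavier than 3 (wt is measured in 1000 lbs)
heavy <- filter(mtcars, wt > 3)

# mutate(): add horsepower per cylinder as a new column
with_ratio <- mutate(heavy, hpbycyl = hp / cyl)

# group_by() + summarise(): one row per number of cylinders
by_cyl <- group_by(with_ratio, cyl)
result <- summarise(by_cyl, n_cars = n(), avg_hpbycyl = mean(hpbycyl))

# arrange(): sort the groups from highest to lowest average
result <- arrange(result, desc(avg_hpbycyl))
result
```

Saving each step under its own name makes it easy to inspect what every verb did before moving on to the next one.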