class: center, middle, inverse, title-slide

# How to Use slice() functions to Take Row Slices of Dataframes
## Selecting Groups and Extremes of Rows
### Peter Higgins
### 2021-01-10

---

### How to Use the _slice()_ functions to Take Row Slices of Dataframes

#### If you want the first N rows, rows 50:100, or the last 100 rows, in whatever order the dataframe is currently sorted, you can use _slice()_ and its positional variants _slice_head()_ and _slice_tail()_. The _slice_max()_ and _slice_min()_ functions take the rows with the largest or smallest values of a particular variable, by count (n) or by proportion (prop). Let's try some **slicing** examples.

---
count: false

Example 1/4: Slice Rows 100-200

.panel1-filter1-auto[
```r
# how many rows when you start
*nrow(covid_dates)
```
]

.panel2-filter1-auto[
```
[1] 15524
```
]

---
count: false

Example 1/4: Slice Rows 100-200

.panel1-filter1-auto[
```r
# how many rows when you start
nrow(covid_dates)
*covid_dates
```
]

.panel2-filter1-auto[
```
[1] 15524
```

```
# A tibble: 15,524 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>
 1       1412 jhezane         westerling     female       4 covid   inpatient …
 2        533 penny           targaryen      female       7 covid   clinical l…
 3       9134 grunt           rivers         male         7 covid   clinical l…
 4       8518 melisandre      swyft          female       8 covid   clinical l…
 5       8967 rolley          karstark       male         8 covid   emergency …
 6      11048 megga           karstark       female       8 covid   oncology d…
 7        663 ithoke          targaryen      male         9 covid   clinical l…
 8       2158 ravella         frey           female       9 covid   emergency …
 9       3794 styr            tyrell         male         9 covid   clinical l…
10       4706 wynafryd        seaworth       male         9 covid   clinical l…
# … with 15,514 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false

Example 1/4: Slice Rows 100-200

.panel1-filter1-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>% # keep rows 100 through 200, as currently sorted
* slice(100:200)
```
]

.panel2-filter1-auto[
```
[1] 15524
```

```
# A tibble: 101 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>
 1        722 joseran         greyjoy        male        12 covid   clinical l…
 2        902 owen            seaworth       male        12 covid   emergency …
 3       1840 jocelyn         tarly          female      12 covid   clinical l…
 4       2025 urswyck         baelish        male        12 covid   ob gyn
 5       2026 bellonara       snow           female      12 covid   laboratory
 6       2116 penny           kettleblack    female      12 covid   radiation …
 7       2341 penny           kettleblack    female      12 covid   cc care nt…
 8       2573 glendon         lannister      male        12 covid   emergency …
 9       2859 boros           lannister      male        12 covid   emergency …
10       3781 randa           tyrell         female      12 covid   clinical l…
# … with 91 more rows, and 11 more variables: result <chr>, demo_group <chr>,
#   age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>, orderset <dbl>,
#   payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false

Example 1/4: Slice Rows 100-200

.panel1-filter1-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>% # keep rows 100 through 200, as currently sorted
  slice(100:200) # see how many rows now
# Format:
*# slice(rownum1 : rownum2)
```
]

.panel2-filter1-auto[
```
[1] 15524
```

```
# A tibble: 101 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>
 1        722 joseran         greyjoy        male        12 covid   clinical l…
 2        902 owen            seaworth       male        12 covid   emergency …
 3       1840 jocelyn         tarly          female      12 covid   clinical l…
 4       2025 urswyck         baelish        male        12 covid   ob gyn
 5       2026 bellonara       snow           female      12 covid   laboratory
 6       2116 penny           kettleblack    female      12 covid   radiation …
 7       2341 penny           kettleblack    female      12 covid   cc care nt…
 8       2573 glendon         lannister      male        12 covid   emergency …
 9       2859 boros           lannister      male        12 covid   emergency …
10       3781 randa           tyrell         female      12 covid   clinical l…
# … with 91 more rows, and 11 more variables: result <chr>, demo_group <chr>,
#   age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>, orderset <dbl>,
#   payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

<style>
.panel1-filter1-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-filter1-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-filter1-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---
count: false

Example 2/4: Slice Rows with Top 20 age values

.panel1-filter2-auto[
```r
# how many rows when you start
*nrow(covid_dates)
```
]

.panel2-filter2-auto[
```
[1] 15524
```
]

---
count: false

Example 2/4: Slice Rows with Top 20 age values

.panel1-filter2-auto[
```r
# how many rows when you start
nrow(covid_dates)
*covid_dates
```
]

.panel2-filter2-auto[
```
[1] 15524
```

```
# A tibble: 15,524 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>
 1       1412 jhezane         westerling     female       4 covid   inpatient …
 2        533 penny           targaryen      female       7 covid   clinical l…
 3       9134 grunt           rivers         male         7 covid   clinical l…
 4       8518 melisandre      swyft          female       8 covid   clinical l…
 5       8967 rolley          karstark       male         8 covid   emergency …
 6      11048 megga           karstark       female       8 covid   oncology d…
 7        663 ithoke          targaryen      male         9 covid   clinical l…
 8       2158 ravella         frey           female       9 covid   emergency …
 9       3794 styr            tyrell         male         9 covid   clinical l…
10       4706 wynafryd        seaworth       male         9 covid   clinical l…
# … with 15,514 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false

Example 2/4: Slice Rows with Top 20 age values

.panel1-filter2-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>%
* select(age, gender, pan_day)
```
]

.panel2-filter2-auto[
```
[1] 15524
```

```
# A tibble: 15,524 x 3
     age gender pan_day
   <dbl> <chr>    <dbl>
 1   0   female       4
 2   0   female       7
 3   0.8 male         7
 4   0.8 female       8
 5   0.8 male         8
 6   0.8 female       8
 7   0.8 male         9
 8   0   female       9
 9   0   male         9
10   0.9 male         9
# … with 15,514 more rows
```
]

---
count: false

Example 2/4: Slice Rows with Top 20 age values

.panel1-filter2-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>%
  select(age, gender, pan_day) %>% # more than 20 rows because of ties; auto-sorts
* slice_max(age, n=20)
```
]

.panel2-filter2-auto[
```
[1] 15524
```

```
# A tibble: 22 x 3
     age gender pan_day
   <dbl> <chr>    <dbl>
 1   138 female     105
 2   119 female      48
 3   119 male        87
 4   119 female     100
 5   119 female     105
 6    99 female      94
 7    98 female      81
 8    98 male        87
 9    98 male        88
10    98 female      94
# … with 12 more rows
```
]

---
count: false

Example 2/4: Slice Rows with Top 20 age values

.panel1-filter2-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>%
  select(age, gender, pan_day) %>% # more than 20 rows because of ties; auto-sorts
  slice_max(age, n=20) # see how many rows now
# Format:
*# slice_max(variable, n = x)
```
]

.panel2-filter2-auto[
```
[1] 15524
```

```
# A tibble: 22 x 3
     age gender pan_day
   <dbl> <chr>    <dbl>
 1   138 female     105
 2   119 female      48
 3   119 male        87
 4   119 female     100
 5   119 female     105
 6    99 female      94
 7    98 female      81
 8    98 male        87
 9    98 male        88
10    98 female      94
# … with 12 more rows
```
]

<style>
.panel1-filter2-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-filter2-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-filter2-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---
count: false

Example 3/4: Slice Rows with Bottom 1% of -Earliest- Pandemic Day

.panel1-filter3-auto[
```r
# how many rows when you start
*nrow(covid_dates)
```
]

.panel2-filter3-auto[
```
[1] 15524
```
]

---
count: false

Example 3/4: Slice Rows with Bottom 1% of -Earliest- Pandemic Day

.panel1-filter3-auto[
```r
# how many rows when you start
nrow(covid_dates)
*covid_dates
```
]

.panel2-filter3-auto[
```
[1] 15524
```

```
# A tibble: 15,524 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>
 1       1412 jhezane         westerling     female       4 covid   inpatient …
 2        533 penny           targaryen      female       7 covid   clinical l…
 3       9134 grunt           rivers         male         7 covid   clinical l…
 4       8518 melisandre      swyft          female       8 covid   clinical l…
 5       8967 rolley          karstark       male         8 covid   emergency …
 6      11048 megga           karstark       female       8 covid   oncology d…
 7        663 ithoke          targaryen      male         9 covid   clinical l…
 8       2158 ravella         frey           female       9 covid   emergency …
 9       3794 styr            tyrell         male         9 covid   clinical l…
10       4706 wynafryd        seaworth       male         9 covid   clinical l…
# … with 15,514 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false

Example 3/4: Slice Rows with Bottom 1% of -Earliest- Pandemic Day

.panel1-filter3-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>%
* select(age, gender, pan_day)
```
]

.panel2-filter3-auto[
```
[1] 15524
```

```
# A tibble: 15,524 x 3
     age gender pan_day
   <dbl> <chr>    <dbl>
 1   0   female       4
 2   0   female       7
 3   0.8 male         7
 4   0.8 female       8
 5   0.8 male         8
 6   0.8 female       8
 7   0.8 male         9
 8   0   female       9
 9   0   male         9
10   0.9 male         9
# … with 15,514 more rows
```
]

---
count: false

Example 3/4: Slice Rows with Bottom 1% of -Earliest- Pandemic Day

.panel1-filter3-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>%
  select(age, gender, pan_day) %>%
* slice_min(pan_day, prop = 0.01)
```
]

.panel2-filter3-auto[
```
[1] 15524
```

```
# A tibble: 178 x 3
     age gender pan_day
   <dbl> <chr>    <dbl>
 1   0   female       4
 2   0   female       7
 3   0.8 male         7
 4   0.8 female       8
 5   0.8 male         8
 6   0.8 female       8
 7   0.8 male         9
 8   0   female       9
 9   0   male         9
10   0.9 male         9
# … with 168 more rows
```
]

---
count: false

Example 3/4: Slice Rows with Bottom 1% of -Earliest- Pandemic Day

.panel1-filter3-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>%
  select(age, gender, pan_day) %>%
  slice_min(pan_day, prop = 0.01) # see how many rows now
# Format:
*# slice_min(variable, prop = 0.nn)
```
]

.panel2-filter3-auto[
```
[1] 15524
```

```
# A tibble: 178 x 3
     age gender pan_day
   <dbl> <chr>    <dbl>
 1   0   female       4
 2   0   female       7
 3   0.8 male         7
 4   0.8 female       8
 5   0.8 male         8
 6   0.8 female       8
 7   0.8 male         9
 8   0   female       9
 9   0   male         9
10   0.9 male         9
# … with 168 more rows
```
]

<style>
.panel1-filter3-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-filter3-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-filter3-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---
count: false

Example 4/4: Slice Top or Bottom N or %

.panel1-filter4-auto[
```r
# how many rows when you start
*nrow(covid_dates)
```
]

.panel2-filter4-auto[
```
[1] 15524
```
]

---
count: false

Example 4/4: Slice Top or Bottom N or %

.panel1-filter4-auto[
```r
# how many rows when you start
nrow(covid_dates)
*covid_dates
```
]

.panel2-filter4-auto[
```
[1] 15524
```

```
# A tibble: 15,524 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>
 1       1412 jhezane         westerling     female       4 covid   inpatient …
 2        533 penny           targaryen      female       7 covid   clinical l…
 3       9134 grunt           rivers         male         7 covid   clinical l…
 4       8518 melisandre      swyft          female       8 covid   clinical l…
 5       8967 rolley          karstark       male         8 covid   emergency …
 6      11048 megga           karstark       female       8 covid   oncology d…
 7        663 ithoke          targaryen      male         9 covid   clinical l…
 8       2158 ravella         frey           female       9 covid   emergency …
 9       3794 styr            tyrell         male         9 covid   clinical l…
10       4706 wynafryd        seaworth       male         9 covid   clinical l…
# … with 15,514 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false

Example 4/4: Slice Top or Bottom N or %

.panel1-filter4-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>%
* select(age, gender, pan_day)
```
]

.panel2-filter4-auto[
```
[1] 15524
```

```
# A tibble: 15,524 x 3
     age gender pan_day
   <dbl> <chr>    <dbl>
 1   0   female       4
 2   0   female       7
 3   0.8 male         7
 4   0.8 female       8
 5   0.8 male         8
 6   0.8 female       8
 7   0.8 male         9
 8   0   female       9
 9   0   male         9
10   0.9 male         9
# … with 15,514 more rows
```
]

---
count: false

Example 4/4: Slice Top or Bottom N or %

.panel1-filter4-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>%
  select(age, gender, pan_day) %>%
* slice_tail(prop = 0.01)
```
]

.panel2-filter4-auto[
```
[1] 15524
```

```
# A tibble: 155 x 3
     age gender pan_day
   <dbl> <chr>    <dbl>
 1  40   female     106
 2  17   female     106
 3  17   female     106
 4  17   male       106
 5   0.8 male       106
 6   0.7 male       106
 7  17   female     106
 8   7   male       106
 9  17   male       106
10  71   male       106
# … with 145 more rows
```
]

---
count: false

Example 4/4: Slice Top or Bottom N or %

.panel1-filter4-auto[
```r
# how many rows when you start
nrow(covid_dates)
covid_dates %>%
  select(age, gender, pan_day) %>%
  slice_tail(prop = 0.01) # see how many rows now
# Format:
*# slice_tail(prop = 0.nn)
```
]

.panel2-filter4-auto[
```
[1] 15524
```

```
# A tibble: 155 x 3
     age gender pan_day
   <dbl> <chr>    <dbl>
 1  40   female     106
 2  17   female     106
 3  17   female     106
 4  17   male       106
 5   0.8 male       106
 6   0.7 male       106
 7  17   female     106
 8   7   male       106
 9  17   male       106
10  71   male       106
# … with 145 more rows
```
]

<style>
.panel1-filter4-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-filter4-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-filter4-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---
class: inverse, center

# End of This Flipbook

## On to The Coding Exercises!
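
---

### Warm-up Sketch: the _slice()_ Family on a Toy Table

Before the exercises, it may help to run the same functions on a table small enough to check by eye. The sketch below is not part of the original examples: the `toy` tibble and every variable name in it are made up for illustration and only stand in for `covid_dates`. It assumes dplyr 1.0.0 or later, where the `slice_*()` variants and the `with_ties` argument are available.

```r
library(dplyr)

# Hypothetical toy data (NOT the covid_dates dataset from the slides)
toy <- tibble(
  id      = 1:10,
  age     = c(5, 40, 40, 17, 71, 0.8, 17, 99, 99, 7),
  pan_day = c(4, 7, 7, 8, 8, 9, 9, 9, 12, 12)
)

# Positional slices: these depend only on the current row order
rows_3_to_5 <- toy %>% slice(3:5)             # rows 3, 4 and 5
first_two   <- toy %>% slice_head(n = 2)      # first 2 rows
last_20pct  <- toy %>% slice_tail(prop = 0.2) # last 20% of 10 rows = 2 rows

# Value-based slices: ties are kept by default, so extra rows can come back
oldest_with_ties <- toy %>% slice_max(age, n = 1)                    # both rows with age 99
oldest_no_ties   <- toy %>% slice_max(age, n = 1, with_ties = FALSE) # exactly 1 row
earliest_20pct   <- toy %>% slice_min(pan_day, prop = 0.2)           # 4, 7, 7: a tie adds a 3rd row

nrow(oldest_with_ties)
nrow(earliest_20pct)
```

The tie behaviour here is the same reason Example 2 returned 22 rows for `n=20` and Example 3 returned 178 rows for `prop = 0.01`. If you need an exact row count, `with_ties = FALSE` is one option, at the cost of dropping some tied rows.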