
class: center, middle, inverse, title-slide

# Demo with Tabsets (Panelsets)
## Selecting Rows with _slice_sample()_
### Peter Higgins
### 2021-01-15

---

### How to Use the _slice()_ Functions to Take Slices of Rows


If you have a very large dataset and want to develop code on a smaller (but random) sample,
_slice_sample()_ can help.

This is also helpful for sampling for training and testing sets when modeling.

_slice_sample()_ takes either a number of rows (n) or a proportion of rows (prop).

Let's see some **sampling** examples!
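
---

Before turning to covid_dates, here is a minimal sketch of the two argument styles. It assumes the dplyr package is installed and uses the built-in `mtcars` data (32 rows) as a stand-in; the names `small_n` and `small_prop` are made up for illustration.

```r
library(dplyr)

# mtcars ships with R and has 32 rows; it stands in for covid_dates here
small_n <- mtcars %>%
  slice_sample(n = 5)        # ask for exactly 5 random rows

small_prop <- mtcars %>%
  slice_sample(prop = 0.25)  # ask for 25% of the 32 rows

# compare the sizes of the two samples
nrow(small_n)
nrow(small_prop)
```

Use n when you need a fixed sample size, and prop when the sample should scale with the size of the data.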

---
.panelset[
.panel[.panel-name[R Code]

```r
# how many rows when you start
nrow(covid_dates)

covid_dates %>% 
  slice_sample(prop = 0.3)

# see how many rows now

# Format:
#   slice_sample(prop = 0.nn) 
```
]

.panel[.panel-name[Results]

```
[1] 15524
```

```
# A tibble: 4,657 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1      10396 melisandre      lannister      female      72 covid   emergency …
 2       5050 gerold          westerling     male        31 covid   clinical l…
 3       3713 ardrian         tully          male        31 covid   cc care nt…
 4       7687 amory           sand           male       104 covid   nicu       
 5       1245 arwyn           targaryen      female      75 covid   clinical l…
 6        293 gilly           westerling     female     105 covid   radiation …
 7      10262 elinor          targaryen      female     106 covid   clinical l…
 8       4366 donyse          westerling     female     100 covid   clinical l…
 9       2981 humfrey         swyft          male       105 covid   inpatient …
10       7610 mudge           swyft          male        86 covid   clinical l…
# … with 4,647 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
 ]
]
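
---

The 4,657 rows in the Results tab are consistent with prop = 0.3 applied to 15,524 rows. A quick check of that arithmetic, assuming (as the output suggests) that _slice_sample()_ rounds the product down to a whole number of rows:

```r
n_rows <- 15524                  # nrow(covid_dates), as printed in the Results tab
expected <- floor(0.3 * n_rows)  # 0.3 * 15524 = 4657.2, rounded down
expected
```

The variable names `n_rows` and `expected` are illustrative only.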

---

count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
*nrow(covid_dates)
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```
]

---
count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
nrow(covid_dates)

# make training set
*covid_dates
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```

```
# A tibble: 15,524 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1       1412 jhezane         westerling     female       4 covid   inpatient …
 2        533 penny           targaryen      female       7 covid   clinical l…
 3       9134 grunt           rivers         male         7 covid   clinical l…
 4       8518 melisandre      swyft          female       8 covid   clinical l…
 5       8967 rolley          karstark       male         8 covid   emergency …
 6      11048 megga           karstark       female       8 covid   oncology d…
 7        663 ithoke          targaryen      male         9 covid   clinical l…
 8       2158 ravella         frey           female       9 covid   emergency …
 9       3794 styr            tyrell         male         9 covid   clinical l…
10       4706 wynafryd        seaworth       male         9 covid   clinical l…
# … with 15,514 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
nrow(covid_dates)

# make training set
covid_dates %>%
* slice_sample(prop = 0.7)
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```

```
# A tibble: 10,866 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1      11684 anguy           stark          male        95 covid   autopsy    
 2       3851 ryman           karstark       male        23 covid   emergency …
 3      11585 emmon           karstark       male        37 covid   clinical l…
 4       1175 ben             ryswell        male        11 covid   clinical l…
 5       3906 gyles           snow           male        33 covid   emergency …
 6       2127 edric           targaryen      male        36 covid   clinical l…
 7      10780 marissa         seaworth       female      63 covid   emergency …
 8      11895 lysa            swyft          female      50 covid   clinical l…
 9       4750 tanda           mormont        female      58 covid   intl patie…
10      11380 hallyne         clegane        male        92 covid   clinical l…
# … with 10,856 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
nrow(covid_dates)

# make training set
covid_dates %>%
  slice_sample(prop = 0.7) ->
*training_covid_dates
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```
]

---
count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
nrow(covid_dates)

# make training set
covid_dates %>%
  slice_sample(prop = 0.7) ->
training_covid_dates

# now make testing set
*covid_dates
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```
]

---
count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
nrow(covid_dates)

# make training set
covid_dates %>%
  slice_sample(prop = 0.7) ->
training_covid_dates

# now make testing set
covid_dates %>%
* anti_join(training_covid_dates)
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```

```
# A tibble: 4,658 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1       8115 patrek          frey           male         9 covid   clinical l…
 2       8943 myria           rivers         female       9 covid   picu       
 3       6965 arthor          lannister      male         9 covid   clinical l…
 4       2103 ollo            snow           male        10 covid   clinical l…
 5       4930 sarra           frey           female      10 covid   emergency …
 6       8138 frenya          swyft          female      10 covid   clinical l…
 7       2114 azzak           tully          male        10 covid   inpatient …
 8        227 maege           sand           female      11 covid   emergency …
 9        252 nymeria         karstark       female      11 covid   ob gyn     
10       1299 alys            manderly       female      11 covid   inpatient …
# … with 4,648 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
nrow(covid_dates)

# make training set
covid_dates %>%
  slice_sample(prop = 0.7) ->
training_covid_dates

# now make testing set
covid_dates %>%
  anti_join(training_covid_dates)->
*testing_covid_dates
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```
]

---
count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
nrow(covid_dates)

# make training set
covid_dates %>%
  slice_sample(prop = 0.7) ->
training_covid_dates

# now make testing set
covid_dates %>%
  anti_join(training_covid_dates)->
testing_covid_dates
# see how many rows in each
*training_covid_dates
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```

```
# A tibble: 10,866 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1        275 qezza           greyjoy        female      31 covid   nicu       
 2       4943 hoster          targaryen      male        91 covid   emergency …
 3      12286 harra           harlaw         female      45 covid   laboratory 
 4      11283 petyr           mormont        male        57 covid   clinical l…
 5       6343 mord            bolton         male        57 covid   clinical l…
 6       8979 qyburn          seaworth       male        50 covid   clinical l…
 7       1805 godry           stark          male        98 covid   clinical l…
 8       1488 alys            baratheon      female      75 covid   inpatient …
 9       5966 joffrey         martell        male        32 covid   emergency …
10       7384 harra           targaryen      female      22 covid   clinical l…
# … with 10,856 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
nrow(covid_dates)

# make training set
covid_dates %>%
  slice_sample(prop = 0.7) ->
training_covid_dates

# now make testing set
covid_dates %>%
  anti_join(training_covid_dates)->
testing_covid_dates
# see how many rows in each
training_covid_dates
*testing_covid_dates
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```

```
# A tibble: 10,866 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1       1478 lucas           lannister      male       101 covid   clinical l…
 2      11044 alysane         rivers         female      58 covid   emergency …
 3       7414 donal           stark          male        42 covid   urgent car…
 4        393 glendon         lannister      male        97 covid   clinical l…
 5       1344 anya            seaworth       female     104 covid   clinical l…
 6       5101 nymeria         snow           female     100 covid   clinical l…
 7       4541 matrice         seaworth       female      91 covid   inpatient …
 8       7514 mathos          tyrell         male       103 covid   emergency …
 9       4310 marq            clegane        male        31 covid   clinical l…
10       1427 nolla           baelish        female      32 covid   inpatient …
# … with 10,856 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```

```
# A tibble: 4,658 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1        663 ithoke          targaryen      male         9 covid   clinical l…
 2       3794 styr            tyrell         male         9 covid   clinical l…
 3       9309 maege           sand           female       9 covid   medical ce…
 4       8943 myria           rivers         female       9 covid   picu       
 5       8031 gueren          sand           male        10 covid   clinical l…
 6      10919 woth            snow           male        10 covid   clinical l…
 7        252 nymeria         karstark       female      11 covid   ob gyn     
 8       2427 daenerys        umber          female      11 covid   inpatient …
 9       2983 ronnel          snow           male        11 covid   emergency …
10       3854 husband         snow           male        11 covid   clinical l…
# … with 4,648 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
nrow(covid_dates)

# make training set
covid_dates %>%
  slice_sample(prop = 0.7) ->
training_covid_dates

# now make testing set
covid_dates %>%
  anti_join(training_covid_dates)->
testing_covid_dates
# see how many rows in each
training_covid_dates
testing_covid_dates
# Format:
*#   slice_sample(prop = 0.nn)
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```

```
# A tibble: 10,866 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1       4175 gilwood         targaryen      male        55 covid   clinical l…
 2       4756 kella           baratheon      female      39 covid   inpatient …
 3       6797 jon             umber          male        85 covid   clinical l…
 4       6966 margaery        greyjoy        female      53 covid   emergency …
 5       9502 lorcas          mormont        male        10 covid   clinical l…
 6        220 tickler         frey           male        38 covid   clinical l…
 7       2848 ghost           stark          female      96 covid   emergency …
 8       4951 beric           tarly          male        83 covid   clinical l…
 9        407 eddard          martell        male        64 covid   clinical l…
10       1140 tanda           westerling     female      30 covid   emergency …
# … with 10,856 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```

```
# A tibble: 4,658 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1       1412 jhezane         westerling     female       4 covid   inpatient …
 2       3794 styr            tyrell         male         9 covid   clinical l…
 3       4706 wynafryd        seaworth       male         9 covid   clinical l…
 4       8943 myria           rivers         female       9 covid   picu       
 5       2103 ollo            snow           male        10 covid   clinical l…
 6       2349 yezzan          royce          male        10 covid   line clini…
 7       2083 weasel          tarly          female      10 covid   emergency …
 8       8031 gueren          sand           male        10 covid   clinical l…
 9      10468 chella          mormont        female      10 covid   emergency …
10       9217 ragwyle         martell        female      10 covid   clinical l…
# … with 4,648 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false
 
Example 2/3: Take a Random 70% Sample for Training and a Complementary 30% for Testing.
.panel1-filter2-auto[

```r
# how many rows when you start
nrow(covid_dates)

# make training set
covid_dates %>%
  slice_sample(prop = 0.7) ->
training_covid_dates

# now make testing set
covid_dates %>%
  anti_join(training_covid_dates)->
testing_covid_dates
# see how many rows in each
training_covid_dates
testing_covid_dates
# Format:
#   slice_sample(prop = 0.nn)
*#   set1 %>% anti_join(set2)
```
]
 
.panel2-filter2-auto[

```
[1] 15524
```

```
# A tibble: 10,866 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1       4652 woth            martell        male        84 covid   clinical l…
 2        293 gilly           westerling     female      73 covid   radiation …
 3       3072 andar           baratheon      male        84 covid   clinical l…
 4         77 nymella         tarly          female      77 covid   radiation …
 5        968 tytos           tarly          male        89 covid   clinical l…
 6      10324 qezza           kettleblack    female      84 covid   clinical l…
 7      12030 creighton       targaryen      male       102 covid   inpatient …
 8        706 alerie          kettleblack    female      47 covid   inpatient …
 9       1625 duram           seaworth       male        65 covid   clinical l…
10       4582 falyse          bolton         female      31 covid   clinical l…
# … with 10,856 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```

```
# A tibble: 4,658 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1       1412 jhezane         westerling     female       4 covid   inpatient …
 2       2158 ravella         frey           female       9 covid   emergency …
 3       4706 wynafryd        seaworth       male         9 covid   clinical l…
 4       8943 myria           rivers         female       9 covid   picu       
 5       6965 arthor          lannister      male         9 covid   clinical l…
 6       8138 frenya          swyft          female      10 covid   clinical l…
 7      10468 chella          mormont        female      10 covid   emergency …
 8        252 nymeria         karstark       female      11 covid   ob gyn     
 9        392 moon            mormont        male        11 covid   clinical l…
10       1299 alys            manderly       female      11 covid   inpatient …
# … with 4,648 more rows, and 11 more variables: result <chr>,
#   demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
#   orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]
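
---

Two caveats about the split above. Without a seed, _slice_sample()_ draws different rows on every run, which is why the training rows change from slide to slide. And _anti_join()_ with no `by` argument matches on all shared columns, so any exact duplicate rows would all be dropped from the testing set together. The sketch below is one possible way to handle both, not the method used in the slides: it sets an arbitrary seed and joins on a hypothetical `row_id` column, with `mtcars` standing in for covid_dates.

```r
library(dplyr)

set.seed(2021)  # arbitrary seed so the same rows are drawn on every run

# mtcars (32 rows) stands in for covid_dates; row_id is a hypothetical key
cars <- mtcars %>%
  mutate(row_id = row_number())

# training set: a random 70% of the rows
training <- cars %>%
  slice_sample(prop = 0.7)

# testing set: every row whose row_id is not in the training set
testing <- cars %>%
  anti_join(training, by = "row_id")

# the two sets should add up to the original row count
nrow(training)
nrow(testing)
```

Joining on an explicit key also avoids the message _anti_join()_ prints when it has to guess the join columns.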

---

count: false
 
Example 3/3: Take a Random Sample of 50 Rows from covid_dates.
.panel1-filter3-auto[

```r
# how many rows when you start
*nrow(covid_dates)
```
]
 
.panel2-filter3-auto[

```
[1] 15524
```
]

---
count: false
 
Example 3/3: Take a Random Sample of 50 Rows from covid_dates.
.panel1-filter3-auto[

```r
# how many rows when you start
nrow(covid_dates)

*covid_dates
```
]
 
.panel2-filter3-auto[

```
[1] 15524
```
]

---
count: false
 
Example 3/3: Take a Random Sample of 50 Rows from covid_dates.
.panel1-filter3-auto[

```r
# how many rows when you start
nrow(covid_dates)

covid_dates %>%
* slice_sample(n = 50)
```
]
 
.panel2-filter3-auto[

```
[1] 15524
```

```
# A tibble: 50 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1       5262 rigney          stark          male        99 covid   hem onc da…
 2       9991 del             bolton         male        69 covid   emergency …
 3       3369 brienne         umber          female     105 covid   clinical l…
 4      10956 wat             targaryen      male        97 covid   cc care nt…
 5       3811 kella           frey           female      75 covid   oncology d…
 6       6551 masha           mormont        female      30 covid   emergency …
 7       8211 jhezane         greyjoy        female      32 covid   clinical l…
 8        385 tanda           snow           female      44 covid   clinical l…
 9      10780 marissa         seaworth       female      63 covid   emergency …
10       3543 alia            karstark       female      30 covid   emergency …
# … with 40 more rows, and 11 more variables: result <chr>, demo_group <chr>,
#   age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>, orderset <dbl>,
#   payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
count: false
 
Example 3/3: Take a Random Sample of 50 Rows from covid_dates.
.panel1-filter3-auto[

```r
# how many rows when you start
nrow(covid_dates)

covid_dates %>%
  slice_sample(n = 50)

# see how many rows now

# Format:
*#   slice_sample(n = NN)
```
]
 
.panel2-filter3-auto[

```
[1] 15524
```

```
# A tibble: 50 x 18
   subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
        <dbl> <chr>           <chr>          <chr>    <dbl> <chr>   <chr>      
 1       8272 ragwyle         rivers         female      86 covid   emergency …
 2      10480 zei             westerling     female      53 covid   clinical l…
 3        592 osney           seaworth       male        32 covid   nicu       
 4       8204 kojja           baratheon      female      23 covid   emergency …
 5       9551 ricasso         snow           male        21 covid   clinical l…
 6      10767 palla           mormont        female      58 covid   clinical l…
 7       3041 tanda           stark          female      93 covid   emergency …
 8      10432 mag             karstark       male        91 covid   inpatient …
 9       2943 shagwell        rivers         male        75 covid   inpatient …
10       3719 halys           tully          male        50 covid   clinical l…
# … with 40 more rows, and 11 more variables: result <chr>, demo_group <chr>,
#   age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>, orderset <dbl>,
#   payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
#   rec_ver_tat <dbl>, fake_date <date>
```
]

---
class: inverse, center

# End of This Flipbook

## On to The Coding Exercises!