class: center, middle, inverse, title-slide

# How to Use `filter()` on Character Variables
## Using `str_detect()` to find the rows you want
### Peter Higgins
### 2021-01-10

---

### How to Use Filters to Pick Out String Values
#### Leaning on the {stringr} package

Format: <br>
dataset %>% filter(variable == "string") OR
dataset %>% filter(str_detect(variable, "string"))

The _str_detect()_ function returns either TRUE or FALSE for each row in the dataset - a **logical vector**. <br>
The filter function **acts** on this logical vector: it keeps the rows that are TRUE and filters out the rows that are FALSE for your logical statement.

It is a common mistake to rely on str_detect alone, but it won't filter your rows. You need **both** filter and str_detect to get this done.

We will use the CMV dataset, which looks at bone marrow transplantation for a variety of cancers.

Let's look at some filter examples!

---
count: false

Example 1/5: Filter Rows with str_detect on Character Variables

.panel1-filter1-auto[
```r
# how many rows when you start
*nrow(cmv)
```
]

.panel2-filter1-auto[
```
[1] 64
```
]

---
count: false

Example 1/5: Filter Rows with str_detect on Character Variables

.panel1-filter1-auto[
```r
# how many rows when you start
nrow(cmv)
*cmv
```
]

.panel2-filter1-auto[
```
[1] 64
```

```
# A tibble: 64 x 26
      id   age   sex  race diagnosis diagnosis_type time_to_transpl…
   <dbl> <dbl> <dbl> <dbl> <chr>              <dbl>            <dbl>
 1     1    61     1     0 acute my…              1             5.16
 2     2    62     1     1 non-Hodg…              0            79.0
 3     3    63     0     1 non-Hodg…              0            35.6
 4     4    33     0     1 Hodgkin …              0            33.0
 5     5    54     0     1 acute ly…              0            11.4
 6     6    55     1     1 myelofib…              1             2.43
 7     7    67     1     1 acute my…              1             9.59
 8     8    51     1     1 acute my…              1               NA
 9     9    44     0     0 multiple…              0            43.4
10    10    59     1     1 chronic …              0            92.7
# … with 54 more rows, and 19 more variables: prior_radiation <dbl>,
#   prior_chemo <dbl>, prior_transplant <dbl>, recipient_cmv <dbl>,
#   donor_cmv <dbl>, donor_sex <dbl>, tnc_dose <dbl>, cd34_dose <dbl>,
#   cd3_dose <dbl>, cd8_dose <dbl>, tbi_dose <dbl>, c1_c2 <dbl>, a_ki_rs <dbl>,
#   cmv <dbl>, time_to_cmv <dbl>, agvhd <dbl>, time_to_agvhd <dbl>,
#   cgvhd <dbl>, time_to_cgvhd <dbl>
```
]

---
count: false

Example 1/5: Filter Rows with
str_detect on Character Variables .panel1-filter1-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter * select(age, sex, diagnosis) ``` ] .panel2-filter1-auto[ ``` [1] 64 ``` ``` # A tibble: 64 x 3 age sex diagnosis <dbl> <dbl> <chr> 1 61 1 acute myeloid leukemia 2 62 1 non-Hodgkin lymphoma 3 63 0 non-Hodgkin lymphoma 4 33 0 Hodgkin lymphoma 5 54 0 acute lymphoblastic leukemia 6 55 1 myelofibrosis 7 67 1 acute myeloid leukemia 8 51 1 acute myeloid leukemia 9 44 0 multiple myelomas 10 59 1 chronic lymphocytic leukemia # … with 54 more rows ``` ] --- count: false Example 1/5: Filter Rows with str_detect on Character Variables .panel1-filter1-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(age, sex, diagnosis) %>% * filter(str_detect(diagnosis, "oma")) ``` ] .panel2-filter1-auto[ ``` [1] 64 ``` ``` # A tibble: 26 x 3 age sex diagnosis <dbl> <dbl> <chr> 1 62 1 non-Hodgkin lymphoma 2 63 0 non-Hodgkin lymphoma 3 33 0 Hodgkin lymphoma 4 44 0 multiple myelomas 5 45 1 multiple myelomas 6 38 0 multiple myelomas 7 61 0 non-Hodgkin lymphoma 8 62 1 non-Hodgkin lymphoma 9 48 1 renal cell carcinoma 10 48 1 renal cell carcinoma # … with 16 more rows ``` ] --- count: false Example 1/5: Filter Rows with str_detect on Character Variables .panel1-filter1-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(age, sex, diagnosis) %>% filter(str_detect(diagnosis, "oma")) # see how many rows now # check diagnosis # Format: *# filter(str_detect(variable,"string")) <br> # filter(str_detect(variable,"string")) <br> ``` ] .panel2-filter1-auto[ ``` [1] 64 ``` ``` # A tibble: 26 x 3 age sex diagnosis <dbl> <dbl> <chr> 1 62 1 non-Hodgkin lymphoma 2 63 0 non-Hodgkin lymphoma 3 33 0 Hodgkin lymphoma 4 44 0 multiple myelomas 5 45 1 multiple myelomas 6 38 0 multiple myelomas 7 61 0 
non-Hodgkin lymphoma 8 62 1 non-Hodgkin lymphoma 9 48 1 renal cell carcinoma 10 48 1 renal cell carcinoma # … with 16 more rows ``` ] --- count: false Example 1/5: Filter Rows with str_detect on Character Variables .panel1-filter1-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(age, sex, diagnosis) %>% filter(str_detect(diagnosis, "oma")) # see how many rows now # check diagnosis # Format: # filter(str_detect(variable,"string")) <br> # filter(str_detect(variable,"string")) <br> ``` ] .panel2-filter1-auto[ ``` [1] 64 ``` ``` # A tibble: 26 x 3 age sex diagnosis <dbl> <dbl> <chr> 1 62 1 non-Hodgkin lymphoma 2 63 0 non-Hodgkin lymphoma 3 33 0 Hodgkin lymphoma 4 44 0 multiple myelomas 5 45 1 multiple myelomas 6 38 0 multiple myelomas 7 61 0 non-Hodgkin lymphoma 8 62 1 non-Hodgkin lymphoma 9 48 1 renal cell carcinoma 10 48 1 renal cell carcinoma # … with 16 more rows ``` ] <style> .panel1-filter1-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-filter1-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-filter1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false Example 2/5: Filter Not-Rows (! Negation) with str_detect on Character Variables .panel1-filter2-auto[ ```r # how many rows when you start *nrow(cmv) ``` ] .panel2-filter2-auto[ ``` [1] 64 ``` ] --- count: false Example 2/5: Filter Not-Rows (! 
Negation) with str_detect on Character Variables .panel1-filter2-auto[ ```r # how many rows when you start nrow(cmv) *cmv ``` ] .panel2-filter2-auto[ ``` [1] 64 ``` ``` # A tibble: 64 x 26 id age sex race diagnosis diagnosis_type time_to_transpl… <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> 1 1 61 1 0 acute my… 1 5.16 2 2 62 1 1 non-Hodg… 0 79.0 3 3 63 0 1 non-Hodg… 0 35.6 4 4 33 0 1 Hodgkin … 0 33.0 5 5 54 0 1 acute ly… 0 11.4 6 6 55 1 1 myelofib… 1 2.43 7 7 67 1 1 acute my… 1 9.59 8 8 51 1 1 acute my… 1 NA 9 9 44 0 0 multiple… 0 43.4 10 10 59 1 1 chronic … 0 92.7 # … with 54 more rows, and 19 more variables: prior_radiation <dbl>, # prior_chemo <dbl>, prior_transplant <dbl>, recipient_cmv <dbl>, # donor_cmv <dbl>, donor_sex <dbl>, tnc_dose <dbl>, cd34_dose <dbl>, # cd3_dose <dbl>, cd8_dose <dbl>, tbi_dose <dbl>, c1_c2 <dbl>, a_ki_rs <dbl>, # cmv <dbl>, time_to_cmv <dbl>, agvhd <dbl>, time_to_agvhd <dbl>, # cgvhd <dbl>, time_to_cgvhd <dbl> ``` ] --- count: false Example 2/5: Filter Not-Rows (! Negation) with str_detect on Character Variables .panel1-filter2-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter * select(age, race, diagnosis) ``` ] .panel2-filter2-auto[ ``` [1] 64 ``` ``` # A tibble: 64 x 3 age race diagnosis <dbl> <dbl> <chr> 1 61 0 acute myeloid leukemia 2 62 1 non-Hodgkin lymphoma 3 63 1 non-Hodgkin lymphoma 4 33 1 Hodgkin lymphoma 5 54 1 acute lymphoblastic leukemia 6 55 1 myelofibrosis 7 67 1 acute myeloid leukemia 8 51 1 acute myeloid leukemia 9 44 0 multiple myelomas 10 59 1 chronic lymphocytic leukemia # … with 54 more rows ``` ] --- count: false Example 2/5: Filter Not-Rows (! 
Negation) with str_detect on Character Variables .panel1-filter2-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(age, race, diagnosis) %>% * filter(!str_detect(diagnosis, * pattern = "Hodgkin")) ``` ] .panel2-filter2-auto[ ``` [1] 64 ``` ``` # A tibble: 49 x 3 age race diagnosis <dbl> <dbl> <chr> 1 61 0 acute myeloid leukemia 2 54 1 acute lymphoblastic leukemia 3 55 1 myelofibrosis 4 67 1 acute myeloid leukemia 5 51 1 acute myeloid leukemia 6 44 0 multiple myelomas 7 59 1 chronic lymphocytic leukemia 8 45 1 multiple myelomas 9 57 1 acute myeloid leukemia 10 52 1 myelodysplastic syndrome # … with 39 more rows ``` ] --- count: false Example 2/5: Filter Not-Rows (! Negation) with str_detect on Character Variables .panel1-filter2-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(age, race, diagnosis) %>% filter(!str_detect(diagnosis, pattern = "Hodgkin")) # see how many rows now # check diagnosis # Format: *# filter(str_detect(variable,"string")) <br> # filter(str_detect(variable,"string")) <br> ``` ] .panel2-filter2-auto[ ``` [1] 64 ``` ``` # A tibble: 49 x 3 age race diagnosis <dbl> <dbl> <chr> 1 61 0 acute myeloid leukemia 2 54 1 acute lymphoblastic leukemia 3 55 1 myelofibrosis 4 67 1 acute myeloid leukemia 5 51 1 acute myeloid leukemia 6 44 0 multiple myelomas 7 59 1 chronic lymphocytic leukemia 8 45 1 multiple myelomas 9 57 1 acute myeloid leukemia 10 52 1 myelodysplastic syndrome # … with 39 more rows ``` ] --- count: false Example 2/5: Filter Not-Rows (! 
Negation) with str_detect on Character Variables .panel1-filter2-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(age, race, diagnosis) %>% filter(!str_detect(diagnosis, pattern = "Hodgkin")) # see how many rows now # check diagnosis # Format: # filter(str_detect(variable,"string")) <br> # filter(str_detect(variable,"string")) <br> ``` ] .panel2-filter2-auto[ ``` [1] 64 ``` ``` # A tibble: 49 x 3 age race diagnosis <dbl> <dbl> <chr> 1 61 0 acute myeloid leukemia 2 54 1 acute lymphoblastic leukemia 3 55 1 myelofibrosis 4 67 1 acute myeloid leukemia 5 51 1 acute myeloid leukemia 6 44 0 multiple myelomas 7 59 1 chronic lymphocytic leukemia 8 45 1 multiple myelomas 9 57 1 acute myeloid leukemia 10 52 1 myelodysplastic syndrome # … with 39 more rows ``` ] <style> .panel1-filter2-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-filter2-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-filter2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false Example 3/5: Filter Rows with a Little Bit of Regex<br>.+ means any character from 1 to N times .panel1-filter3-auto[ ```r # how many rows when you start *nrow(cmv) ``` ] .panel2-filter3-auto[ ``` [1] 64 ``` ] --- count: false Example 3/5: Filter Rows with a Little Bit of Regex<br>.+ means any character from 1 to N times .panel1-filter3-auto[ ```r # how many rows when you start nrow(cmv) *cmv ``` ] .panel2-filter3-auto[ ``` [1] 64 ``` ``` # A tibble: 64 x 26 id age sex race diagnosis diagnosis_type time_to_transpl… <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> 1 1 61 1 0 acute my… 1 5.16 2 2 62 1 1 non-Hodg… 0 79.0 3 3 63 0 1 non-Hodg… 0 35.6 4 4 33 0 1 Hodgkin … 0 33.0 5 5 54 0 1 acute ly… 0 11.4 6 6 55 1 1 myelofib… 1 2.43 7 7 67 1 1 acute my… 1 9.59 8 8 51 1 1 
acute my… 1 NA 9 9 44 0 0 multiple… 0 43.4 10 10 59 1 1 chronic … 0 92.7 # … with 54 more rows, and 19 more variables: prior_radiation <dbl>, # prior_chemo <dbl>, prior_transplant <dbl>, recipient_cmv <dbl>, # donor_cmv <dbl>, donor_sex <dbl>, tnc_dose <dbl>, cd34_dose <dbl>, # cd3_dose <dbl>, cd8_dose <dbl>, tbi_dose <dbl>, c1_c2 <dbl>, a_ki_rs <dbl>, # cmv <dbl>, time_to_cmv <dbl>, agvhd <dbl>, time_to_agvhd <dbl>, # cgvhd <dbl>, time_to_cgvhd <dbl> ``` ] --- count: false Example 3/5: Filter Rows with a Little Bit of Regex<br>.+ means any character from 1 to N times .panel1-filter3-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter * select(sex, race, diagnosis) ``` ] .panel2-filter3-auto[ ``` [1] 64 ``` ``` # A tibble: 64 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 1 0 acute myeloid leukemia 2 1 1 non-Hodgkin lymphoma 3 0 1 non-Hodgkin lymphoma 4 0 1 Hodgkin lymphoma 5 0 1 acute lymphoblastic leukemia 6 1 1 myelofibrosis 7 1 1 acute myeloid leukemia 8 1 1 acute myeloid leukemia 9 0 0 multiple myelomas 10 1 1 chronic lymphocytic leukemia # … with 54 more rows ``` ] --- count: false Example 3/5: Filter Rows with a Little Bit of Regex<br>.+ means any character from 1 to N times .panel1-filter3-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(sex, race, diagnosis) %>% * filter(str_detect(diagnosis, * pattern = "lympho.+ic")) ``` ] .panel2-filter3-auto[ ``` [1] 64 ``` ``` # A tibble: 6 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 0 1 acute lymphoblastic leukemia 2 1 1 chronic lymphocytic leukemia 3 0 1 chronic lymphocytic leukemia 4 0 0 chronic lymphocytic leukemia 5 1 1 chronic lymphocytic leukemia 6 1 1 chronic lymphocytic leukemia ``` ] --- count: false Example 3/5: Filter Rows with a Little Bit of Regex<br>.+ means any character from 1 to N times .panel1-filter3-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # 
selected 3 columns # see how diagnosis changes w/filter select(sex, race, diagnosis) %>% filter(str_detect(diagnosis, pattern = "lympho.+ic")) # see how many rows now # check diagnosis # Format: *# filter(str_detect(variable,"string")) <br> # filter(str_detect(variable,"string")) <br> ``` ] .panel2-filter3-auto[ ``` [1] 64 ``` ``` # A tibble: 6 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 0 1 acute lymphoblastic leukemia 2 1 1 chronic lymphocytic leukemia 3 0 1 chronic lymphocytic leukemia 4 0 0 chronic lymphocytic leukemia 5 1 1 chronic lymphocytic leukemia 6 1 1 chronic lymphocytic leukemia ``` ] --- count: false Example 3/5: Filter Rows with a Little Bit of Regex<br>.+ means any character from 1 to N times .panel1-filter3-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(sex, race, diagnosis) %>% filter(str_detect(diagnosis, pattern = "lympho.+ic")) # see how many rows now # check diagnosis # Format: # filter(str_detect(variable,"string")) <br> # filter(str_detect(variable,"string")) <br> ``` ] .panel2-filter3-auto[ ``` [1] 64 ``` ``` # A tibble: 6 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 0 1 acute lymphoblastic leukemia 2 1 1 chronic lymphocytic leukemia 3 0 1 chronic lymphocytic leukemia 4 0 0 chronic lymphocytic leukemia 5 1 1 chronic lymphocytic leukemia 6 1 1 chronic lymphocytic leukemia ``` ] <style> .panel1-filter3-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-filter3-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-filter3-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false Example 4/5: Filter Rows with a Little Bit of Regex<br> .* means any character from 0 to N times .panel1-filter4-auto[ ```r # how many rows when you start *nrow(cmv) ``` ] .panel2-filter4-auto[ ``` [1] 64 ``` ] 
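---

### Aside: `.+` versus `.*` on a Toy Example

Examples 3 and 4 introduce two quantifiers: `.+` needs at least one character, while `.*` also accepts none. The sketch below is one way to see that difference side by side. It assumes a small made-up tibble named `toy` (not the cmv data) whose diagnosis labels are borrowed from the slides, and the patterns `"lympho.*ma"` and `"lympho.+ma"` are illustrative choices, not patterns used elsewhere in this flipbook.

```r
library(dplyr)
library(stringr)

# `toy` is a hypothetical stand-in for cmv; the labels come from the slides
toy <- tibble(
  diagnosis = c(
    "Hodgkin lymphoma",
    "acute lymphoblastic leukemia",
    "chronic lymphocytic leukemia",
    "myelofibrosis"
  )
)

# .* allows ZERO characters between "lympho" and "ma",
# so the word "lymphoma" itself matches
star_hits <- toy %>%
  filter(str_detect(diagnosis, "lympho.*ma"))

# .+ demands AT LEAST ONE character between "lympho" and "ma",
# so "lymphoma" no longer matches
plus_hits <- toy %>%
  filter(str_detect(diagnosis, "lympho.+ma"))

# a trailing .* can match zero characters, so for str_detect()
# "myelo.*" and "myelo" flag exactly the same rows
same_rows <- identical(
  str_detect(toy$diagnosis, "myelo.*"),
  str_detect(toy$diagnosis, "myelo")
)

star_hits
plus_hits
same_rows
```

On this toy data, `star_hits` keeps only the "Hodgkin lymphoma" row, `plus_hits` is empty, and `same_rows` is `TRUE`. That suggests the trailing `.*` in a pattern like `"myelo.*"` is optional when the pattern is only used for detection.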
--- count: false Example 4/5: Filter Rows with a Little Bit of Regex<br> .* means any character from 0 to N times .panel1-filter4-auto[ ```r # how many rows when you start nrow(cmv) *cmv ``` ] .panel2-filter4-auto[ ``` [1] 64 ``` ``` # A tibble: 64 x 26 id age sex race diagnosis diagnosis_type time_to_transpl… <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> 1 1 61 1 0 acute my… 1 5.16 2 2 62 1 1 non-Hodg… 0 79.0 3 3 63 0 1 non-Hodg… 0 35.6 4 4 33 0 1 Hodgkin … 0 33.0 5 5 54 0 1 acute ly… 0 11.4 6 6 55 1 1 myelofib… 1 2.43 7 7 67 1 1 acute my… 1 9.59 8 8 51 1 1 acute my… 1 NA 9 9 44 0 0 multiple… 0 43.4 10 10 59 1 1 chronic … 0 92.7 # … with 54 more rows, and 19 more variables: prior_radiation <dbl>, # prior_chemo <dbl>, prior_transplant <dbl>, recipient_cmv <dbl>, # donor_cmv <dbl>, donor_sex <dbl>, tnc_dose <dbl>, cd34_dose <dbl>, # cd3_dose <dbl>, cd8_dose <dbl>, tbi_dose <dbl>, c1_c2 <dbl>, a_ki_rs <dbl>, # cmv <dbl>, time_to_cmv <dbl>, agvhd <dbl>, time_to_agvhd <dbl>, # cgvhd <dbl>, time_to_cgvhd <dbl> ``` ] --- count: false Example 4/5: Filter Rows with a Little Bit of Regex<br> .* means any character from 0 to N times .panel1-filter4-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter * select(sex, race, diagnosis) ``` ] .panel2-filter4-auto[ ``` [1] 64 ``` ``` # A tibble: 64 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 1 0 acute myeloid leukemia 2 1 1 non-Hodgkin lymphoma 3 0 1 non-Hodgkin lymphoma 4 0 1 Hodgkin lymphoma 5 0 1 acute lymphoblastic leukemia 6 1 1 myelofibrosis 7 1 1 acute myeloid leukemia 8 1 1 acute myeloid leukemia 9 0 0 multiple myelomas 10 1 1 chronic lymphocytic leukemia # … with 54 more rows ``` ] --- count: false Example 4/5: Filter Rows with a Little Bit of Regex<br> .* means any character from 0 to N times .panel1-filter4-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(sex, race, diagnosis) %>% * 
filter(str_detect(diagnosis, "myelo.*")) ``` ] .panel2-filter4-auto[ ``` [1] 64 ``` ``` # A tibble: 37 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 1 0 acute myeloid leukemia 2 1 1 myelofibrosis 3 1 1 acute myeloid leukemia 4 1 1 acute myeloid leukemia 5 0 0 multiple myelomas 6 1 1 multiple myelomas 7 1 1 acute myeloid leukemia 8 0 1 myelodysplastic syndrome 9 0 1 multiple myelomas 10 1 1 myelodysplastic syndrome # … with 27 more rows ``` ] --- count: false Example 4/5: Filter Rows with a Little Bit of Regex<br> .* means any character from 0 to N times .panel1-filter4-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(sex, race, diagnosis) %>% filter(str_detect(diagnosis, "myelo.*")) # see how many rows now # check diagnosis # Format: *# filter(str_detect(variable,"string")) <br> # filter(str_detect(variable,"string")) <br> ``` ] .panel2-filter4-auto[ ``` [1] 64 ``` ``` # A tibble: 37 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 1 0 acute myeloid leukemia 2 1 1 myelofibrosis 3 1 1 acute myeloid leukemia 4 1 1 acute myeloid leukemia 5 0 0 multiple myelomas 6 1 1 multiple myelomas 7 1 1 acute myeloid leukemia 8 0 1 myelodysplastic syndrome 9 0 1 multiple myelomas 10 1 1 myelodysplastic syndrome # … with 27 more rows ``` ] --- count: false Example 4/5: Filter Rows with a Little Bit of Regex<br> .* means any character from 0 to N times .panel1-filter4-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(sex, race, diagnosis) %>% filter(str_detect(diagnosis, "myelo.*")) # see how many rows now # check diagnosis # Format: # filter(str_detect(variable,"string")) <br> # filter(str_detect(variable,"string")) <br> ``` ] .panel2-filter4-auto[ ``` [1] 64 ``` ``` # A tibble: 37 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 1 0 acute myeloid leukemia 2 1 1 myelofibrosis 3 1 1 acute myeloid leukemia 4 1 1 acute myeloid leukemia 5 0 0 multiple 
myelomas 6 1 1 multiple myelomas 7 1 1 acute myeloid leukemia 8 0 1 myelodysplastic syndrome 9 0 1 multiple myelomas 10 1 1 myelodysplastic syndrome # … with 27 more rows ``` ] <style> .panel1-filter4-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-filter4-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-filter4-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false Example 5/5: Filter Rows with Variable == 'string'. More Exact, Less Flexible .panel1-filter5-auto[ ```r # how many rows when you start *nrow(cmv) ``` ] .panel2-filter5-auto[ ``` [1] 64 ``` ] --- count: false Example 5/5: Filter Rows with Variable == 'string'. More Exact, Less Flexible .panel1-filter5-auto[ ```r # how many rows when you start nrow(cmv) *cmv ``` ] .panel2-filter5-auto[ ``` [1] 64 ``` ``` # A tibble: 64 x 26 id age sex race diagnosis diagnosis_type time_to_transpl… <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> 1 1 61 1 0 acute my… 1 5.16 2 2 62 1 1 non-Hodg… 0 79.0 3 3 63 0 1 non-Hodg… 0 35.6 4 4 33 0 1 Hodgkin … 0 33.0 5 5 54 0 1 acute ly… 0 11.4 6 6 55 1 1 myelofib… 1 2.43 7 7 67 1 1 acute my… 1 9.59 8 8 51 1 1 acute my… 1 NA 9 9 44 0 0 multiple… 0 43.4 10 10 59 1 1 chronic … 0 92.7 # … with 54 more rows, and 19 more variables: prior_radiation <dbl>, # prior_chemo <dbl>, prior_transplant <dbl>, recipient_cmv <dbl>, # donor_cmv <dbl>, donor_sex <dbl>, tnc_dose <dbl>, cd34_dose <dbl>, # cd3_dose <dbl>, cd8_dose <dbl>, tbi_dose <dbl>, c1_c2 <dbl>, a_ki_rs <dbl>, # cmv <dbl>, time_to_cmv <dbl>, agvhd <dbl>, time_to_agvhd <dbl>, # cgvhd <dbl>, time_to_cgvhd <dbl> ``` ] --- count: false Example 5/5: Filter Rows with Variable == 'string'. 
More Exact, Less Flexible .panel1-filter5-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter * select(sex, race, diagnosis) ``` ] .panel2-filter5-auto[ ``` [1] 64 ``` ``` # A tibble: 64 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 1 0 acute myeloid leukemia 2 1 1 non-Hodgkin lymphoma 3 0 1 non-Hodgkin lymphoma 4 0 1 Hodgkin lymphoma 5 0 1 acute lymphoblastic leukemia 6 1 1 myelofibrosis 7 1 1 acute myeloid leukemia 8 1 1 acute myeloid leukemia 9 0 0 multiple myelomas 10 1 1 chronic lymphocytic leukemia # … with 54 more rows ``` ] --- count: false Example 5/5: Filter Rows with Variable == 'string'. More Exact, Less Flexible .panel1-filter5-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(sex, race, diagnosis) %>% * filter(diagnosis == "myelofibrosis") ``` ] .panel2-filter5-auto[ ``` [1] 64 ``` ``` # A tibble: 4 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 1 1 myelofibrosis 2 0 1 myelofibrosis 3 1 1 myelofibrosis 4 0 1 myelofibrosis ``` ] --- count: false Example 5/5: Filter Rows with Variable == 'string'. More Exact, Less Flexible .panel1-filter5-auto[ ```r # how many rows when you start nrow(cmv) cmv %>% # selected 3 columns # see how diagnosis changes w/filter select(sex, race, diagnosis) %>% filter(diagnosis == "myelofibrosis") # see how many rows now # check diagnosis # Format: *# filter(variable == "string") <br> # filter(variable == "string") <br> ``` ] .panel2-filter5-auto[ ``` [1] 64 ``` ``` # A tibble: 4 x 3 sex race diagnosis <dbl> <dbl> <chr> 1 1 1 myelofibrosis 2 0 1 myelofibrosis 3 1 1 myelofibrosis 4 0 1 myelofibrosis ``` ] --- count: false Example 5/5: Filter Rows with Variable == 'string'. 
More Exact, Less Flexible

.panel1-filter5-auto[
```r
# how many rows when you start
nrow(cmv)
cmv %>%
  # selected 3 columns
  # see how diagnosis changes w/filter
  select(sex, race, diagnosis) %>%
  filter(diagnosis == "myelofibrosis")
  # see how many rows now
  # check diagnosis
# Format:
# filter(variable == "string") <br>
# filter(variable == "string") <br>
```
]

.panel2-filter5-auto[
```
[1] 64
```

```
# A tibble: 4 x 3
    sex  race diagnosis
  <dbl> <dbl> <chr>
1     1     1 myelofibrosis
2     0     1 myelofibrosis
3     1     1 myelofibrosis
4     0     1 myelofibrosis
```
]

<style>
.panel1-filter5-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-filter5-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-filter5-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---

### Where To Learn More about Regex (Regular Expressions)

Click a word in italics to follow its link.

For regex in R, go to the _[R-manual](https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html)_.

For general regex, go to _[regexone](https://regexone.com/)_, _[sitepoint](https://www.sitepoint.com/learn-regex/)_, and/or _[learn-regex](https://github.com/ziishaned/learn-regex)_.

---
class: inverse, center

# End of This Flipbook

## On to The Coding Exercises!
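---

### A Recap Sketch to Try Before the Exercises

The flipbook's main points (str_detect alone only gives a logical vector, filter acts on it, `!` negates it, and `==` is the exact alternative) can be checked without the cmv data. The sketch below is a minimal version under stated assumptions: `toy` is a made-up four-row tibble, and the object names `is_myelo`, `partial`, `exact`, and `negated` are hypothetical names chosen for this recap.

```r
library(dplyr)
library(stringr)

# `toy` is hypothetical practice data, not the cmv dataset
toy <- tibble(
  id = 1:4,
  diagnosis = c(
    "myelofibrosis",
    "multiple myelomas",
    "acute myeloid leukemia",
    "Hodgkin lymphoma"
  )
)

# str_detect() on its own only returns TRUE/FALSE for each row
is_myelo <- str_detect(toy$diagnosis, "myelo")

# filter() uses that logical vector to keep the TRUE rows
partial <- toy %>%
  filter(str_detect(diagnosis, "myelo"))

# == keeps only rows whose whole value equals the string
exact <- toy %>%
  filter(diagnosis == "myelofibrosis")

# ! flips the logical vector, so the non-matching rows are kept
negated <- toy %>%
  filter(!str_detect(diagnosis, "myelo"))

is_myelo
partial
exact
negated
```

On this toy data the partial match keeps three rows, the exact match keeps one, and the negated filter keeps only the "Hodgkin lymphoma" row. The same pattern of checks should carry over to the `diagnosis` column in cmv.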