How to Use filter() functions with Dates

Selecting specific rows based on Dates and Date Ranges

Peter Higgins

2021-01-10


How to Use the filter() function for Rows by Date

Format:

dataset %>% filter(between(variable, start, end))

Note that between() is nested within filter().

Start and end dates are usually built with the {lubridate} package, using the format
ymd("2015-12-31")

Remember that filteR is for selecting Rows because it ends with an R.
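
Before the real datasets, here is a minimal, self-contained sketch of the format above. The `visits` tibble and its `visit_date` column are made up for illustration and are not part of the course data.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: five visits around December 2015
visits <- tibble(
  id = 1:5,
  visit_date = ymd(c("2015-11-30", "2015-12-01", "2015-12-15",
                     "2015-12-31", "2016-01-01"))
)

# Keep only the rows whose date falls in December 2015
december <- visits %>%
  filter(between(visit_date, ymd("2015-12-01"), ymd("2015-12-31")))

december
```

The visits on the first and last day of the month should be kept, since between() includes both endpoints.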

Let's step through some filter Examples!


Example 1/4: Filter Rows Between Dates

# how many rows when you start
nrow(covid_dates)
[1] 15524

Example 1/4: Filter Rows Between Dates

# how many rows when you start
nrow(covid_dates)
covid_dates
[1] 15524
# A tibble: 15,524 x 18
subject_id fake_first_name fake_last_name gender pan_day test_id clinic_name
<dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 1412 jhezane westerling female 4 covid inpatient …
2 533 penny targaryen female 7 covid clinical l…
3 9134 grunt rivers male 7 covid clinical l…
4 8518 melisandre swyft female 8 covid clinical l…
5 8967 rolley karstark male 8 covid emergency …
6 11048 megga karstark female 8 covid oncology d…
7 663 ithoke targaryen male 9 covid clinical l…
8 2158 ravella frey female 9 covid emergency …
9 3794 styr tyrell male 9 covid clinical l…
10 4706 wynafryd seaworth male 9 covid clinical l…
# … with 15,514 more rows, and 11 more variables: result <chr>,
# demo_group <chr>, age <dbl>, drive_thru_ind <dbl>, ct_result <dbl>,
# orderset <dbl>, payor_group <chr>, patient_class <chr>, col_rec_tat <dbl>,
# rec_ver_tat <dbl>, fake_date <date>

Example 1/4: Filter Rows Between Dates

# how many rows when you start
nrow(covid_dates)
covid_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_date, result, gender)
[1] 15524
# A tibble: 15,524 x 3
fake_date result gender
<date> <chr> <chr>
1 2020-03-05 negative female
2 2020-03-08 negative female
3 2020-03-08 negative male
4 2020-03-09 negative female
5 2020-03-09 negative male
6 2020-03-09 negative female
7 2020-03-10 negative male
8 2020-03-10 negative female
9 2020-03-10 negative male
10 2020-03-10 negative male
# … with 15,514 more rows

Example 1/4: Filter Rows Between Dates

# how many rows when you start
nrow(covid_dates)
covid_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_date, result, gender) %>%
filter(between(fake_date,
ymd("2020-03-01"), ymd("2020-03-31")))
[1] 15524
# A tibble: 2,421 x 3
fake_date result gender
<date> <chr> <chr>
1 2020-03-05 negative female
2 2020-03-08 negative female
3 2020-03-08 negative male
4 2020-03-09 negative female
5 2020-03-09 negative male
6 2020-03-09 negative female
7 2020-03-10 negative male
8 2020-03-10 negative female
9 2020-03-10 negative male
10 2020-03-10 negative male
# … with 2,411 more rows

Example 1/4: Filter Rows Between Dates

# how many rows when you start
nrow(covid_dates)
covid_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_date, result, gender) %>%
filter(between(fake_date,
ymd("2020-03-01"), ymd("2020-03-31")))
# see how many rows now
# check dates - between() is inclusive
# of both endpoints - tests on 3/1
# and on 3/31 are included.
# Format:
# filter(between(variable, date, date))
[1] 15524
# A tibble: 2,421 x 3
fake_date result gender
<date> <chr> <chr>
1 2020-03-05 negative female
2 2020-03-08 negative female
3 2020-03-08 negative male
4 2020-03-09 negative female
5 2020-03-09 negative male
6 2020-03-09 negative female
7 2020-03-10 negative male
8 2020-03-10 negative female
9 2020-03-10 negative male
10 2020-03-10 negative male
# … with 2,411 more rows
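
To make the "inclusive" point concrete, the sketch below compares between() with the equivalent pair of >= and <= comparisons. The `tests` tibble is hypothetical toy data, not the covid_dates dataset.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: dates just outside, on, and inside the March range,
# plus one missing date
tests <- tibble(
  test_date = ymd(c("2020-02-29", "2020-03-01", "2020-03-15",
                    "2020-03-31", "2020-04-01", NA))
)

start_date <- ymd("2020-03-01")
end_date   <- ymd("2020-03-31")

# Version 1: between() keeps both endpoints
with_between <- tests %>%
  filter(between(test_date, start_date, end_date))

# Version 2: the same condition written as two explicit comparisons
with_compare <- tests %>%
  filter(test_date >= start_date, test_date <= end_date)

with_between
```

Both versions should return the same three rows. The row with a missing date is dropped either way, because filter() only keeps rows where the condition is TRUE.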

Example 2/4: Find Bone Marrow Transplants in 2008-2018 in 40-70 yo

# how many rows when you start
nrow(bmt_dates)
[1] 64

Example 2/4: Find Bone Marrow Transplants in 2008-2018 in 40-70 yo

# how many rows when you start
nrow(bmt_dates)
bmt_dates
[1] 64
# A tibble: 64 x 29
id age sex race diagnosis diagnosis_type time_to_transpl…
<dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 1 61 1 0 acute my… 1 5.16
2 2 62 1 1 non-Hodg… 0 79.0
3 3 63 0 1 non-Hodg… 0 35.6
4 4 33 0 1 Hodgkin … 0 33.0
5 5 54 0 1 acute ly… 0 11.4
6 6 55 1 1 myelofib… 1 2.43
7 7 67 1 1 acute my… 1 9.59
8 8 51 1 1 acute my… 1 NA
9 9 44 0 0 multiple… 0 43.4
10 10 59 1 1 chronic … 0 92.7
# … with 54 more rows, and 22 more variables: prior_radiation <dbl>,
# prior_chemo <dbl>, prior_transplant <dbl>, recipient_cmv <dbl>,
# donor_cmv <dbl>, donor_sex <dbl>, tnc_dose <dbl>, cd34_dose <dbl>,
# cd3_dose <dbl>, cd8_dose <dbl>, tbi_dose <dbl>, c1_c2 <dbl>, a_ki_rs <dbl>,
# cmv <dbl>, time_to_cmv <dbl>, agvhd <dbl>, time_to_agvhd <dbl>,
# cgvhd <dbl>, time_to_cgvhd <dbl>, fake_dx_date <date>,
# fake_bmt_date <date>, fake_agvhd_date <date>

Example 2/4: Find Bone Marrow Transplants in 2008-2018 in 40-70 yo

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_bmt_date, sex, age)
[1] 64
# A tibble: 64 x 3
fake_bmt_date sex age
<date> <dbl> <dbl>
1 2015-04-09 1 61
2 2021-05-04 1 62
3 2017-10-07 0 63
4 2017-07-23 0 33
5 2015-10-13 0 54
6 2015-01-17 1 55
7 2015-08-20 1 67
8 NA 1 51
9 2018-05-31 0 44
10 2022-06-17 1 59
# … with 54 more rows

Example 2/4: Find Bone Marrow Transplants in 2008-2018 in 40-70 yo

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_bmt_date, sex, age) %>%
filter(between(fake_bmt_date,
ymd("2008-01-01"), ymd("2018-12-31")))
[1] 64
# A tibble: 53 x 3
fake_bmt_date sex age
<date> <dbl> <dbl>
1 2015-04-09 1 61
2 2017-10-07 0 63
3 2017-07-23 0 33
4 2015-10-13 0 54
5 2015-01-17 1 55
6 2015-08-20 1 67
7 2018-05-31 0 44
8 2018-01-18 1 45
9 2016-04-23 1 57
10 2015-03-21 0 52
# … with 43 more rows

Example 2/4: Find Bone Marrow Transplants in 2008-2018 in 40-70 yo

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_bmt_date, sex, age) %>%
filter(between(fake_bmt_date,
ymd("2008-01-01"), ymd("2018-12-31"))) %>%
filter(between(age, 40, 70))
[1] 64
# A tibble: 44 x 3
fake_bmt_date sex age
<date> <dbl> <dbl>
1 2015-04-09 1 61
2 2017-10-07 0 63
3 2015-10-13 0 54
4 2015-01-17 1 55
5 2015-08-20 1 67
6 2018-05-31 0 44
7 2018-01-18 1 45
8 2016-04-23 1 57
9 2015-03-21 0 52
10 2015-12-21 0 62
# … with 34 more rows

Example 2/4: Find Bone Marrow Transplants in 2008-2018 in 40-70 yo

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_bmt_date, sex, age) %>%
filter(between(fake_bmt_date,
ymd("2008-01-01"), ymd("2018-12-31"))) %>%
filter(between(age, 40, 70)) %>%
arrange(fake_bmt_date) # sort
[1] 64
# A tibble: 44 x 3
fake_bmt_date sex age
<date> <dbl> <dbl>
1 2015-01-17 1 55
2 2015-01-20 1 61
3 2015-02-09 1 57
4 2015-02-23 1 48
5 2015-03-05 0 61
6 2015-03-07 1 49
7 2015-03-20 1 57
8 2015-03-21 0 52
9 2015-03-21 1 62
10 2015-04-09 1 61
# … with 34 more rows

Example 2/4: Find Bone Marrow Transplants in 2008-2018 in 40-70 yo

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_bmt_date, sex, age) %>%
filter(between(fake_bmt_date,
ymd("2008-01-01"), ymd("2018-12-31"))) %>%
filter(between(age, 40, 70)) %>%
arrange(fake_bmt_date) # sort
# see how many rows now
# sequential filters combine with AND
# check dates
# Format:
# filter(between(variable, start, end))
[1] 64
# A tibble: 44 x 3
fake_bmt_date sex age
<date> <dbl> <dbl>
1 2015-01-17 1 55
2 2015-01-20 1 61
3 2015-02-09 1 57
4 2015-02-23 1 48
5 2015-03-05 0 61
6 2015-03-07 1 49
7 2015-03-20 1 57
8 2015-03-21 0 52
9 2015-03-21 1 62
10 2015-04-09 1 61
# … with 34 more rows
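
Since sequential filters combine with AND, the two filter() steps can also be written as one call with comma-separated conditions. This is a small sketch on hypothetical toy data (`transplants`, `bmt_date`), not the bmt_dates dataset.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data standing in for bmt_dates
transplants <- tibble(
  bmt_date = ymd(c("2007-06-01", "2015-04-09", "2017-07-23",
                   "2018-05-31", "2021-05-04")),
  age      = c(55, 61, 33, 44, 62)
)

# Two filter() calls in sequence
two_steps <- transplants %>%
  filter(between(bmt_date, ymd("2008-01-01"), ymd("2018-12-31"))) %>%
  filter(between(age, 40, 70))

# One filter() call: conditions separated by commas are combined with AND
one_step <- transplants %>%
  filter(
    between(bmt_date, ymd("2008-01-01"), ymd("2018-12-31")),
    between(age, 40, 70)
  )

one_step
```

Either style should give the same rows. Separate filter() calls can be easier to step through one at a time, as in this flipbook.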

Example 3/4: Find Acute GVHD Cases in last 24 Months

# how many rows when you start
nrow(bmt_dates)
[1] 64

Example 3/4: Find Acute GVHD Cases in last 24 Months

# how many rows when you start
nrow(bmt_dates)
bmt_dates
[1] 64
# A tibble: 64 x 29
id age sex race diagnosis diagnosis_type time_to_transpl…
<dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 1 61 1 0 acute my… 1 5.16
2 2 62 1 1 non-Hodg… 0 79.0
3 3 63 0 1 non-Hodg… 0 35.6
4 4 33 0 1 Hodgkin … 0 33.0
5 5 54 0 1 acute ly… 0 11.4
6 6 55 1 1 myelofib… 1 2.43
7 7 67 1 1 acute my… 1 9.59
8 8 51 1 1 acute my… 1 NA
9 9 44 0 0 multiple… 0 43.4
10 10 59 1 1 chronic … 0 92.7
# … with 54 more rows, and 22 more variables: prior_radiation <dbl>,
# prior_chemo <dbl>, prior_transplant <dbl>, recipient_cmv <dbl>,
# donor_cmv <dbl>, donor_sex <dbl>, tnc_dose <dbl>, cd34_dose <dbl>,
# cd3_dose <dbl>, cd8_dose <dbl>, tbi_dose <dbl>, c1_c2 <dbl>, a_ki_rs <dbl>,
# cmv <dbl>, time_to_cmv <dbl>, agvhd <dbl>, time_to_agvhd <dbl>,
# cgvhd <dbl>, time_to_cgvhd <dbl>, fake_dx_date <date>,
# fake_bmt_date <date>, fake_agvhd_date <date>

Example 3/4: Find Acute GVHD Cases in last 24 Months

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_agvhd_date, sex, age)
[1] 64
# A tibble: 64 x 3
fake_agvhd_date sex age
<date> <dbl> <dbl>
1 2015-07-24 1 61
2 2026-09-09 1 62
3 2018-01-27 0 63
4 2019-11-26 0 33
5 2016-01-05 0 54
6 2015-05-13 1 55
7 2015-11-18 1 67
8 NA 1 51
9 2018-06-21 0 44
10 2024-04-09 1 59
# … with 54 more rows

Example 3/4: Find Acute GVHD Cases in last 24 Months

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_agvhd_date, sex, age) %>%
filter(fake_agvhd_date >
today() - months(24))
[1] 64
# A tibble: 16 x 3
fake_agvhd_date sex age
<date> <dbl> <dbl>
1 2026-09-09 1 62
2 2019-11-26 0 33
3 2024-04-09 1 59
4 2034-07-21 0 61
5 2021-02-26 1 62
6 2024-05-21 0 51
7 2022-08-19 0 62
8 2021-09-22 0 52
9 2027-08-11 0 48
10 2020-04-26 1 46
11 2023-02-08 1 51
12 2022-03-19 1 56
13 2019-03-31 1 62
14 2020-09-26 1 58
15 2029-01-28 0 41
16 2029-05-25 1 50

Example 3/4: Find Acute GVHD Cases in last 24 Months

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_agvhd_date, sex, age) %>%
filter(fake_agvhd_date >
today() - months(24)) %>%
arrange(fake_agvhd_date) # sort
[1] 64
# A tibble: 16 x 3
fake_agvhd_date sex age
<date> <dbl> <dbl>
1 2019-03-31 1 62
2 2019-11-26 0 33
3 2020-04-26 1 46
4 2020-09-26 1 58
5 2021-02-26 1 62
6 2021-09-22 0 52
7 2022-03-19 1 56
8 2022-08-19 0 62
9 2023-02-08 1 51
10 2024-04-09 1 59
11 2024-05-21 0 51
12 2026-09-09 1 62
13 2027-08-11 0 48
14 2029-01-28 0 41
15 2029-05-25 1 50
16 2034-07-21 0 61

Example 3/4: Find Acute GVHD Cases in last 24 Months

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_agvhd_date, sex, age) %>%
filter(fake_agvhd_date >
today() - months(24)) %>%
arrange(fake_agvhd_date) # sort
# see how many rows now
# note we have some "future" fake dates
# Format:
# filter(variable > today() - months(24))
[1] 64
# A tibble: 16 x 3
fake_agvhd_date sex age
<date> <dbl> <dbl>
1 2019-03-31 1 62
2 2019-11-26 0 33
3 2020-04-26 1 46
4 2020-09-26 1 58
5 2021-02-26 1 62
6 2021-09-22 0 52
7 2022-03-19 1 56
8 2022-08-19 0 62
9 2023-02-08 1 51
10 2024-04-09 1 59
11 2024-05-21 0 51
12 2026-09-09 1 62
13 2027-08-11 0 48
14 2029-01-28 0 41
15 2029-05-25 1 50
16 2034-07-21 0 61
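
The "future" fake dates pass the filter because the condition only sets a lower bound. One possible way to keep only the last 24 months is to add today() as an upper bound with between(). The sketch below uses hypothetical toy data (`cases`, `case_date`) built relative to the current date. It also uses lubridate's %m-% operator, which rolls back to the last valid day of the month rather than returning NA when the target day does not exist (for example, starting from Feb 29).

```r
library(dplyr)
library(lubridate)

# Fix the reference date once so every comparison uses the same day
ref_date <- today()

# Hypothetical toy data: two past dates, today, and one "future" date
cases <- tibble(
  case_id   = 1:4,
  case_date = ref_date + days(c(-1000, -100, 0, 100))
)

# Start of the window: 24 months before the reference date
window_start <- ref_date %m-% months(24)

# Keep dates from the last 24 months, up to and including today
recent <- cases %>%
  filter(between(case_date, window_start, ref_date))

recent
```

Here the case from 1000 days ago and the case 100 days in the future should both be dropped.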

Example 4/4: Find BMTs on (Fake) Weekends

# how many rows when you start
nrow(bmt_dates)
[1] 64

Example 4/4: Find BMTs on (Fake) Weekends

# how many rows when you start
nrow(bmt_dates)
bmt_dates
[1] 64
# A tibble: 64 x 29
id age sex race diagnosis diagnosis_type time_to_transpl…
<dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 1 61 1 0 acute my… 1 5.16
2 2 62 1 1 non-Hodg… 0 79.0
3 3 63 0 1 non-Hodg… 0 35.6
4 4 33 0 1 Hodgkin … 0 33.0
5 5 54 0 1 acute ly… 0 11.4
6 6 55 1 1 myelofib… 1 2.43
7 7 67 1 1 acute my… 1 9.59
8 8 51 1 1 acute my… 1 NA
9 9 44 0 0 multiple… 0 43.4
10 10 59 1 1 chronic … 0 92.7
# … with 54 more rows, and 22 more variables: prior_radiation <dbl>,
# prior_chemo <dbl>, prior_transplant <dbl>, recipient_cmv <dbl>,
# donor_cmv <dbl>, donor_sex <dbl>, tnc_dose <dbl>, cd34_dose <dbl>,
# cd3_dose <dbl>, cd8_dose <dbl>, tbi_dose <dbl>, c1_c2 <dbl>, a_ki_rs <dbl>,
# cmv <dbl>, time_to_cmv <dbl>, agvhd <dbl>, time_to_agvhd <dbl>,
# cgvhd <dbl>, time_to_cgvhd <dbl>, fake_dx_date <date>,
# fake_bmt_date <date>, fake_agvhd_date <date>

Example 4/4: Find BMTs on (Fake) Weekends

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_bmt_date, sex, age)
[1] 64
# A tibble: 64 x 3
fake_bmt_date sex age
<date> <dbl> <dbl>
1 2015-04-09 1 61
2 2021-05-04 1 62
3 2017-10-07 0 63
4 2017-07-23 0 33
5 2015-10-13 0 54
6 2015-01-17 1 55
7 2015-08-20 1 67
8 NA 1 51
9 2018-05-31 0 44
10 2022-06-17 1 59
# … with 54 more rows

Example 4/4: Find BMTs on (Fake) Weekends

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_bmt_date, sex, age) %>%
# wday gives weekday 1-7, labeled
# note one NA date
mutate(weekday = wday(fake_bmt_date,
label=TRUE))
[1] 64
# A tibble: 64 x 4
fake_bmt_date sex age weekday
<date> <dbl> <dbl> <ord>
1 2015-04-09 1 61 Thu
2 2021-05-04 1 62 Tue
3 2017-10-07 0 63 Sat
4 2017-07-23 0 33 Sun
5 2015-10-13 0 54 Tue
6 2015-01-17 1 55 Sat
7 2015-08-20 1 67 Thu
8 NA 1 51 <NA>
9 2018-05-31 0 44 Thu
10 2022-06-17 1 59 Fri
# … with 54 more rows

Example 4/4: Find BMTs on (Fake) Weekends

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_bmt_date, sex, age) %>%
# wday gives weekday 1-7, labeled
# note one NA date
mutate(weekday = wday(fake_bmt_date,
label=TRUE)) %>%
# %in% tests membership in a vector of values
# by default, 1 = Sunday and 7 = Saturday
filter(wday(fake_bmt_date) %in% c(1,7))
[1] 64
# A tibble: 17 x 4
fake_bmt_date sex age weekday
<date> <dbl> <dbl> <ord>
1 2017-10-07 0 63 Sat
2 2017-07-23 0 33 Sun
3 2015-01-17 1 55 Sat
4 2016-04-23 1 57 Sat
5 2015-03-21 0 52 Sat
6 2016-08-06 0 38 Sat
7 2015-03-21 1 62 Sat
8 2015-04-18 0 52 Sat
9 2015-08-22 0 45 Sat
10 2015-05-31 1 48 Sun
11 2026-02-01 0 48 Sun
12 2015-03-07 1 49 Sat
13 2016-05-22 1 58 Sun
14 2015-08-30 0 39 Sun
15 2022-02-13 1 56 Sun
16 2015-06-28 1 62 Sun
17 2016-05-08 1 54 Sun

Example 4/4: Find BMTs on (Fake) Weekends

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_bmt_date, sex, age) %>%
# wday gives weekday 1-7, labeled
# note one NA date
mutate(weekday = wday(fake_bmt_date,
label=TRUE)) %>%
# %in% tests membership in a vector of values
# by default, 1 = Sunday and 7 = Saturday
filter(wday(fake_bmt_date) %in% c(1,7)) %>%
arrange(fake_bmt_date) # sort
[1] 64
# A tibble: 17 x 4
fake_bmt_date sex age weekday
<date> <dbl> <dbl> <ord>
1 2015-01-17 1 55 Sat
2 2015-03-07 1 49 Sat
3 2015-03-21 0 52 Sat
4 2015-03-21 1 62 Sat
5 2015-04-18 0 52 Sat
6 2015-05-31 1 48 Sun
7 2015-06-28 1 62 Sun
8 2015-08-22 0 45 Sat
9 2015-08-30 0 39 Sun
10 2016-04-23 1 57 Sat
11 2016-05-08 1 54 Sun
12 2016-05-22 1 58 Sun
13 2016-08-06 0 38 Sat
14 2017-07-23 0 33 Sun
15 2017-10-07 0 63 Sat
16 2022-02-13 1 56 Sun
17 2026-02-01 0 48 Sun

Example 4/4: Find BMTs on (Fake) Weekends

# how many rows when you start
nrow(bmt_dates)
bmt_dates %>%
# selected 3 columns
# see how date changes w/filter
select(fake_bmt_date, sex, age) %>%
# wday gives weekday 1-7, labeled
# note one NA date
mutate(weekday = wday(fake_bmt_date,
label=TRUE)) %>%
# %in% tests membership in a vector of values
# by default, 1 = Sunday and 7 = Saturday
filter(wday(fake_bmt_date) %in% c(1,7)) %>%
arrange(fake_bmt_date) # sort
# see how many rows now
# lots of handy functions in lubridate
[1] 64
# A tibble: 17 x 4
fake_bmt_date sex age weekday
<date> <dbl> <dbl> <ord>
1 2015-01-17 1 55 Sat
2 2015-03-07 1 49 Sat
3 2015-03-21 0 52 Sat
4 2015-03-21 1 62 Sat
5 2015-04-18 0 52 Sat
6 2015-05-31 1 48 Sun
7 2015-06-28 1 62 Sun
8 2015-08-22 0 45 Sat
9 2015-08-30 0 39 Sun
10 2016-04-23 1 57 Sat
11 2016-05-08 1 54 Sun
12 2016-05-22 1 58 Sun
13 2016-08-06 0 38 Sat
14 2017-07-23 0 33 Sun
15 2017-10-07 0 63 Sat
16 2022-02-13 1 56 Sun
17 2026-02-01 0 48 Sun
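
The weekend codes 1 and 7 rely on lubridate's default of starting the week on Sunday, which can be changed through the lubridate.week.start option. A variant that does not depend on that setting is sketched below, with week_start given explicitly. The `bmt` tibble is a hypothetical stand-in that reuses three dates from the output above plus one missing value.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in data: a Thursday, a Saturday, a Sunday, and an NA
bmt <- tibble(
  bmt_date = ymd(c("2015-04-09", "2017-10-07", "2017-07-23", NA))
)

weekends <- bmt %>%
  # week_start = 1 numbers the days Monday = 1 through Sunday = 7,
  # so the weekend is 6 and 7 regardless of the session default
  filter(wday(bmt_date, week_start = 1) %in% c(6, 7))

weekends
```

The row with the missing date is dropped, because NA %in% c(6, 7) evaluates to FALSE.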

End of This Flipbook

On to The Coding Exercises!
