Getting Started

First, set up your working environment by loading the 4 packages used in this vignette with the library() function. Note that you can copy any of the code blocks below by hovering over the top right corner of the block, clicking the copy icon that appears, and then pasting the code into your local version of R to run it.
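The setup chunk itself is not reproduced here. As a minimal sketch, the chunk below loads the packages that the code in this vignette visibly relies on. The exact set of four is an assumption inferred from the functions used later (the pipe, tabyl(), the medicaldata datasets, and flextable()); the original chunk may differ, for example by loading {tidyverse} instead of {dplyr}.

```r
# Assumed setup chunk: inferred from the functions used in this vignette,
# not copied from the original.
library(dplyr)       # provides the %>% pipe (assumption: could be tidyverse)
library(janitor)     # tabyl() and the adorn_*() helpers
library(medicaldata) # the strep_tb and indo_rct datasets
library(flextable)   # formatted output tables
```

If any of these fail to load, install the missing package with install.packages() first.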

Counting Data in Tables

It is common to count data, particularly in categories, to summarize characteristics or outcomes. The tabyl function in the {janitor} package is helpful for this.

In this vignette, we will look at how to use this function to make simple tables of counts of your data.

First, we will read in the data from strep_tb and indo_rct to use for our tables. Load the libraries in the setup chunk above. If you then run the code chunk below, you should have two new data objects in your Environment tab.

strep_tb <- medicaldata::strep_tb
indo_rct <- medicaldata::indo_rct

Table 2 from Streptomycin for Tuberculosis

We will now try to reproduce Table 2 from the Streptomycin for Tuberculosis manuscript (page 771). These are the primary endpoint results, summarized in a table of 6 rows by 2 columns, with an added totals row. Let’s walk through how to do this with the tabyl() function in the {janitor} package.

First Table

The tabyl() function allows you to pipe data into tables, and add ‘adornments’ like total rows and percentages. Let’s start with a basic one-variable tabyl using this ordinal endpoint. You can pipe the dataset into the tabyl function, with your desired variable (radiologic_6m) as the only argument to the function.

strep_tb %>% 
  tabyl(radiologic_6m)
#>                 radiologic_6m  n    percent
#>    6_Considerable_improvement 32 0.29906542
#>        5_Moderate_improvement 23 0.21495327
#>                   4_No_change  5 0.04672897
#>      3_Moderate_deterioration 17 0.15887850
#>  2_Considerable_deterioration 12 0.11214953
#>                       1_Death 18 0.16822430

This gives us the n and proportion of each level of the primary outcome.
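The one-variable table can also be adorned. As a small sketch (not part of the original walkthrough), the chunk below adds a total row and converts the proportions to formatted percentages. The object name one_way is just an illustrative choice.

```r
library(dplyr)    # for the %>% pipe
library(janitor)

strep_tb <- medicaldata::strep_tb

one_way <- strep_tb %>%
  tabyl(radiologic_6m) %>%
  adorn_totals(where = "row") %>%   # append a Total row of counts
  adorn_pct_formatting(digits = 1)  # turn proportions into "xx.x%" strings

one_way
```

Note that after adorn_pct_formatting() the percent column holds character strings, so do any arithmetic on the proportions before this step.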

Two Variable Table

If we add the treatment arm variable as a 2nd argument, we can come closer to the original table. Note that the levels of the first argument make up the rows of the table, and that the levels of the 2nd argument make up the columns (standard R x C order). Also note that with 2 variables, we get the counts by default, but not proportions of each level, as you might want proportions that are column-wise, or row-wise.

strep_tb %>% 
  tabyl(radiologic_6m, arm)
#>                 radiologic_6m Streptomycin Control
#>    6_Considerable_improvement           28       4
#>        5_Moderate_improvement           10      13
#>                   4_No_change            2       3
#>      3_Moderate_deterioration            5      12
#>  2_Considerable_deterioration            6       6
#>                       1_Death            4      14
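To make the column-wise versus row-wise distinction concrete, here is a brief sketch that computes both kinds of proportions from the same two-variable tabyl. The object names counts, col_pct, and row_pct are illustrative only.

```r
library(dplyr)    # for the %>% pipe
library(janitor)

strep_tb <- medicaldata::strep_tb

counts <- strep_tb %>%
  tabyl(radiologic_6m, arm)

# Column-wise: each arm's column sums to 1 (outcome distribution within an arm)
col_pct <- counts %>% adorn_percentages("col")

# Row-wise: each outcome's row sums to 1 (arm distribution within an outcome)
row_pct <- counts %>% adorn_percentages("row")

col_pct
row_pct
```

For a trial table like Table 2, the column-wise version is usually the one of interest, since it describes how outcomes are distributed within each treatment arm.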

Add A Total Row

This is closer, but lacks the total row and the percentages. The totals have to be added first, while the table still contains raw counts, because the percentages are calculated from those counts. We will ‘adorn’ the table with a totals row. We have to specify that we want an additional row of totals at the bottom (not a column of row-wise totals on the right), with the where argument to the adorn_totals function.

strep_tb %>% 
  tabyl(radiologic_6m, arm) %>% 
  adorn_totals(where = "row") # add a total row
#>                 radiologic_6m Streptomycin Control
#>    6_Considerable_improvement           28       4
#>        5_Moderate_improvement           10      13
#>                   4_No_change            2       3
#>      3_Moderate_deterioration            5      12
#>  2_Considerable_deterioration            6       6
#>                       1_Death            4      14
#>                         Total           55      52
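For comparison, the where argument also accepts both directions at once. This sketch, which is not needed for reproducing Table 2, adds a totals row and a totals column together; the name both_totals is illustrative.

```r
library(dplyr)    # for the %>% pipe
library(janitor)

strep_tb <- medicaldata::strep_tb

both_totals <- strep_tb %>%
  tabyl(radiologic_6m, arm) %>%
  adorn_totals(where = c("row", "col"))  # totals row at bottom, totals column at right

both_totals
```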

Add Percentages and Formatting

This is closer. Now we need to add the percentages and the percentage formatting. We need to specify that we want column-wise, rather than row-wise, percentages. We then use adorn_ns() to add the counts back, so that we have both counts and percentages. We can specify that we want the counts to be listed first, with the argument position = "front" in the adorn_ns function.

strep_tb %>% 
  tabyl(radiologic_6m, arm) %>% #2 dimensional table, RxC
  adorn_totals(where = "row") %>% # add totals row
  adorn_percentages("col") %>%  # column-wise percentages
  adorn_pct_formatting() %>% 
  adorn_ns(position = "front")  # put n first
#>                 radiologic_6m Streptomycin     Control
#>    6_Considerable_improvement  28  (50.9%)  4   (7.7%)
#>        5_Moderate_improvement  10  (18.2%) 13  (25.0%)
#>                   4_No_change   2   (3.6%)  3   (5.8%)
#>      3_Moderate_deterioration   5   (9.1%) 12  (23.1%)
#>  2_Considerable_deterioration   6  (10.9%)  6  (11.5%)
#>                       1_Death   4   (7.3%) 14  (26.9%)
#>                         Total  55 (100.0%) 52 (100.0%)
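The formatting helpers take a few options worth knowing. As a sketch of one variation, the chunk below rounds percentages to whole numbers with the digits argument and leaves adorn_ns() at its default position, which places the count after the percentage. The name pct_first is illustrative.

```r
library(dplyr)    # for the %>% pipe
library(janitor)

strep_tb <- medicaldata::strep_tb

pct_first <- strep_tb %>%
  tabyl(radiologic_6m, arm) %>%
  adorn_totals(where = "row") %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 0) %>%  # whole-number percentages
  adorn_ns()                            # default position = "rear": n in parentheses

pct_first
```

The order of the steps matters: totals first, then percentages, then formatting, then counts, because each adorn_ function works on the output of the previous one.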

Making it Pretty

You can pipe this table into the flextable() function to create a flextable object, which makes it easy to add fancy formatting. There are many formatting options in the {flextable} package, which you can learn about in its documentation. You can control column width, fonts, colors, and much more once you are in flextable format. Flextables can be output to MS Word, PowerPoint, HTML, and PDF through R Markdown.

strep_tb %>% 
  tabyl(radiologic_6m, arm) %>% 
  adorn_totals(where = "row") %>% 
  adorn_percentages("col") %>%  # column-wise percentages
  adorn_pct_formatting() %>% 
  adorn_ns(position = "front") %>%   # put n first
  flextable::flextable()
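As a small illustration of what becomes possible once the table is a flextable, the sketch below bolds the header row and fits the column widths to their contents. The particular formatting choices and the object name ft are only examples, not the styling of the original manuscript table.

```r
library(dplyr)    # for the %>% pipe
library(janitor)

strep_tb <- medicaldata::strep_tb

ft <- strep_tb %>%
  tabyl(radiologic_6m, arm) %>%
  adorn_totals(where = "row") %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting() %>%
  adorn_ns(position = "front") %>%
  flextable::flextable() %>%
  flextable::bold(part = "header") %>%  # bold the column headers
  flextable::autofit()                  # size columns to fit their contents

ft
```

Because every flextable function returns a flextable, further formatting steps can be chained onto the end of the pipe in the same way.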

Try this Yourself

Now try this yourself, but instead of using the ordinal radiologic_6m outcome, use the dichotomous outcome variable improved in its place. Copy the code block below and add the piping and additional lines to produce a 2 x 2 table of outcomes. You can search the web for janitor tabyl adorn_title to learn how to add a title to your table.

strep_tb
#> # A tibble: 107 × 13
#>    patient_id arm     dose_strep_g dose_PAS_g gender baseline_condition
#>    <chr>      <fct>          <dbl>      <dbl> <fct>  <fct>             
#>  1 0001       Control            0          0 M      1_Good            
#>  2 0002       Control            0          0 F      1_Good            
#>  3 0003       Control            0          0 F      1_Good            
#>  4 0004       Control            0          0 M      1_Good            
#>  5 0005       Control            0          0 F      1_Good            
#>  6 0006       Control            0          0 M      1_Good            
#>  7 0007       Control            0          0 F      1_Good            
#>  8 0008       Control            0          0 M      1_Good            
#>  9 0009       Control            0          0 F      2_Fair            
#> 10 0010       Control            0          0 M      2_Fair            
#> # … with 97 more rows, and 7 more variables: baseline_temp <fct>,
#> #   baseline_esr <fct>, baseline_cavitation <fct>, strep_resistance <fct>,
#> #   radiologic_6m <fct>, rad_num <dbl>, improved <lgl>
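As a hint for the title step, here is a sketch of adorn_title() applied to the ordinal outcome from earlier, rather than to the improved variable the exercise asks for. The row and column labels are made-up examples. Since adorn_title() converts the table contents to text, it is placed at the end of the pipe.

```r
library(dplyr)    # for the %>% pipe
library(janitor)

strep_tb <- medicaldata::strep_tb

titled <- strep_tb %>%
  tabyl(radiologic_6m, arm) %>%
  adorn_totals(where = "row") %>%
  adorn_title(
    placement = "top",                            # title sits above the column headers
    row_name  = "Radiologic outcome at 6 months", # example label for the row variable
    col_name  = "Treatment arm"                   # example label for the column variable
  )

titled
```

Setting placement = "combined" instead puts both labels into the top-left header cell, which can work better when the table is passed on to another formatting function.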

Challenge

Now try to do this with the indo_rct dataset, using the treatment variable rx and the outcome variable outcome. Add a total row, percentages, and a title.

indo_rct 
#> # A tibble: 602 × 33
#>       id site    age  risk gender   outcome sod   pep   recpanc psphinc precut
#>    <dbl> <fct> <dbl> <dbl> <fct>    <fct>   <fct> <fct> <fct>   <fct>   <fct> 
#>  1  1001 1_UM     26   2   1_female 1_yes   1_yes 0_no  1_yes   0_no    0_no  
#>  2  1002 1_UM     24   1   2_male   0_no    0_no  1_yes 0_no    0_no    0_no  
#>  3  1003 1_UM     57   1   1_female 0_no    1_yes 0_no  0_no    0_no    0_no  
#>  4  1004 1_UM     29   2   1_female 1_yes   1_yes 0_no  0_no    0_no    0_no  
#>  5  1005 1_UM     38   3.5 1_female 0_no    1_yes 1_yes 0_no    1_yes   0_no  
#>  6  1006 1_UM     59   3   1_female 0_no    1_yes 0_no  0_no    0_no    1_yes 
#>  7  1007 1_UM     60   1.5 1_female 0_no    0_no  0_no  1_yes   0_no    0_no  
#>  8  1008 1_UM     29   1   2_male   0_no    0_no  0_no  0_no    0_no    0_no  
#>  9  1009 1_UM     53   2   2_male   0_no    0_no  0_no  1_yes   0_no    0_no  
#> 10  1010 1_UM     20   2   2_male   0_no    0_no  0_no  0_no    0_no    1_yes 
#> # … with 592 more rows, and 22 more variables: difcan <fct>, pneudil <fct>,
#> #   amp <fct>, paninj <fct>, acinar <fct>, brush <fct>, asa81 <fct>,
#> #   asa325 <fct>, asa <fct>, prophystent <fct>, therastent <fct>,
#> #   pdstent <fct>, sodsom <fct>, bsphinc <fct>, bstent <fct>, chole <fct>,
#> #   pbmal <fct>, train <fct>, status <fct>, type <fct>, rx <fct>, bleed <dbl>