PatientProfiles

A package to add patient characteristics to your cohort

Introduction

  • PatientProfiles is developed by the OxInfer group as part of the standard analytic pipeline for the Darwin EU project.

  • This package is currently on CRAN and can be installed easily in RStudio.

  • More information about this package can be found here: https://darwin-eu-dev.github.io/PatientProfiles/.

  • The aim of this package is to simplify the code needed to characterise cohorts by patient characteristics and to reduce the amount of bespoke code for common analytic tasks.

PatientProfiles

It is a group of functions that add common patient characteristics to any OMOP CDM table containing patient-level data.

E.g. age, sex, prior history and intersection with another cohort.

Functions
addSex()             addCohortIntersectFlag()
addPriorHistory()    addCohortIntersectDays()
addDemographics()    addCohortIntersectCount()
addAge()             addCohortIntersectDate()
addInObservation()

Cohorts for this demonstration

For this demo, a CDM table containing 4 different cohorts is created using Capr with Eunomia data:

  1. first_sprain_of_ankle: first occurrence of sprain of ankle

  2. repetitive_sprain_of_ankle: sprain of ankle with a previous sprain of ankle

  3. sinusitis: individual's first recorded condition of sinusitis

  4. chronic sinusitis: individual's first recorded condition of chronic sinusitis

cdm$cohort_interest %>% head(3)
# Source:   SQL [3 x 4]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
  cohort_definition_id subject_id cohort_start_date cohort_end_date
                 <int>      <dbl> <date>            <date>         
1                    1          5 1991-12-12        1991-12-12     
2                    1         38 2003-05-20        2003-05-20     
3                    1        222 1973-08-24        1973-08-24     

Question: What is this code trying to achieve?

cdm$cohort_interest %>% left_join(
  cdm$person %>%
    mutate(
      day_of_birth = if_else(is.na(day_of_birth), 1, day_of_birth),
      month_of_birth = if_else(is.na(month_of_birth), 1, month_of_birth)
    ) %>%
    mutate(birth_date = as.Date(paste0(
        as.character(as.integer(year_of_birth)),
        "-",
        as.character(as.integer(month_of_birth)),
        "-",
        as.character(as.integer(day_of_birth)))
    )) %>%
    select("subject_id" = "person_id", "birth_date"),
  by = "subject_id"
) %>%
  mutate(age = !!datediff("birth_date", "cohort_start_date", "year")) %>%
  relocate("age")

addAge()

For example, if we want to restrict our cohort to patients aged over 50, we can run the code below.

cdm$cohort_interest %>%
  addAge(cdm) %>%
  relocate("age") %>% filter(age > 50)
# Source:   SQL [?? x 5]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
     age cohort_definition_id subject_id cohort_start_date cohort_end_date
   <dbl>                <int>      <dbl> <date>            <date>         
 1    53                    1        236 2004-07-12        2004-07-12     
 2    57                    1        958 2014-07-15        2014-07-15     
 3    66                    1       1314 1996-07-20        1996-07-20     
 4    84                    1       3618 1994-05-20        1994-05-20     
 5    74                    1       4503 2010-09-08        2010-09-08     
 6    72                    1       4531 1987-08-05        1987-08-05     
 7    67                    1       2849 1987-03-12        1987-03-12     
 8    55                    1       2864 1994-07-31        1994-07-31     
 9    73                    1       2986 2018-02-20        2018-02-20     
10    73                    1       3079 2006-09-17        2006-09-17     
# ℹ more rows

addAge()

Age is calculated at the index date, which is set with the indexDate argument.

cdm$cohort_interest %>%
  addAge(
    cdm = cdm,
    indexDate = "cohort_start_date"
  )
# Source:   table<dbplyr_002> [?? x 5]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
   cohort_definition_id subject_id cohort_start_date cohort_end_date   age
                  <int>      <dbl> <date>            <date>          <dbl>
 1                    1          5 1991-12-12        1991-12-12         23
 2                    1         38 2003-05-20        2003-05-20         38
 3                    1        222 1973-08-24        1973-08-24         21
 4                    1        236 2004-07-12        2004-07-12         53
 5                    1        520 1983-08-30        1983-08-30         32
 6                    1        720 1966-06-22        1966-06-22          1
 7                    1        958 2014-07-15        2014-07-15         57
 8                    1        962 1965-09-19        1965-09-19         14
 9                    1       1223 1980-12-15        1980-12-15         29
10                    1       1314 1996-07-20        1996-07-20         66
# ℹ more rows
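
To make "age at the index date" concrete, here is a minimal base R sketch of age in completed years. The helper name ageAtIndex() is hypothetical and is not part of PatientProfiles, and the package may compute age differently inside the database.

```r
# Hypothetical helper (not a PatientProfiles function):
# completed years between a birth date and an index date.
ageAtIndex <- function(birthDate, indexDate) {
  birth <- as.POSIXlt(birthDate)
  index <- as.POSIXlt(indexDate)
  age <- index$year - birth$year
  # Subtract one year when the birthday has not yet been reached
  # in the year of the index date.
  notYetBirthday <- (index$mon < birth$mon) |
    (index$mon == birth$mon & index$mday < birth$mday)
  age - as.integer(notYetBirthday)
}

ageAtIndex(as.Date("1968-03-01"), as.Date("1991-12-12")) # 23
ageAtIndex(as.Date("1968-12-13"), as.Date("1991-12-12")) # 22
```

Changing the index date changes the age, which is why the same person can have different ages in different cohort entries.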

addAge()

It also has options for handling a missing date of birth (DOB).

ageDefaultMonth: month to assign if the month of DOB is missing

ageDefaultDay: day to assign if the day of DOB is missing

ageImposeMonth: whether to impose the default month

ageImposeDay: whether to impose the default day

cdm$cohort_interest %>%
  addAge(
    cdm = cdm,
    indexDate = "cohort_start_date",
    ageDefaultMonth = 1,
    ageDefaultDay = 1,
    ageImposeMonth = TRUE,
    ageImposeDay = TRUE,
    ageName = "age_imposed"
  )
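
The idea behind these options can be sketched in base R on a toy person table. The helper buildBirthDate() and its argument names are hypothetical. The sketch assumes that "imposing" means the default replaces recorded values as well as missing ones, so check the package documentation for the exact behaviour.

```r
# Toy person table; NA marks a missing month or day of birth.
person <- data.frame(
  person_id = 1:3,
  year_of_birth = c(1980L, 1975L, 1990L),
  month_of_birth = c(6L, NA, 11L),
  day_of_birth = c(15L, NA, NA)
)

# Hypothetical helper (not a PatientProfiles function).
buildBirthDate <- function(person, defaultMonth = 1L, defaultDay = 1L,
                           imposeMonth = FALSE, imposeDay = FALSE) {
  month <- person$month_of_birth
  day <- person$day_of_birth
  # Use the default where the value is missing,
  # or for every row when it is imposed (assumption).
  month[is.na(month) | imposeMonth] <- defaultMonth
  day[is.na(day) | imposeDay] <- defaultDay
  as.Date(sprintf("%04d-%02d-%02d", person$year_of_birth, month, day))
}

buildBirthDate(person)
# "1980-06-15" "1975-01-01" "1990-11-01"
buildBirthDate(person, imposeMonth = TRUE, imposeDay = TRUE)
# "1980-01-01" "1975-01-01" "1990-01-01"
```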

addAge()

It works with any table in the CDM, e.g. the “condition_occurrence” and “drug_exposure” tables.

cdm$condition_occurrence %>%
  addAge(cdm, "condition_start_date") %>%
  relocate("age")
# Source:   SQL [?? x 17]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
     age condition_occurrence_id person_id condition_concept_id
   <dbl>                   <dbl>     <dbl>                <dbl>
 1    61                    4483       263              4112343
 2    36                    4657       273               192671
 3     3                    4815       283                28060
 4    18                    5153       304               257012
 5     9                    5513       326                28060
 6    30                    5811       341             40481087
 7     8                    5977       351             40481087
 8    17                    6143       362              4113008
 9     6                    6309       370               372328
10     3                    6641       392               260139
# ℹ more rows
# ℹ 13 more variables: condition_start_date <date>,
#   condition_start_datetime <dttm>, condition_end_date <date>,
#   condition_end_datetime <dttm>, condition_type_concept_id <dbl>,
#   stop_reason <lgl>, provider_id <lgl>, visit_occurrence_id <dbl>,
#   visit_detail_id <dbl>, condition_source_value <chr>,
#   condition_source_concept_id <dbl>, condition_status_source_value <lgl>, …

addAge()

Option to create age groups.

cdm$cohort_interest %>%
  addAge(cdm, ageGroup = list(c(0, 19), c(20, 39), c(40, 59), c(60, 79), c(80, 150))) %>%
  relocate("age", "age_group")
# Source:   SQL [?? x 6]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
     age age_group cohort_definition_id subject_id cohort_start_date
   <dbl> <chr>                    <int>      <dbl> <date>           
 1    23 20 to 39                     1          5 1991-12-12       
 2    38 20 to 39                     1         38 2003-05-20       
 3    21 20 to 39                     1        222 1973-08-24       
 4    53 40 to 59                     1        236 2004-07-12       
 5    32 20 to 39                     1        520 1983-08-30       
 6     1 0 to 19                      1        720 1966-06-22       
 7    57 40 to 59                     1        958 2014-07-15       
 8    14 0 to 19                      1        962 1965-09-19       
 9    29 20 to 39                     1       1223 1980-12-15       
10    66 60 to 79                     1       1314 1996-07-20       
# ℹ more rows
# ℹ 1 more variable: cohort_end_date <date>
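
The grouping itself is simple to sketch locally. The helper labelAgeGroup() below is hypothetical. It assumes each group is an inclusive lower and upper bound and that the label has the "lower to upper" form seen in the output above.

```r
ageGroups <- list(c(0, 19), c(20, 39), c(40, 59), c(60, 79), c(80, 150))
age <- c(23, 38, 1, 53, 66)

# Hypothetical helper (not a PatientProfiles function):
# label each age with the inclusive range it falls into.
labelAgeGroup <- function(age, groups) {
  label <- rep(NA_character_, length(age))
  for (g in groups) {
    inGroup <- !is.na(age) & age >= g[1] & age <= g[2]
    label[inGroup] <- paste(g[1], "to", g[2])
  }
  label
}

labelAgeGroup(age, ageGroups)
# "20 to 39" "20 to 39" "0 to 19" "40 to 59" "60 to 79"
```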

addAge()

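There is also an option to name your groups.

For example, you can name the 20-29 age group “twenties (young people)” and the 30-150 age group “amazing people”. Groups without a name keep the default label, such as “0 to 19”.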

cdm$cohort_interest %>%
  addAge(
    cdm, 
    ageGroup = list(
      "my_age_group" = list(c(0, 19), "twenties (young people)" = c(20, 29), "amazing people" = c(30, 150)),
      "age_group" = list(c(0, 49), c(50, 150))
    )
  ) %>%
  relocate("age", "age_group", "my_age_group")
# Source:   SQL [?? x 7]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
     age age_group my_age_group            cohort_definition_id subject_id
   <dbl> <chr>     <chr>                                  <int>      <dbl>
 1    23 0 to 49   twenties (young people)                    1          5
 2    38 0 to 49   amazing people                             1         38
 3    21 0 to 49   twenties (young people)                    1        222
 4    53 50 to 150 amazing people                             1        236
 5    32 0 to 49   amazing people                             1        520
 6     1 0 to 49   0 to 19                                    1        720
 7    57 50 to 150 amazing people                             1        958
 8    14 0 to 49   0 to 19                                    1        962
 9    29 0 to 49   twenties (young people)                    1       1223
10    66 50 to 150 amazing people                             1       1314
# ℹ more rows
# ℹ 2 more variables: cohort_start_date <date>, cohort_end_date <date>
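
The labelling rule visible in this output (use the name when one is given, otherwise fall back to "lower to upper") can be sketched as follows. The helper groupLabels() is hypothetical and only mirrors what the output shows.

```r
myAgeGroup <- list(
  c(0, 19),
  "twenties (young people)" = c(20, 29),
  "amazing people" = c(30, 150)
)

# Hypothetical helper (not a PatientProfiles function):
# named groups keep their name, unnamed groups get "lower to upper".
groupLabels <- function(groups) {
  labels <- names(groups)
  if (is.null(labels)) labels <- rep("", length(groups))
  default <- vapply(groups, function(g) paste(g[1], "to", g[2]), character(1))
  unname(ifelse(labels == "", default, labels))
}

groupLabels(myAgeGroup)
# "0 to 19" "twenties (young people)" "amazing people"
```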

addPriorHistory()

Adds the length of prior history (in days) within the current observation period, measured at the index date.

For example, you can restrict your cohort to patients with more than 1 year (365 days) of prior history.

cdm$condition_occurrence %>%
  addPriorHistory(
    cdm = cdm,
    indexDate = "condition_start_date",
    priorHistoryName = "prior_history_start"
  ) %>%
  relocate("prior_history_start") %>% filter(prior_history_start > 365)
# Source:   SQL [?? x 17]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
   prior_history_start condition_occurrence_id person_id condition_concept_id
                 <dbl>                   <dbl>     <dbl>                <dbl>
 1               13175                    4657       273               192671
 2                1262                    4815       283                28060
 3                6658                    5153       304               257012
 4                3401                    5513       326                28060
 5               11022                    5811       341             40481087
 6                3009                    5977       351             40481087
 7                6282                    6143       362              4113008
 8                2509                    6309       370               372328
 9                1321                    6641       392               260139
10                4910                    6815       403             40481087
# ℹ more rows
# ℹ 13 more variables: condition_start_date <date>,
#   condition_start_datetime <dttm>, condition_end_date <date>,
#   condition_end_datetime <dttm>, condition_type_concept_id <dbl>,
#   stop_reason <lgl>, provider_id <lgl>, visit_occurrence_id <dbl>,
#   visit_detail_id <dbl>, condition_source_value <chr>,
#   condition_source_concept_id <dbl>, condition_status_source_value <lgl>, …
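
As a rough local illustration, prior history is the number of days between the start of the observation period and the index date. The sketch below uses toy data and assumes one observation period per person. It also assumes, for illustration only, that an index date outside the observation period gives NA.

```r
observation_period <- data.frame(
  person_id = c(1, 2),
  observation_period_start_date = as.Date(c("2000-01-01", "2010-06-01")),
  observation_period_end_date = as.Date(c("2020-12-31", "2015-12-31"))
)
events <- data.frame(
  person_id = c(1, 2, 2),
  index_date = as.Date(c("2001-01-01", "2010-07-01", "2018-01-01"))
)

# Look up each person's observation period (one period per person assumed).
idx <- match(events$person_id, observation_period$person_id)
obsStart <- observation_period$observation_period_start_date[idx]
obsEnd <- observation_period$observation_period_end_date[idx]

# Days from observation start to index date; NA when the index date
# is not inside the observation period (assumption).
inObservation <- events$index_date >= obsStart & events$index_date <= obsEnd
events$prior_history <- ifelse(
  inObservation, as.numeric(events$index_date - obsStart), NA
)

events$prior_history
# 366 30 NA
```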

Functions within PatientProfiles can be used together

cdm$cohort_interest %>%
  addAge(cdm) %>%
  addSex(cdm) %>%
  addPriorHistory(cdm) %>%
    print(width = Inf)
# Source:   table<dbplyr_024> [?? x 7]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
   cohort_definition_id subject_id cohort_start_date cohort_end_date   age
                  <int>      <dbl> <date>            <date>          <dbl>
 1                    1          5 1991-12-12        1991-12-12         23
 2                    1         38 2003-05-20        2003-05-20         38
 3                    1        222 1973-08-24        1973-08-24         21
 4                    1        236 2004-07-12        2004-07-12         53
 5                    1        720 1966-06-22        1966-06-22          1
 6                    1        958 2014-07-15        2014-07-15         57
 7                    1       1223 1980-12-15        1980-12-15         29
 8                    1       1314 1996-07-20        1996-07-20         66
 9                    1       1558 1987-02-17        1987-02-17         32
10                    1       1659 1975-01-23        1975-01-23          5
   sex    prior_history
   <chr>          <dbl>
 1 Male            8541
 2 Female         13978
 3 Female          8015
 4 Female         19471
 5 Female           662
 6 Male           21008
 7 Male           10874
 8 Male           24262
 9 Female         11982
10 Male            2138
# ℹ more rows

Add intersect functions

These functions work out the intersection between two different CDM tables.

For example, you have a cohort of patients who had an ankle sprain and a cohort of patients who took a painkiller. You might want to analyse only the ankle sprain patients who took a painkiller.

In PatientProfiles, we have a family of functions to help you calculate the intersection between two different cohorts.

  • addCohortIntersectFlag()

  • addCohortIntersectCount()

  • addCohortIntersectDate()

  • addCohortIntersectDays()

Medication tables created for demo

A medication table containing cohorts of patients who took “naproxen” or “acetaminophen” was created using the DrugUtilisation package.

cdm$medications
# Source:   table<main.medications> [?? x 4]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
   cohort_definition_id subject_id cohort_start_date cohort_end_date
                  <int>      <dbl> <date>            <date>         
 1                    1       1227 2017-09-11        2017-10-11     
 2                    1       2526 2012-04-19        2012-05-03     
 3                    1       2657 2016-01-16        2016-02-22     
 4                    1       3901 2013-10-04        2013-10-25     
 5                    1       4784 2018-08-28        2018-10-02     
 6                    2          5 1974-05-11        1974-05-18     
 7                    2          5 2003-09-10        2003-09-17     
 8                    2         18 1983-05-31        1983-06-28     
 9                    2         32 1956-03-20        1956-04-10     
10                    2         32 1991-03-22        1991-04-12     
# ℹ more rows

addCohortIntersectFlag()

For example, you might want an indicator of whether the patients in the cohort of interest took acetaminophen at any time before the index date.

cdm$cohort_interest %>%
  addCohortIntersectFlag(
    cdm = cdm, 
    targetCohortTable = "medications",
    targetCohortId = 2,
    indexDate = "cohort_start_date",
    targetStartDate = "cohort_start_date",
    targetEndDate = "cohort_end_date",
    window = list(c(-Inf, -1))
  ) %>%
    print(width = Inf)
# Source:   table<dbplyr_053> [?? x 5]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
   cohort_definition_id subject_id cohort_start_date cohort_end_date
                  <int>      <dbl> <date>            <date>         
 1                    1          5 1991-12-12        1991-12-12     
 2                    1         38 2003-05-20        2003-05-20     
 3                    1        222 1973-08-24        1973-08-24     
 4                    1        236 2004-07-12        2004-07-12     
 5                    1        520 1983-08-30        1983-08-30     
 6                    1        720 1966-06-22        1966-06-22     
 7                    1        820 2002-09-23        2002-09-23     
 8                    1        958 2014-07-15        2014-07-15     
 9                    1       1223 1980-12-15        1980-12-15     
10                    1       1314 1996-07-20        1996-07-20     
   acetaminophen_minf_to_m1
                      <dbl>
 1                        1
 2                        1
 3                        1
 4                        1
 5                        1
 6                        0
 7                        1
 8                        1
 9                        1
10                        1
# ℹ more rows
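
The flag can be thought of as an interval overlap test: does any target cohort entry overlap the window placed around the index date? The base R sketch below uses toy data and a hypothetical helper intersectFlag(). It is an illustration of the idea under that assumption, not the package's implementation.

```r
cohort <- data.frame(
  subject_id = c(1, 2, 3),
  cohort_start_date = as.Date(c("2000-06-01", "2000-06-01", "2000-06-01"))
)
target <- data.frame(
  subject_id = c(1, 2),
  cohort_start_date = as.Date(c("1999-01-10", "2000-06-01")),
  cohort_end_date = as.Date(c("1999-01-20", "2000-06-10"))
)

# Hypothetical helper (not a PatientProfiles function).
# window is c(lower, upper) in days relative to the index date.
intersectFlag <- function(cohort, target, window) {
  vapply(seq_len(nrow(cohort)), function(i) {
    rows <- target[target$subject_id == cohort$subject_id[i], ]
    offsetStart <- as.numeric(rows$cohort_start_date - cohort$cohort_start_date[i])
    offsetEnd <- as.numeric(rows$cohort_end_date - cohort$cohort_start_date[i])
    # Overlap: the entry starts on or before the window end
    # and ends on or after the window start.
    as.numeric(any(offsetStart <= window[2] & offsetEnd >= window[1]))
  }, numeric(1))
}

intersectFlag(cohort, target, c(-Inf, -1))
# 1 0 0
```

Subject 2 gets 0 because the exposure starts on the index date itself, which is outside a window that ends at day -1. Subject 3 gets 0 because there is no target entry at all.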

Only start date

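For example, if you only want to look at incident use, set targetEndDate = NULL so that only the start date of the target cohort is compared with the window.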

cdm$cohort_interest %>%
   addCohortIntersectFlag(
    cdm = cdm, 
    targetCohortTable = "medications",
    targetCohortId = 2,
    indexDate = "cohort_start_date",
    targetStartDate = "cohort_start_date",
    targetEndDate = NULL,
    window = list(c(-365, -1))
  ) %>%
    print(width = Inf)
# Source:   table<dbplyr_059> [?? x 5]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
   cohort_definition_id subject_id cohort_start_date cohort_end_date
                  <int>      <dbl> <date>            <date>         
 1                    1          5 1991-12-12        1991-12-12     
 2                    1         38 2003-05-20        2003-05-20     
 3                    1        222 1973-08-24        1973-08-24     
 4                    1        236 2004-07-12        2004-07-12     
 5                    1        520 1983-08-30        1983-08-30     
 6                    1        720 1966-06-22        1966-06-22     
 7                    1        820 2002-09-23        2002-09-23     
 8                    1        958 2014-07-15        2014-07-15     
 9                    1       1223 1980-12-15        1980-12-15     
10                    1       1314 1996-07-20        1996-07-20     
   acetaminophen_m365_to_m1
                      <dbl>
 1                        0
 2                        0
 3                        0
 4                        0
 5                        1
 6                        0
 7                        0
 8                        1
 9                        0
10                        0
# ℹ more rows
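
The difference between the two settings shows up when an exposure starts before the window but is still ongoing inside it. A small sketch with offsets in days relative to the index date (the numbers are made up for illustration):

```r
# Offsets in days relative to the index date (negative = before index).
targetStart <- -400
targetEnd <- -300
window <- c(-365, -1)

# Start and end dates both used: any overlap with the window counts.
overlapsWindow <- targetStart <= window[2] & targetEnd >= window[1]

# Start date only: the exposure must begin inside the window.
startsInWindow <- targetStart >= window[1] & targetStart <= window[2]

overlapsWindow # TRUE
startsInWindow # FALSE
```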

Multiple windows

There is an option to specify multiple time windows. Each window produces its own flag column.

cdm$cohort_interest %>%
  addCohortIntersectFlag(
    cdm = cdm,
    targetCohortTable = "medications",
    targetCohortId = 1,
    targetEndDate = NULL,
    window = list(c(-Inf, -366), c(-365, -31), c(-30, -1))
  ) %>%
  print(width = Inf)
# Source:   table<dbplyr_071> [?? x 7]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
   cohort_definition_id subject_id cohort_start_date cohort_end_date
                  <int>      <dbl> <date>            <date>         
 1                    1        222 1973-08-24        1973-08-24     
 2                    1        236 2004-07-12        2004-07-12     
 3                    1       1223 1980-12-15        1980-12-15     
 4                    1       1430 1991-09-16        1991-09-16     
 5                    1       1659 1975-01-23        1975-01-23     
 6                    1       1778 2003-06-06        2003-06-06     
 7                    1       2270 1980-02-04        1980-02-04     
 8                    1       2680 1985-09-17        1985-09-17     
 9                    1       2703 1974-10-05        1974-10-05     
10                    1       2949 1980-10-01        1980-10-01     
   naproxen_minf_to_m366 naproxen_m30_to_m1 naproxen_m365_to_m31
                   <dbl>              <dbl>                <dbl>
 1                     0                  0                    0
 2                     0                  0                    0
 3                     0                  0                    0
 4                     0                  0                    0
 5                     0                  0                    0
 6                     0                  0                    0
 7                     0                  0                    0
 8                     0                  0                    0
 9                     0                  0                    0
10                     0                  0                    0
# ℹ more rows
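
The column names in this output follow a readable pattern: the cohort name, then the window bounds, with "m" standing for minus. The hypothetical helper below reproduces that pattern for the windows used here. How the package names positive bounds is an assumption in this sketch (they are left unprefixed).

```r
# Hypothetical helper (not a PatientProfiles function):
# turn a window such as c(-365, -31) into "m365_to_m31".
windowName <- function(window) {
  part <- function(x) {
    label <- if (is.infinite(x)) "inf" else as.character(abs(x))
    # Negative bounds get an "m" prefix; positive bounds are left as-is (assumption).
    paste0(if (x < 0) "m" else "", label)
  }
  paste0(part(window[1]), "_to_", part(window[2]))
}

windows <- list(c(-Inf, -366), c(-365, -31), c(-30, -1))
paste0("naproxen_", vapply(windows, windowName, character(1)))
# "naproxen_minf_to_m366" "naproxen_m365_to_m31" "naproxen_m30_to_m1"
```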

Others

All the addCohortIntersect functions work in a similar way; instead of a flag, they can return the intersect date, the number of days to the intersect, or a count.
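
For a single patient, the three other quantities can be sketched in base R as below. The data are made up, only the start dates of the target cohort are used, and picking the earliest event in the window is an assumption made for this illustration.

```r
indexDate <- as.Date("2000-06-01")
targetStarts <- as.Date(c("1999-12-01", "2000-03-01", "2000-07-15"))
window <- c(-365, -1)

# Days from the index date to each target start (negative = before index).
offsets <- as.numeric(targetStarts - indexDate)
inWindow <- offsets >= window[1] & offsets <= window[2]

intersectCount <- sum(inWindow)                       # 2
intersectDate <- min(targetStarts[inWindow])          # "1999-12-01"
intersectDays <- as.numeric(intersectDate - indexDate) # -183
```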

More information can be found at the website: https://darwin-eu-dev.github.io/PatientProfiles/

End

Thanks for listening!

Thanks to everyone who contributed to this package!