CohortConstructor

An R package to build and curate cohorts in the OMOP Common Data Model

Introduction

  • The CohortConstructor package is designed to support cohort-building pipelines in R.
  • The approach taken to create cohorts is to first build a set of base cohorts, and then apply inclusion criteria to derive the final study cohorts of interest.
  • Vignettes with further information can be found on the package website.
  • Available from CRAN.

Understanding cohorts

“A cohort is a set of persons who satisfy one or more inclusion criteria for a duration of time.”

Cohorts in R

A cohort table in R is represented by four fundamental columns:

  • cohort_definition_id: An integer identifying the cohort.

  • subject_id: An identifier for the patients who are part of the cohort.

  • cohort_start_date: The date when the patient begins contributing time to the cohort.

  • cohort_end_date: The date when the patient leaves the cohort.

Note: a subject can contribute multiple records to a cohort, but those records cannot overlap.
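The non-overlap rule can be checked on a plain data frame. The sketch below uses base R and made-up toy data; `hasOverlap()` is a hypothetical helper written for this illustration, not a CohortConstructor function.

```r
# Toy cohort table (hypothetical data, not taken from the slides)
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 2L),
  subject_id           = c(1L, 1L, 2L, 1L),
  cohort_start_date    = as.Date(c("2020-01-01", "2020-03-01", "2020-01-10", "2020-01-15")),
  cohort_end_date      = as.Date(c("2020-01-31", "2020-03-15", "2020-02-10", "2020-02-15"))
)

# Hypothetical helper: TRUE if any subject has overlapping records
# within the same cohort_definition_id
hasOverlap <- function(x) {
  x <- x[order(x$cohort_definition_id, x$subject_id, x$cohort_start_date), ]
  n <- nrow(x)
  if (n < 2) return(FALSE)
  # Compare each record with the previous one in the sorted table
  sameGroup <- x$cohort_definition_id[-1] == x$cohort_definition_id[-n] &
    x$subject_id[-1] == x$subject_id[-n]
  # A record overlaps the previous one if it starts on or before its end date
  any(sameGroup & x$cohort_start_date[-1] <= x$cohort_end_date[-n])
}

hasOverlap(cohort)
```

Subject 1 appears in both cohort 1 and cohort 2 with overlapping dates, which is allowed: the rule applies within a cohort, not across cohorts.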

Cohorts in R

cdm$my_cohort
# Source:   table<my_study_my_cohort> [?? x 4]
# Database: DuckDB v1.1.3 [eburn@Windows 10 x64:R 4.4.0/C:\Users\eburn\AppData\Local\Temp\Rtmp8YeQmj\file48b81e595367.duckdb]
   cohort_definition_id subject_id cohort_start_date cohort_end_date
                  <int>      <int> <date>            <date>         
 1                    1       5174 1944-12-20        1945-02-18     
 2                    2       2485 2004-08-24        2004-09-21     
 3                    1       2445 1965-12-24        1966-01-14     
 4                    1       4283 1956-12-20        1957-03-20     
 5                    2       3708 2003-07-26        2003-09-24     
 6                    1       4101 1987-12-30        1988-01-20     
 7                    1       4890 1927-01-11        1927-02-01     
 8                    1        492 1950-12-17        1950-12-31     
 9                    1       1173 1946-06-03        1946-06-17     
10                    1       1177 1968-10-12        1968-10-26     
# ℹ more rows

Cohort attributes

  • settings: Relates cohort_definition_id with cohort_name, and other variables that define the cohort.

  • attrition: Inclusion logic to create each cohort and the resulting number of records and subjects at each step.

  • cohortCount: Number of records and subjects in each cohort.

  • cohortCodelist: Concepts used to derive the cohort.

Cohort attributes

  • settings
settings(cdm$my_cohort)
# A tibble: 2 × 5
  cohort_definition_id cohort_name    sex    cdm_version vocabulary_version
                 <int> <chr>          <chr>  <chr>       <chr>             
1                    1 1191_aspirin   Female 5.3         v5.0 18-JAN-19    
2                    2 5640_ibuprofen Female 5.3         v5.0 18-JAN-19    
  • attrition
attrition(cdm$my_cohort)
# A tibble: 10 × 7
   cohort_definition_id number_records number_subjects reason_id reason                     excluded_records excluded_subjects
                  <int>          <int>           <int>     <int> <chr>                                 <int>             <int>
 1                    1           4380            1927         1 Initial qualifying events                 0                 0
 2                    1           4380            1927         2 Record start <= record end                0                 0
 3                    1           4380            1927         3 Record in observation                     0                 0
 4                    1           4379            1927         4 Merge overlapping records                 1                 0
 5                    1           2265             980         5 Sex requirement: Female                2114               947
 6                    2           2148            1451         1 Initial qualifying events                 0                 0
 7                    2           2148            1451         2 Record start <= record end                0                 0
 8                    2           2148            1451         3 Record in observation                     0                 0
 9                    2           2148            1451         4 Merge overlapping records                 0                 0
10                    2           1107             741         5 Sex requirement: Female                1041               710
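The "Merge overlapping records" step in the attrition can be pictured with a small base R sketch. This is an illustration on toy data for a single cohort, not the package's implementation; `mergeOverlapping()` is a hypothetical helper, and it assumes that records sharing at least one day are collapsed into one.

```r
# Toy records for one cohort (hypothetical data)
records <- data.frame(
  subject_id        = c(1L, 1L, 1L, 2L),
  cohort_start_date = as.Date(c("2020-01-01", "2020-01-20", "2020-03-01", "2020-01-05")),
  cohort_end_date   = as.Date(c("2020-01-31", "2020-02-10", "2020-03-10", "2020-01-09"))
)

# Hypothetical helper: collapse overlapping records per subject
mergeOverlapping <- function(x) {
  x <- x[order(x$subject_id, x$cohort_start_date), ]
  pieces <- lapply(split(x, x$subject_id), function(d) {
    starts <- as.numeric(d$cohort_start_date)
    ends   <- as.numeric(d$cohort_end_date)
    # Latest end date among the preceding records of this subject
    prevMaxEnd <- c(-Inf, head(cummax(ends), -1))
    # A new era begins when a record starts after everything before it ended
    era <- cumsum(starts > prevMaxEnd)
    data.frame(
      subject_id        = d$subject_id[1],
      cohort_start_date = as.Date(as.vector(tapply(starts, era, min)), origin = "1970-01-01"),
      cohort_end_date   = as.Date(as.vector(tapply(ends, era, max)), origin = "1970-01-01")
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

merged <- mergeOverlapping(records)
merged
```

In this toy case the first two records of subject 1 are merged, so one record is excluded while the number of subjects stays the same, which is the pattern seen in the attrition rows above.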

Cohort attributes

  • cohortCount
cohortCount(cdm$my_cohort)
# A tibble: 2 × 3
  cohort_definition_id number_records number_subjects
                 <int>          <int>           <int>
1                    1           2265             980
2                    2           1107             741
  • cohortCodelist
cohortCodelist(cdm$my_cohort, 1)

- 1191_aspirin (2 codes)
cohortCodelist(cdm$my_cohort, 2)

- 5640_ibuprofen (3 codes)

CohortConstructor

Function sets

 

  • Build base cohorts: cohort construction based on concept sets or demographic requirements on the database population.

  • Applying cohort requirements: impose study-specific inclusion and exclusion criteria on cohorts in the database.

  • Update cohort start and end dates: modify the start and end dates of subjects in a cohort.

  • Cohort manipulation: generate new cohorts by manipulating a set of cohorts in the database.

Build base cohorts

Functions to build base cohorts

  • demographicsCohort()
  • conceptCohort()
  • measurementCohort()

Demographic based - Example

cdm$age_cohort <- demographicsCohort(cdm = cdm, 
                                     ageRange = c(18, 65), 
                                     name = "age_cohort")

settings(cdm$age_cohort)
# A tibble: 1 × 3
  cohort_definition_id cohort_name  age_range
                 <int> <chr>        <chr>    
1                    1 demographics 18_65    
cohortCount(cdm$age_cohort)
# A tibble: 1 × 3
  cohort_definition_id number_records number_subjects
                 <int>          <int>           <int>
1                    1           2694            2694
attrition(cdm$age_cohort)
# A tibble: 2 × 7
  cohort_definition_id number_records number_subjects reason_id reason                    excluded_records excluded_subjects
                 <int>          <int>           <int>     <int> <chr>                                <int>             <int>
1                    1           2694            2694         1 Initial qualifying events                0                 0
2                    1           2694            2694         2 Age requirement: 18 to 65                0                 0

Demographic based - Example

# CohortCharacteristics R package
summary(cdm$age_cohort) |> tableCohortAttrition()
[Attrition table for the age cohort]

Concept based

  • Base cohorts are built by domain rather than by cohort definition.
  • This approach reduces the joins to OMOP CDM tables by using all the concept sets together, making it less computationally expensive.
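The idea of building by domain can be sketched in base R: all concept sets are stacked into one lookup table, and the domain table is joined once for every cohort. The concept IDs and the toy `condition_occurrence` table below are made up for illustration, and this is not the package's internal code.

```r
# Hypothetical concept sets (IDs invented for this example)
conceptSets <- list(
  asthma   = c(101L, 102L),
  copd     = c(201L),
  diabetes = c(301L, 302L)
)

# One long lookup table: concept_id -> cohort name
lookup <- data.frame(
  cohort_name = rep(names(conceptSets), lengths(conceptSets)),
  concept_id  = unlist(conceptSets, use.names = FALSE)
)

# Toy stand-in for a single OMOP domain table
condition_occurrence <- data.frame(
  person_id            = c(1L, 2L, 2L, 3L, 4L),
  condition_concept_id = c(101L, 201L, 302L, 999L, 102L),
  condition_start_date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01",
                                   "2020-04-01", "2020-05-01"))
)

# A single join against the domain table serves every cohort at once,
# instead of one join per cohort definition
events <- merge(condition_occurrence, lookup,
                by.x = "condition_concept_id", by.y = "concept_id")

table(events$cohort_name)
```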

Workflow to build 5 base cohorts: asthma, COPD, diabetes, acetaminophen and warfarin.

Concept based - Example

  • Get relevant codelists
drug_codes <- getDrugIngredientCodes(cdm, 
                                     name = c("diclofenac", "acetaminophen"))
drug_codes

- 161_acetaminophen (7 codes)
- 3355_diclofenac (1 codes)
  • Create concept based cohorts
cdm$medications <- conceptCohort(cdm = cdm, 
                                 conceptSet = drug_codes, 
                                 name = "medications")
settings(cdm$medications)
# A tibble: 2 × 4
  cohort_definition_id cohort_name       cdm_version vocabulary_version
                 <int> <chr>             <chr>       <chr>             
1                    1 161_acetaminophen 5.3         v5.0 18-JAN-19    
2                    2 3355_diclofenac   5.3         v5.0 18-JAN-19    

Concept based - Example

  • Cohort codelist as an attribute
attr(cdm$medications, "cohort_codelist")
# Source:   table<my_study_medications_codelist> [?? x 4]
# Database: DuckDB v1.1.3 [eburn@Windows 10 x64:R 4.4.0/C:\Users\eburn\AppData\Local\Temp\Rtmp8YeQmj\file48b81e595367.duckdb]
  cohort_definition_id codelist_name     concept_id type       
                 <int> <chr>                  <int> <chr>      
1                    1 161_acetaminophen    1125315 index event
2                    1 161_acetaminophen    1127078 index event
3                    1 161_acetaminophen    1127433 index event
4                    1 161_acetaminophen   40229134 index event
5                    1 161_acetaminophen   40231925 index event
6                    1 161_acetaminophen   40162522 index event
7                    1 161_acetaminophen   19133768 index event
8                    2 3355_diclofenac      1124300 index event

Concept based - Measurement

  • Cohorts can be created from the measurement table with measurementCohort().

  • For example, a cohort of high fever can be created from oral temperature measurement results:

fever_codelist <- list("oral_temperature_measurement" = 3006322)

cdm$temperature <- measurementCohort(
  cdm = cdm,
  conceptSet = fever_codelist,
  name = "temperature",
  valueAsNumber = list("586323" = c(39, 45)) # 586323 -> unit concept for celsius
)
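What the value filter amounts to can be shown on a toy measurement table in base R. This is only a sketch: the rows are invented, the range is assumed to be inclusive at both ends, and each cohort record is assumed to start and end on the measurement date.

```r
# Toy measurement table (hypothetical rows)
measurement <- data.frame(
  person_id              = c(1L, 2L, 3L, 4L),
  measurement_concept_id = c(3006322L, 3006322L, 3006322L, 1234L),
  measurement_date       = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04")),
  value_as_number        = c(39.5, 37.0, 40.2, 41.0),
  unit_concept_id        = c(586323L, 586323L, 9999L, 586323L)
)

# Keep oral temperature measurements recorded in Celsius (586323)
# with a value between 39 and 45 (assumed inclusive)
keep <- measurement$measurement_concept_id == 3006322L &
  measurement$unit_concept_id == 586323L &
  measurement$value_as_number >= 39 &
  measurement$value_as_number <= 45

# Assumption: one cohort record per qualifying measurement,
# starting and ending on the measurement date
feverCohort <- data.frame(
  cohort_definition_id = rep(1L, sum(keep)),
  subject_id           = measurement$person_id[keep],
  cohort_start_date    = measurement$measurement_date[keep],
  cohort_end_date      = measurement$measurement_date[keep]
)
feverCohort
```

Only person 1 qualifies here: person 2 has a value below the range, person 3 has a different unit, and person 4 has a different measurement concept.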

Applying cohort requirements

Functions to apply cohort requirements

  • On demographics

    • requireDemographics()

    • requireAge()

    • requireSex()

    • requirePriorObservation()

    • requireFutureObservation()

  • On cohort entries

    • requireIsFirstEntry()

    • requireIsLastEntry()

  • On cohort dates

    • requireInDateRange()
  • Require presence or absence based on other cohorts, tables and concepts

    • requireCohortIntersect()

    • requireConceptIntersect()

    • requireTableIntersect()

    • requireDeathFlag()

Deriving study cohorts from base cohorts

Current approach

CohortConstructor

Requirement functions - Example

  • We can apply different inclusion and exclusion criteria using CohortConstructor’s functions in a pipeline fashion. For instance, in what follows we require

    • only the first record per person

    • subjects aged 18 to 85 at cohort start date

    • only females

    • at least 30 days of prior observation at cohort start date

cdm$medications <- cdm$medications %>% 
  requireIsFirstEntry() %>% 
  requireDemographics(
    ageRange = list(c(18, 85)),
    sex = "Female", 
    minPriorObservation = 30
  )
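How chained requirements produce an attrition record can be sketched in base R. The helpers `newState()`, `requireWhere()` and `isFirstEntry()` are hypothetical names made up for this example, and the data are invented; the package tracks attrition as a cohort attribute rather than in a list like this.

```r
# Toy cohort with a sex column already attached (hypothetical data)
cohort <- data.frame(
  subject_id        = c(1L, 1L, 2L, 3L),
  cohort_start_date = as.Date(c("2010-05-01", "2012-01-01", "2011-03-15", "2015-07-20")),
  sex               = c("Female", "Female", "Male", "Female")
)

# Count records and subjects after a given step
countRow <- function(x, reason) {
  data.frame(reason = reason,
             number_records = nrow(x),
             number_subjects = length(unique(x$subject_id)))
}

# Bundle the cohort with its attrition so both travel through the pipe
newState <- function(x) {
  list(cohort = x, attrition = countRow(x, "Initial qualifying events"))
}

# Apply a filter function and append one attrition row
requireWhere <- function(state, keep, reason) {
  x <- state$cohort[keep(state$cohort), , drop = FALSE]
  list(cohort = x, attrition = rbind(state$attrition, countRow(x, reason)))
}

# TRUE for the earliest record of each subject
isFirstEntry <- function(x) {
  d <- as.numeric(x$cohort_start_date)
  d == ave(d, x$subject_id, FUN = min)
}

result <- newState(cohort) |>
  requireWhere(isFirstEntry, "First entry") |>
  requireWhere(function(x) x$sex == "Female", "Sex requirement: Female")

result$attrition
```

Each step only sees the records that survived the previous one, so the order of the requirements affects the counts reported at each row.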

Requirement functions - Example

[Attrition tables for the diclofenac and acetaminophen cohorts]

Requirement functions - Example

  • Require no more than 1 event of GI bleed in the past
cdm$medications_no_gi_bleed <- cdm$medications %>%
  requireConceptIntersect(conceptSet = list("gi_bleed" = 192671), 
                          intersections = c(0, 1),
                          window = c(-Inf, 0), 
                          name = "medications_no_gi_bleed") 
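The combination of `intersections = c(0, 1)` and `window = c(-Inf, 0)` can be read as "count matching events at any time up to and including the cohort start date, and keep records with 0 or 1 of them". The base R sketch below illustrates that reading on invented data; it is not the package's implementation.

```r
# Toy cohort entries (hypothetical data)
cohort <- data.frame(
  subject_id        = c(1L, 2L, 3L),
  cohort_start_date = as.Date(c("2020-06-01", "2020-06-01", "2020-06-01"))
)

# Toy GI bleed events (hypothetical dates)
events <- data.frame(
  person_id  = c(2L, 3L, 3L, 1L),
  event_date = as.Date(c("2019-01-10", "2018-05-01", "2019-11-20", "2021-01-01"))
)

# Count events per cohort record in the window (-Inf, 0] days
# relative to cohort_start_date
nEvents <- vapply(seq_len(nrow(cohort)), function(i) {
  sum(events$person_id == cohort$subject_id[i] &
        events$event_date <= cohort$cohort_start_date[i])
}, integer(1))

# Keep records with between 0 and 1 events in the window
kept <- cohort[nEvents >= 0 & nEvents <= 1, ]
kept
```

Subject 1 has an event only after the index date, so it counts as 0; subject 3 has two prior events and is excluded.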

Requirement functions - Example

[Attrition tables for the diclofenac and acetaminophen cohorts]

name argument

  • Purpose: Specifies the name for the new cohort table in the database.
  • Default Behavior: If not provided, the function uses the input cohort’s name.
  • Warning: Omitting the name argument will overwrite the existing cohort table.
# Example: overwrite cohort
cdm$cohort1 <- cdm$cohort1 %>%
  requireDeathFlag()

# Example: create new cohort table
cdm$cohort2 <- cdm$cohort1 %>%
  requireDeathFlag(name = "cohort2")

Update cohort start and end dates

Functions to update cohort start and end dates

  • Cohort exit

    • exitAtObservationEnd()

    • exitAtDeath()

    • exitAtFirstDate()

    • exitAtLastDate()

  • Cohort entry

    • entryAtFirstDate()

    • entryAtLastDate()

  • Trim start and end dates

    • trimDemographics()

    • trimToDateRange()

Update cohort start and end dates - Example

  • We can set the end date to the end of the subject’s observation period
cdm$medications <- cdm$medications %>%
  exitAtObservationEnd()

cdm$medications
# Source:   table<my_study_medications> [?? x 4]
# Database: DuckDB v1.1.3 [eburn@Windows 10 x64:R 4.4.0/C:\Users\eburn\AppData\Local\Temp\Rtmp8YeQmj\file48b81e595367.duckdb]
   cohort_definition_id subject_id cohort_start_date cohort_end_date
                  <int>      <int> <date>            <date>         
 1                    1       1466 2004-12-19        2019-03-12     
 2                    2       3718 2002-03-10        2018-12-20     
 3                    2       2959 2009-12-27        2018-11-16     
 4                    2       5123 2016-08-06        2019-06-04     
 5                    1       1016 2000-12-19        2019-06-26     
 6                    2       4559 1983-05-29        2018-08-15     
 7                    2        300 2009-11-30        2017-07-21     
 8                    2       4822 1987-07-31        2019-06-26     
 9                    1        408 1994-06-16        2018-09-06     
10                    2       4979 2014-08-28        2018-08-04     
# ℹ more rows
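Conceptually, exiting at observation end replaces each record's end date with the end of the observation period that contains the cohort start date. A base R sketch on invented tables, assuming that reading:

```r
# Toy cohort and observation_period tables (hypothetical data)
cohort <- data.frame(
  subject_id        = c(1L, 2L),
  cohort_start_date = as.Date(c("2015-03-01", "2016-08-06")),
  cohort_end_date   = as.Date(c("2015-03-30", "2016-09-05"))
)
observation_period <- data.frame(
  person_id                     = c(1L, 2L, 2L),
  observation_period_start_date = as.Date(c("2010-01-01", "2000-01-01", "2012-01-01")),
  observation_period_end_date   = as.Date(c("2019-06-26", "2005-12-31", "2019-06-04"))
)

# Pair every cohort record with the subject's observation periods
joined <- merge(cohort, observation_period,
                by.x = "subject_id", by.y = "person_id")

# Keep the period that contains the cohort start date
inPeriod <- joined$cohort_start_date >= joined$observation_period_start_date &
  joined$cohort_start_date <= joined$observation_period_end_date
updated <- joined[inPeriod, ]

# Move the cohort end date to the end of that observation period
updated$cohort_end_date <- updated$observation_period_end_date
updated <- updated[, names(cohort)]
rownames(updated) <- NULL
updated
```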

Update cohort start and end dates - Example

  • We can also trim start and end dates to match demographic requirements

  • For example, cohort dates can be trimmed so that subjects contribute time only while they are 20 to 40 years old and have at least 365 days of prior observation

cdm$medications_trimmed <- cdm$medications %>%
  trimDemographics(ageRange = list(c(20, 40)),
                   minPriorObservation = 365,
                   name = "medications_trimmed")
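The date arithmetic behind this kind of trimming can be sketched in base R. The data and the `addYears()` helper are hypothetical, and the sketch assumes that an age range of 20 to 40 runs from the 20th birthday to the day before the 41st birthday, and that prior observation is counted in days from the observation period start.

```r
# Hypothetical helper: shift a date by a whole number of years
addYears <- function(date, years) {
  lt <- as.POSIXlt(date)
  lt$year <- lt$year + years
  as.Date(lt)
}

# Toy cohort with birth date and observation start attached (hypothetical data)
cohort <- data.frame(
  subject_id        = c(1L, 2L, 3L),
  cohort_start_date = as.Date(c("1998-01-01", "2000-01-01", "2010-06-01")),
  cohort_end_date   = as.Date(c("2005-12-31", "2001-01-01", "2011-12-31")),
  birth_date        = as.Date(c("1980-05-10", "1950-01-01", "1985-03-15")),
  obs_start_date    = as.Date(c("1990-01-01", "1990-01-01", "2010-03-01"))
)

ageStart <- addYears(cohort$birth_date, 20)      # 20th birthday
ageEnd   <- addYears(cohort$birth_date, 41) - 1  # last day aged 40
obsReady <- cohort$obs_start_date + 365          # 365 days of prior observation

# New start is the latest lower bound, new end is the earliest upper bound
trimmed <- cohort
trimmed$cohort_start_date <- pmax(cohort$cohort_start_date, ageStart, obsReady)
trimmed$cohort_end_date   <- pmin(cohort$cohort_end_date, ageEnd)

# Records with no time left inside the window are dropped
trimmed <- trimmed[trimmed$cohort_start_date <= trimmed$cohort_end_date,
                   c("subject_id", "cohort_start_date", "cohort_end_date")]
trimmed
```

Unlike a requirement function, which keeps or drops whole records, trimming keeps the part of each record that satisfies the criteria.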

Update cohort start and end dates - Example

[Attrition tables for the diclofenac and acetaminophen cohorts]

Cohort manipulation

Functions for cohort manipulation

  • collapseCohorts()
  • intersectCohorts()
  • matchCohorts()
  • stratifyCohorts()
  • subsetCohorts()
  • unionCohorts()
  • yearCohorts()
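As one illustration of cohort manipulation, the time during which a subject is in two cohorts at once can be computed by intersecting date intervals. The base R sketch below uses invented data with one record per subject in each cohort and is not how intersectCohorts() is implemented.

```r
# Two toy cohorts (hypothetical data)
cohortA <- data.frame(
  subject_id        = c(1L, 2L),
  cohort_start_date = as.Date(c("2020-01-01", "2020-01-01")),
  cohort_end_date   = as.Date(c("2020-03-31", "2020-01-31"))
)
cohortB <- data.frame(
  subject_id        = c(1L, 2L),
  cohort_start_date = as.Date(c("2020-03-01", "2020-02-15")),
  cohort_end_date   = as.Date(c("2020-06-30", "2020-03-15"))
)

# Pair records of the same subject across the two cohorts
pairs <- merge(cohortA, cohortB, by = "subject_id", suffixes = c("_a", "_b"))

# The shared interval runs from the later start to the earlier end
start <- pmax(pairs$cohort_start_date_a, pairs$cohort_start_date_b)
end   <- pmin(pairs$cohort_end_date_a, pairs$cohort_end_date_b)

# Keep only pairs that actually share at least one day
both <- data.frame(
  subject_id        = pairs$subject_id,
  cohort_start_date = start,
  cohort_end_date   = end
)[start <= end, ]
both
```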

Questions

Questions?