CohortConstructor

An R package to build and curate cohorts in the OMOP Common Data Model

Introduction

  • The CohortConstructor package is designed to support cohort-building pipelines in R.
  • CohortConstructor v0.3.5 is available on CRAN.
  • Vignettes with further information can be found on the package website.

CohortConstructor pipeline

1) Create base cohorts: Cohorts defined using clinical concepts (e.g., asthma diagnoses) or demographics (e.g., females aged >18)

2) Cohort curation: Transform base cohorts to meet study-specific inclusion criteria.

Current approach

CohortConstructor approach

Function sets

  • Base cohorts: cohort construction based on clinical concepts or demographics.
  • Requirements and Filtering: demographic restrictions, event presence/absence conditions, and filtering specific records.
  • Time Manipulation: adjusting entry and exit dates to align with study periods, observation windows, or key events.
  • Transformation and Combination: merging, stratifying, collapsing, matching, or intersecting cohorts.

Base cohorts

Functions to build base cohorts

  • demographicsCohort()
  • conceptCohort()
  • measurementCohort()
  • deathCohort()
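
As a rough illustration of one of the builders listed above, the sketch below creates a death cohort. It assumes the mock CDM helper `mockCohortConstructor()` that ships with the package (including its `death` argument) and a working DuckDB installation; the table name `death_cohort` is arbitrary, and argument names may differ between package versions.

```r
library(CohortConstructor)

# Assumption: a small mock CDM that includes a death table, created with the
# package's mock helper (needs duckdb). A real study would use its own CDM.
cdm <- mockCohortConstructor(nPerson = 100, death = TRUE)

# Build a cohort from the records in the death table
cdm$death_cohort <- deathCohort(cdm = cdm, name = "death_cohort")

# The result has the standard four cohort columns
colnames(cdm$death_cohort)
```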

Demographics based - Example

cdm$age_cohort <- demographicsCohort(cdm = cdm, 
                                     ageRange = c(18, 60), 
                                     sex = "Female",
                                     minPriorObservation = 365,
                                     name = "age_cohort")

cdm$age_cohort 
# Source:   table<my_study_age_cohort> [?? x 4]
# Database: DuckDB v1.0.0 [nmercade@Windows 10 x64:R 4.2.2/C:\Users\nmercade\AppData\Local\Temp\RtmpA1VZ7T\fileaca4760811d8.duckdb]
   cohort_definition_id subject_id cohort_start_date cohort_end_date
                  <int>      <int> <date>            <date>         
 1                    1         16 1989-10-13        2017-11-02     
 2                    1         12 1981-01-30        2019-03-06     
 3                    1         17 1968-12-11        2011-12-10     
 4                    1        111 1993-05-02        2019-05-17     
 5                    1         82 1979-03-21        2019-06-25     
 6                    1        119 1973-12-27        2016-12-26     
 7                    1        156 1996-10-22        2018-11-04     
 8                    1        180 1995-04-21        2019-05-02     
 9                    1        181 1991-09-16        2017-10-01     
10                    1        250 1992-02-13        2018-06-13     
# ℹ more rows


settings(cdm$age_cohort)
# A tibble: 1 × 5
  cohort_definition_id cohort_name  age_range sex    min_prior_observation
                 <int> <chr>        <chr>     <chr>                  <dbl>
1                    1 demographics 18_60     Female                   365
cohortCount(cdm$age_cohort)
# A tibble: 1 × 3
  cohort_definition_id number_records number_subjects
                 <int>          <int>           <int>
1                    1           1373            1373
attrition(cdm$age_cohort)
# A tibble: 4 × 7
  cohort_definition_id number_records number_subjects reason_id reason                          excluded_records excluded_subjects
                 <int>          <int>           <int>     <int> <chr>                                      <int>             <int>
1                    1           2694            2694         1 Initial qualifying events                      0                 0
2                    1           1373            1373         2 Sex requirement: Female                     1321              1321
3                    1           1373            1373         3 Age requirement: 18 to 60                      0                 0
4                    1           1373            1373         4 Prior observation requirement:…                0                 0

Concept based

  • Base cohorts are built by domain rather than by cohort definition.
  • This reduces the number of joins against OMOP CDM tables, because all concept sets are looked up together, and so lowers the computational cost.

Workflow to build 5 base cohorts: asthma, COPD, diabetes, acetaminophen and warfarin.
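
To make the by-domain idea concrete, here is a toy sketch in plain dplyr. It is not the package's internal code: the `drug_exposure` rows, the unrelated concept 999, and the short cohort names are made up for illustration. It only shows how stacking several concept sets into one lookup table lets a single join against a domain table serve every cohort.

```r
library(dplyr)

# Toy stand-in for the OMOP drug_exposure table (hypothetical data)
drug_exposure <- tibble(
  person_id = c(1L, 1L, 2L, 3L),
  drug_concept_id = c(1125315L, 1124300L, 1125315L, 999L),
  drug_exposure_start_date = as.Date(
    c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01")
  )
)

# All concept sets stacked into one lookup table, one row per concept
codelist <- tibble(
  cohort_name = c("acetaminophen", "diclofenac"),
  concept_id = c(1125315L, 1124300L)
)

# One join against the domain table yields the records for every cohort
records <- drug_exposure %>%
  inner_join(codelist, by = c("drug_concept_id" = "concept_id")) %>%
  select(
    cohort_name,
    subject_id = person_id,
    cohort_start_date = drug_exposure_start_date
  )

# Records per cohort
count(records, cohort_name)
```

The alternative, one join per cohort definition, would scan `drug_exposure` once for each concept set.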

Concept based - Example

  1. Get relevant codelists with CodelistGenerator
drug_codes <- getDrugIngredientCodes(cdm, 
                                     name = c("diclofenac", "acetaminophen"))
drug_codes

- 161_acetaminophen (7 codes)
- 3355_diclofenac (1 codes)
  2. Create concept-based cohorts
cdm$medications <- conceptCohort(cdm = cdm, 
                                 conceptSet = drug_codes, 
                                 name = "medications")
settings(cdm$medications)
# A tibble: 2 × 4
  cohort_definition_id cohort_name       cdm_version vocabulary_version
                 <int> <chr>             <chr>       <chr>             
1                    1 161_acetaminophen 5.3         v5.0 18-JAN-19    
2                    2 3355_diclofenac   5.3         v5.0 18-JAN-19    


  • Cohort codelist as an attribute
attr(cdm$medications, "cohort_codelist")
# Source:   table<my_study_medications_codelist> [?? x 4]
# Database: DuckDB v1.0.0 [nmercade@Windows 10 x64:R 4.2.2/C:\Users\nmercade\AppData\Local\Temp\RtmpA1VZ7T\fileaca4760811d8.duckdb]
  cohort_definition_id codelist_name     concept_id type       
                 <int> <chr>                  <int> <chr>      
1                    1 161_acetaminophen    1125315 index event
2                    1 161_acetaminophen    1127078 index event
3                    1 161_acetaminophen    1127433 index event
4                    1 161_acetaminophen   40229134 index event
5                    1 161_acetaminophen   40231925 index event
6                    1 161_acetaminophen   40162522 index event
7                    1 161_acetaminophen   19133768 index event
8                    2 3355_diclofenac      1124300 index event

Requirements and Filtering

Functions to apply requirements and filter

  • On demographics

    • requireDemographics()

    • requireAge()

    • requireSex()

    • requirePriorObservation()

    • requireFutureObservation()

  • On cohort entries

    • requireIsFirstEntry()

    • requireIsLastEntry()

    • requireIsEntry()

  • Require presence or absence based on other cohorts, concepts, and tables

    • requireCohortIntersect()

    • requireConceptIntersect()

    • requireTableIntersect()

    • requireDeathFlag()

  • Other

    • requireInDateRange()

    • requireMinCohortCount()
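
A minimal sketch of two of the functions listed above, chained on a mock cohort. It assumes the package's `mockCohortConstructor()` helper, which is expected to provide example cohort tables named `cohort1` and `cohort2`; the study period and the table name `restricted` are arbitrary choices for illustration.

```r
library(CohortConstructor)
library(dplyr)

# Assumption: mock CDM from the package helper (needs duckdb), expected to
# contain two example cohort tables, cohort1 and cohort2.
cdm <- mockCohortConstructor(nPerson = 100)

cdm$restricted <- cdm$cohort1 %>%
  # keep entries whose start date falls inside the study period
  requireInDateRange(
    dateRange = as.Date(c("2000-01-01", "2015-12-31")),
    name = "restricted"
  ) %>%
  # keep entries with a record in cohort 1 of cohort2 on or before the start date
  requireCohortIntersect(
    targetCohortTable = "cohort2",
    targetCohortId = 1,
    window = c(-Inf, 0)
  )
```

Passing `name` in the first call writes the result to a new table, so `cdm$cohort1` itself is left unchanged.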

Requirement functions - Example

  • We can apply different inclusion and exclusion criteria using CohortConstructor’s functions in a pipeline fashion. For instance, in what follows we require

    • only first record per person

    • subjects aged 18 to 85 years at cohort start date

    • only females

    • at least 30 days of prior observation at cohort start date

cdm$medications <- cdm$medications %>% 
  requireIsFirstEntry() %>% 
  requireDemographics(
    ageRange = list(c(18, 85)),
    sex = "Female", 
    minPriorObservation = 30
  )
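
Each requirement is recorded in the cohort attrition, which is a convenient way to check a pipeline like the one above. The sketch below uses the package's mock CDM helper (an assumption, as is its `cohort1` table) rather than the medications cohort, and calls `attrition()` through omopgenerics.

```r
library(CohortConstructor)
library(dplyr)

# Assumption: mock CDM with an example cohort table called cohort1
cdm <- mockCohortConstructor(nPerson = 100)

cdm$first_female <- cdm$cohort1 %>%
  requireIsFirstEntry(name = "first_female") %>%
  requireSex(sex = "Female")

# Inspect the attrition: the initial row plus one row per applied requirement
omopgenerics::attrition(cdm$first_female) %>%
  select(reason_id, reason, number_records, excluded_records)
```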

Time Manipulation

Functions to update cohort start and end dates

  • Cohort exit

    • exitAtObservationEnd()

    • exitAtDeath()

    • exitAtFirstDate()

    • exitAtLastDate()

  • Cohort entry

    • entryAtFirstDate()

    • entryAtLastDate()

  • Trim start and end dates

    • trimDemographics()

    • trimToDateRange()

  • Pad start and end dates

    • padCohortDate()

    • padCohortEnd()

    • padCohortStart()
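
As a hedged sketch of the padding and trimming functions listed above, the example below extends cohort end dates and then trims entries to a study period. It assumes the mock CDM helper and its `cohort1` table; the 30-day pad, the date range, and the table name `trimmed` are illustrative only.

```r
library(CohortConstructor)
library(dplyr)

# Assumption: mock CDM with an example cohort table called cohort1
cdm <- mockCohortConstructor(nPerson = 100)

cdm$trimmed <- cdm$cohort1 %>%
  # add 30 days to each cohort end date
  padCohortEnd(days = 30, name = "trimmed") %>%
  # then restrict entries to the study period
  trimToDateRange(dateRange = as.Date(c("2000-01-01", "2010-12-31")))

cdm$trimmed
```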

Time Manipulation - Example

  • We can set the end date to the end of the subject’s observation period
cdm$medications <- cdm$medications %>%
  exitAtObservationEnd()

cdm$medications
# Source:   table<my_study_medications> [?? x 4]
# Database: DuckDB v1.0.0 [nmercade@Windows 10 x64:R 4.2.2/C:\Users\nmercade\AppData\Local\Temp\RtmpA1VZ7T\fileaca4760811d8.duckdb]
   cohort_definition_id subject_id cohort_start_date cohort_end_date
                  <int>      <int> <date>            <date>         
 1                    2       2337 2003-07-26        2018-09-12     
 2                    1       1508 1949-02-23        2019-05-14     
 3                    1       4106 1963-01-21        2018-11-09     
 4                    1       2661 1989-10-31        2018-08-25     
 5                    1       5272 1949-04-29        2018-12-05     
 6                    1       1892 1981-03-28        2018-09-11     
 7                    1       4987 1980-05-25        2019-06-10     
 8                    2       2237 1967-01-16        2019-01-13     
 9                    1        764 1989-01-01        2018-05-29     
10                    2       4546 2010-10-31        2018-09-16     
# ℹ more rows


  • We can also trim start and end dates to match demographic requirements

    • i.e., cohort dates can be trimmed so that subjects contribute time only while they are 20 to 40 years old and have at least 365 days of prior observation
cdm$medications_trimmed <- cdm$medications %>%
  trimDemographics(ageRange = list(c(20, 40)),
                   minPriorObservation = 365,
                   name = "medications_trimmed")

Transformation and Combination

Functions for Cohort Transformation and Combination

  • Split cohorts

    • yearCohorts()

    • stratifyCohorts()

  • Combine cohorts

    • unionCohorts()

    • intersectCohorts()

  • Filter cohorts

    • subsetCohorts()

    • sampleCohorts()

  • Match cohorts

    • matchCohorts()

  • Concatenate entries

    • collapseCohorts()

  • Copy and rename cohorts

    • renameCohort()

    • copyCohorts()
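
The sketch below illustrates one splitting and one combining function from the list above. It assumes the mock CDM helper and its example tables `cohort1` and `cohort2`; the years and the new table names are arbitrary.

```r
library(CohortConstructor)
library(dplyr)

# Assumption: mock CDM with example cohort tables cohort1 and cohort2
cdm <- mockCohortConstructor(nPerson = 100)

# Split every cohort in cohort1 into one cohort per calendar year
cdm$by_year <- cdm$cohort1 %>%
  yearCohorts(years = 2000:2002, name = "by_year")

# Merge all cohorts in cohort2 into a single cohort (gap = 0 days)
cdm$any_cohort2 <- cdm$cohort2 %>%
  unionCohorts(gap = 0, name = "any_cohort2")

# The settings show the cohorts that each table now contains
omopgenerics::settings(cdm$by_year)
omopgenerics::settings(cdm$any_cohort2)
```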

Cohort combinations - Example

  • We can generate a new cohort that contains people who had an exposure to both diclofenac and acetaminophen at the same time using intersectCohorts().
cdm$intersection <- cdm$medications %>% 
  CohortConstructor::intersectCohorts(
    gap = 0,
    name = "intersection"
  )

settings(cdm$intersection)
# A tibble: 1 × 5
  cohort_definition_id cohort_name                         gap `161_acetaminophen` `3355_diclofenac`
                 <int> <chr>                             <dbl>               <dbl>             <dbl>
1                    1 161_acetaminophen_3355_diclofenac     0                   1                 1


attr(cdm$intersection, "cohort_codelist")
# Source:   table<my_study_intersection_codelist> [?? x 4]
# Database: DuckDB v1.0.0 [nmercade@Windows 10 x64:R 4.2.2/C:\Users\nmercade\AppData\Local\Temp\RtmpA1VZ7T\fileaca4760811d8.duckdb]
  cohort_definition_id codelist_name     concept_id type       
                 <int> <chr>                  <int> <chr>      
1                    1 161_acetaminophen    1125315 index event
2                    1 161_acetaminophen    1127078 index event
3                    1 161_acetaminophen    1127433 index event
4                    1 161_acetaminophen   40229134 index event
5                    1 161_acetaminophen   40231925 index event
6                    1 161_acetaminophen   40162522 index event
7                    1 161_acetaminophen   19133768 index event
8                    1 3355_diclofenac      1124300 index event

Thank you!

Questions?