CohortConstructor

An R package to build and curate cohorts in the OMOP Common Data Model

Introduction

The CohortConstructor package is designed to support cohort-building pipelines in R.

The code is publicly available in OHDSI’s GitHub repository CohortConstructor.

CohortConstructor v0.3.5 is available on CRAN.

Vignettes with further information can be found on the package website.

More information and context can be found in the online book “Tidy R programming with databases: applications with the OMOP common data model”.

CohortConstructor pipeline

1) Create base cohorts: Cohorts defined using clinical concepts (e.g., asthma diagnoses) or demographics (e.g., females aged >18)

2) Cohort curation: Transform base cohorts to meet study-specific inclusion criteria.
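The two steps can be sketched end to end as follows. This is a minimal illustration, not the workflow used in the rest of the deck: it assumes the mock CDM returned by mockCohortConstructor() (with its default DuckDB connection) in place of a real database, and the cohort name "adults" and the study period are hypothetical choices.

```r
library(CohortConstructor)
library(dplyr)  # provides the %>% pipe

# Assumption: a small mock CDM stands in for a real OMOP database
cdm <- mockCohortConstructor(nPerson = 100)

# Step 1 - base cohort: time contributed while aged 18 or older
cdm$adults <- demographicsCohort(cdm = cdm,
                                 ageRange = c(18, 150),
                                 name = "adults")

# Step 2 - curation: keep entries that start within a hypothetical study period
cdm$adults <- cdm$adults %>%
  requireInDateRange(dateRange = as.Date(c("2000-01-01", "2020-12-31")))

# Each step is recorded in the attrition table
omopgenerics::attrition(cdm$adults)
```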

Current approach

CohortConstructor approach

Function sets

Base cohorts: Cohort construction based on clinical concepts or demographics.

Requirements and Filtering: Demographic restrictions, event presence/absence conditions, and filtering specific records.

Time Manipulation: Adjusting entry and exit dates to align with study periods, observation windows, or key events.

Transformation and Combination: Merging, stratifying, collapsing, matching, or intersecting cohorts.

Base cohorts

Functions to build base cohorts

demographicsCohort()

conceptCohort()

measurementCohort()

deathCohort()
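As an illustration of one of the other base-cohort builders, the sketch below calls deathCohort() on a mock CDM. It assumes that mockCohortConstructor(death = TRUE) adds a death table to the mock data; the cohort name "death_cohort" is an arbitrary choice.

```r
library(CohortConstructor)

# Assumption: mock CDM that includes a death table
cdm <- mockCohortConstructor(nPerson = 100, death = TRUE)

# Build a cohort from the records in the death table
cdm$death_cohort <- deathCohort(cdm = cdm, name = "death_cohort")

# Number of records and subjects in the resulting cohort
omopgenerics::cohortCount(cdm$death_cohort)
```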

Demographics based - Example

cdm$age_cohort <- demographicsCohort(cdm = cdm, 
                                     ageRange = c(18, 60), 
                                     sex = "Female",
                                     minPriorObservation = 365,
                                     name = "age_cohort")

cdm$age_cohort

# Source:   table<my_study_age_cohort> [?? x 4]
# Database: DuckDB v1.0.0 [nmercade@Windows 10 x64:R 4.2.2/C:\Users\nmercade\AppData\Local\Temp\RtmpA1VZ7T\fileaca4760811d8.duckdb]
   cohort_definition_id subject_id cohort_start_date cohort_end_date
                  <int>      <int> <date>            <date>         
 1                    1         16 1989-10-13        2017-11-02     
 2                    1         12 1981-01-30        2019-03-06     
 3                    1         17 1968-12-11        2011-12-10     
 4                    1        111 1993-05-02        2019-05-17     
 5                    1         82 1979-03-21        2019-06-25     
 6                    1        119 1973-12-27        2016-12-26     
 7                    1        156 1996-10-22        2018-11-04     
 8                    1        180 1995-04-21        2019-05-02     
 9                    1        181 1991-09-16        2017-10-01     
10                    1        250 1992-02-13        2018-06-13     
# ℹ more rows

Demographics based - Example

settings(cdm$age_cohort)

# A tibble: 1 × 5
  cohort_definition_id cohort_name  age_range sex    min_prior_observation
                 <int> <chr>        <chr>     <chr>                  <dbl>
1                    1 demographics 18_60     Female                   365

cohortCount(cdm$age_cohort)

# A tibble: 1 × 3
  cohort_definition_id number_records number_subjects
                 <int>          <int>           <int>
1                    1           1373            1373

attrition(cdm$age_cohort)

# A tibble: 4 × 7
  cohort_definition_id number_records number_subjects reason_id reason                          excluded_records excluded_subjects
                 <int>          <int>           <int>     <int> <chr>                                      <int>             <int>
1                    1           2694            2694         1 Initial qualifying events                      0                 0
2                    1           1373            1373         2 Sex requirement: Female                     1321              1321
3                    1           1373            1373         3 Age requirement: 18 to 60                      0                 0
4                    1           1373            1373         4 Prior observation requirement:…                0                 0

Concept based

Base cohorts are built by domain rather than by cohort definition.

This approach reduces the number of joins to OMOP CDM tables by processing all concept sets together, making it less computationally expensive.

Workflow to build 5 base cohorts: asthma, COPD, diabetes, acetaminophen and warfarin.
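A minimal sketch of the "all concept sets in one call" idea is shown below. It runs on a mock CDM rather than real data, and the concept sets are hypothetical: they are simply the first and last condition concepts found in the mock vocabulary, named condition_a and condition_b for illustration.

```r
library(CohortConstructor)
library(dplyr)  # provides the %>% pipe, filter() and pull()

# Assumption: mock CDM with condition_occurrence records
cdm <- mockCohortConstructor(nPerson = 100, conditionOccurrence = TRUE)

# Pick condition concept IDs that exist in the mock vocabulary
condition_ids <- cdm$concept %>%
  filter(domain_id == "Condition") %>%
  pull("concept_id")

# One named list element per cohort (names are hypothetical)
codes <- list(condition_a = head(condition_ids, 1),
              condition_b = tail(condition_ids, 1))

# A single call builds every cohort defined in the list
cdm$conditions <- conceptCohort(cdm = cdm,
                                conceptSet = codes,
                                name = "conditions")

omopgenerics::settings(cdm$conditions)
```

Passing every concept set in one list is what lets the package query each domain table once instead of once per cohort.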

Concept based - Example

Get relevant codelists with CodelistGenerator

drug_codes <- getDrugIngredientCodes(cdm, 
                                     name = c("diclofenac", "acetaminophen"))
drug_codes


- 161_acetaminophen (7 codes)
- 3355_diclofenac (1 codes)

Create concept based cohorts

cdm$medications <- conceptCohort(cdm = cdm, 
                                 conceptSet = drug_codes, 
                                 name = "medications")
settings(cdm$medications)

# A tibble: 2 × 4
  cohort_definition_id cohort_name       cdm_version vocabulary_version
                 <int> <chr>             <chr>       <chr>             
1                    1 161_acetaminophen 5.3         v5.0 18-JAN-19    
2                    2 3355_diclofenac   5.3         v5.0 18-JAN-19

Concept based - Example

Cohort codelist as an attribute

attr(cdm$medications, "cohort_codelist")

# Source:   table<my_study_medications_codelist> [?? x 4]
# Database: DuckDB v1.0.0 [nmercade@Windows 10 x64:R 4.2.2/C:\Users\nmercade\AppData\Local\Temp\RtmpA1VZ7T\fileaca4760811d8.duckdb]
  cohort_definition_id codelist_name     concept_id type       
                 <int> <chr>                  <int> <chr>      
1                    1 161_acetaminophen    1125315 index event
2                    1 161_acetaminophen    1127078 index event
3                    1 161_acetaminophen    1127433 index event
4                    1 161_acetaminophen   40229134 index event
5                    1 161_acetaminophen   40231925 index event
6                    1 161_acetaminophen   40162522 index event
7                    1 161_acetaminophen   19133768 index event
8                    2 3355_diclofenac      1124300 index event

Requirements and Filtering

Functions to apply requirements and filter

On demographics
- requireDemographics()
- requireAge()
- requireSex()
- requirePriorObservation()
- requireFutureObservation()

On cohort entries
- requireIsFirstEntry()
- requireIsLastEntry()
- requireIsEntry()

Require presence or absence based on other cohorts, concepts, and tables
- requireCohortIntersect()
- requireConceptIntersect()
- requireTableIntersect()
- requireDeathFlag()

Other
- requireInDateRange()
- requireMinCohortCount()
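The sketch below chains two of the functions listed above. It assumes the mock CDM from mockCohortConstructor(), which is expected to ship with two example cohort tables, cohort1 and cohort2; the date range and the choice of target cohort are hypothetical.

```r
library(CohortConstructor)
library(dplyr)  # provides the %>% pipe

# Assumption: the mock CDM contains cohort tables `cohort1` and `cohort2`
cdm <- mockCohortConstructor(nPerson = 100)

cdm$cohort1 <- cdm$cohort1 %>%
  # keep entries whose start date falls within the study period
  requireInDateRange(dateRange = as.Date(c("2000-01-01", "2020-12-31"))) %>%
  # require absence (0 intersections) of cohort 1 from `cohort2`
  # at any time up to and including the cohort start date
  requireCohortIntersect(targetCohortTable = "cohort2",
                         targetCohortId = 1,
                         window = c(-Inf, 0),
                         intersections = 0)

# Every requirement adds its own entry to the attrition table
omopgenerics::attrition(cdm$cohort1)
```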

Requirement functions - Example

We can apply different inclusion and exclusion criteria using CohortConstructor’s functions in a pipeline fashion. For instance, in what follows we require:
- only the first record per person
- subjects aged 18 to 85 at cohort start date
- only females
- at least 30 days of prior observation at cohort start date

cdm$medications <- cdm$medications %>% 
  requireIsFirstEntry() %>% 
  requireDemographics(
    ageRange = list(c(18, 85)),
    sex = "Female", 
    minPriorObservation = 30
  )

Time Manipulation

Functions to update cohort start and end dates

Cohort exit
- exitAtObservationEnd()
- exitAtDeath()
- exitAtFirstDate()
- exitAtLastDate()

Cohort entry
- entryAtFirstDate()
- entryAtLastDate()

Trim start and end dates
- trimDemographics()
- trimToDateRange()

Pad start and end dates
- padCohortDate()
- padCohortEnd()
- padCohortStart()
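Two of the functions above are sketched below on the mock CDM from mockCohortConstructor(). The 30-day padding, the date range, and the new table names are hypothetical values chosen for illustration.

```r
library(CohortConstructor)
library(dplyr)  # provides the %>% pipe

# Assumption: the mock CDM contains a cohort table `cohort1`
cdm <- mockCohortConstructor(nPerson = 100)

# Add 30 days to each cohort end date
cdm$cohort1_padded <- cdm$cohort1 %>%
  padCohortEnd(days = 30, name = "cohort1_padded")

# Trim entries so they lie inside a fixed calendar window
cdm$cohort1_trimmed <- cdm$cohort1 %>%
  trimToDateRange(dateRange = as.Date(c("2005-01-01", "2015-12-31")),
                  name = "cohort1_trimmed")

cdm$cohort1_trimmed
```

Writing each result to a new table (via name) keeps the original cohort1 available for comparison.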

Time Manipulation - Example

We can set the end date to the end of the subject’s observation period

cdm$medications <- cdm$medications %>%
  exitAtObservationEnd()

cdm$medications

# Source:   table<my_study_medications> [?? x 4]
# Database: DuckDB v1.0.0 [nmercade@Windows 10 x64:R 4.2.2/C:\Users\nmercade\AppData\Local\Temp\RtmpA1VZ7T\fileaca4760811d8.duckdb]
   cohort_definition_id subject_id cohort_start_date cohort_end_date
                  <int>      <int> <date>            <date>         
 1                    2       2337 2003-07-26        2018-09-12     
 2                    1       1508 1949-02-23        2019-05-14     
 3                    1       4106 1963-01-21        2018-11-09     
 4                    1       2661 1989-10-31        2018-08-25     
 5                    1       5272 1949-04-29        2018-12-05     
 6                    1       1892 1981-03-28        2018-09-11     
 7                    1       4987 1980-05-25        2019-06-10     
 8                    2       2237 1967-01-16        2019-01-13     
 9                    1        764 1989-01-01        2018-05-29     
10                    2       4546 2010-10-31        2018-09-16     
# ℹ more rows

Time Manipulation - Example

We can also trim start and end dates to match demographic requirements
- i.e., cohort dates can be trimmed so that subjects contribute time only while they are 20 to 40 years old and have at least 365 days of prior observation

cdm$medications_trimmed <- cdm$medications %>%
  trimDemographics(ageRange = list(c(20, 40)),
                   minPriorObservation = 365,
                   name = "medications_trimmed")

Transformation and Combination

Functions for Cohort Transformation and Combination

Split cohorts
- yearCohorts()
- stratifyCohorts()

Combine cohorts
- unionCohorts()
- intersectCohorts()

Filter cohorts
- subsetCohorts()
- sampleCohorts()

Match cohorts
- matchCohorts()

Concatenate entries
- collapseCohorts()

Copy and rename cohorts
- renameCohort()
- copyCohorts()
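As a brief sketch of the combining and splitting functions, the code below uses the mock CDM from mockCohortConstructor(), assuming its cohort2 table holds more than one cohort. The years and the new table names are hypothetical.

```r
library(CohortConstructor)
library(dplyr)  # provides the %>% pipe

# Assumption: the mock CDM contains cohort tables `cohort1` and `cohort2`
cdm <- mockCohortConstructor(nPerson = 100)

# Combine all cohorts in `cohort2` into a single cohort
cdm$cohort2_union <- cdm$cohort2 %>%
  unionCohorts(gap = 0, name = "cohort2_union")

# Split `cohort1` into one cohort per calendar year
cdm$cohort1_years <- cdm$cohort1 %>%
  yearCohorts(years = 2005:2007, name = "cohort1_years")

omopgenerics::settings(cdm$cohort2_union)
omopgenerics::settings(cdm$cohort1_years)
```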

Cohort combinations - Example

We can generate a new cohort that contains people who had an exposure to both diclofenac and acetaminophen at the same time using intersectCohorts().

cdm$intersection <- cdm$medications %>% 
  CohortConstructor::intersectCohorts(
    gap = 0,
    name = "intersection"
  )

settings(cdm$intersection)

# A tibble: 1 × 5
  cohort_definition_id cohort_name                         gap `161_acetaminophen` `3355_diclofenac`
                 <int> <chr>                             <dbl>               <dbl>             <dbl>
1                    1 161_acetaminophen_3355_diclofenac     0                   1                 1

Cohort combinations - Example

attr(cdm$intersection, "cohort_codelist")

# Source:   table<my_study_intersection_codelist> [?? x 4]
# Database: DuckDB v1.0.0 [nmercade@Windows 10 x64:R 4.2.2/C:\Users\nmercade\AppData\Local\Temp\RtmpA1VZ7T\fileaca4760811d8.duckdb]
  cohort_definition_id codelist_name     concept_id type       
                 <int> <chr>                  <int> <chr>      
1                    1 161_acetaminophen    1125315 index event
2                    1 161_acetaminophen    1127078 index event
3                    1 161_acetaminophen    1127433 index event
4                    1 161_acetaminophen   40229134 index event
5                    1 161_acetaminophen   40231925 index event
6                    1 161_acetaminophen   40162522 index event
7                    1 161_acetaminophen   19133768 index event
8                    1 3355_diclofenac      1124300 index event

Thank you!

Questions?