A package to add patients’ characteristics to your cohort
PatientProfiles is developed by the OxInfer group as part of the standard analytic pipeline for the Darwin EU project.
This package is currently on CRAN and can be installed easily in RStudio.
More information about this package can be found here: https://darwin-eu-dev.github.io/PatientProfiles/.
The aim of this package is to simplify the code needed to characterise cohorts by patients’ characteristics and to reduce the amount of bespoke code for common analytic tasks.
It is a group of functions that add common patient characteristics to any OMOP CDM table containing patient-level data.
For example: age, sex, prior history, and intersection with another cohort.
Patient characteristics | Cohort intersection |
---|---|
addSex() | addCohortIntersectFlag() |
addPriorHistory() | addCohortIntersectDays() |
addDemographics() | addCohortIntersectCount() |
addAge() | addCohortIntersectDate() |
addInObservation() | |
For this demo, a CDM table containing 4 different cohorts is created using CAPR with Eunomia data:
first_sprain_of_ankle: first occurrence of sprain of ankle
repetitive_sprain_of_ankle: sprain of ankle with a previous sprain of ankle
sinusitis: individuals’ first recorded condition of sinusitis
chronic sinusitis: individuals’ first recorded condition of chronic sinusitis
# Source: SQL [3 x 4]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
cohort_definition_id subject_id cohort_start_date cohort_end_date
<int> <dbl> <date> <date>
1 1 5 1991-12-12 1991-12-12
2 1 38 2003-05-20 2003-05-20
3 1 222 1973-08-24 1973-08-24
Adding age to this cohort by hand takes a fair amount of bespoke code:
cdm$cohort_interest %>%
  left_join(
    cdm$person %>%
      # Default a missing day or month of birth to 1
      mutate(
        day_of_birth = if_else(is.na(day_of_birth), 1, day_of_birth),
        month_of_birth = if_else(is.na(month_of_birth), 1, month_of_birth)
      ) %>%
      # Build the birth date from its year, month and day components
      mutate(birth_date = as.Date(paste0(
        as.character(as.integer(year_of_birth)),
        "-",
        as.character(as.integer(month_of_birth)),
        "-",
        as.character(as.integer(day_of_birth))
      ))) %>%
      select("subject_id" = "person_id", "birth_date"),
    by = "subject_id"
  ) %>%
  # Age in years at cohort entry (datediff() comes from CDMConnector)
  mutate(age = !!datediff("birth_date", "cohort_start_date", "year")) %>%
  relocate("age")
For example, to restrict the cohort to patients aged over 50, we can filter on the computed age column.
# Source: SQL [?? x 5]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
age cohort_definition_id subject_id cohort_start_date cohort_end_date
<dbl> <int> <dbl> <date> <date>
1 53 1 236 2004-07-12 2004-07-12
2 57 1 958 2014-07-15 2014-07-15
3 66 1 1314 1996-07-20 1996-07-20
4 84 1 3618 1994-05-20 1994-05-20
5 74 1 4503 2010-09-08 2010-09-08
6 72 1 4531 1987-08-05 1987-08-05
7 67 1 2849 1987-03-12 1987-03-12
8 55 1 2864 1994-07-31 1994-07-31
9 73 1 2986 2018-02-20 2018-02-20
10 73 1 3079 2006-09-17 2006-09-17
# ℹ more rows
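The filtering step can be sketched on a local data frame. This is only an illustration in base R: the `cohort` data frame is a stand-in built from ages printed in this demo, not the database-backed table. On the real table the equivalent would be a dplyr `filter(age > 50)` step.

```r
# Local stand-in for the cohort with an age column
# (subject ids and ages copied from the printed outputs in this demo).
cohort <- data.frame(
  subject_id = c(5, 38, 222, 236, 520, 720, 958, 962, 1223, 1314),
  age        = c(23, 38, 21, 53, 32, 1, 57, 14, 29, 66)
)

# Keep only patients older than 50 at the index date.
over_50 <- subset(cohort, age > 50)
print(over_50)
```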
The age is calculated based on the index date.
# Source: table<dbplyr_002> [?? x 5]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
cohort_definition_id subject_id cohort_start_date cohort_end_date age
<int> <dbl> <date> <date> <dbl>
1 1 5 1991-12-12 1991-12-12 23
2 1 38 2003-05-20 2003-05-20 38
3 1 222 1973-08-24 1973-08-24 21
4 1 236 2004-07-12 2004-07-12 53
5 1 520 1983-08-30 1983-08-30 32
6 1 720 1966-06-22 1966-06-22 1
7 1 958 2014-07-15 2014-07-15 57
8 1 962 1965-09-19 1965-09-19 14
9 1 1223 1980-12-15 1980-12-15 29
10 1 1314 1996-07-20 1996-07-20 66
# ℹ more rows
addAge() also has options for handling a missing date of birth (DOB):
ageDefaultMonth: month to assign if the month of birth is missing
ageDefaultDay: day to assign if the day of birth is missing
ageImposeMonth: whether to impose the month if missing
ageImposeDay: whether to impose the day if missing
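To make the effect of the default month and day concrete, here is a minimal base R sketch of an age calculation that fills in missing birth components. The helper `age_at_index()` and its arguments are hypothetical and are not the PatientProfiles implementation; the birth dates are made up for illustration.

```r
# Hypothetical helper (an assumption, not the package's code): age in whole
# years at an index date, using defaults for a missing birth month or day.
age_at_index <- function(year, month, day, index_date,
                         default_month = 1L, default_day = 1L) {
  month <- ifelse(is.na(month), default_month, month)
  day   <- ifelse(is.na(day), default_day, day)
  index <- as.POSIXlt(as.Date(index_date))
  index_year  <- index$year + 1900L
  index_month <- index$mon + 1L
  index_day   <- index$mday
  # Subtract one year when the birthday has not yet occurred in the index year.
  before_birthday <- index_month < month |
    (index_month == month & index_day < day)
  index_year - year - as.integer(before_birthday)
}

# Made-up birth data: complete, fully missing, and month-only missing.
person <- data.frame(
  year_of_birth  = c(1968L, 1950L, 1950L),
  month_of_birth = c(2L, NA, NA),
  day_of_birth   = c(10L, NA, 15L)
)
index_date <- as.Date(c("1991-12-12", "2004-07-12", "2004-01-10"))

ages <- age_at_index(person$year_of_birth, person$month_of_birth,
                     person$day_of_birth, index_date)
print(ages)
```

The third person shows why the defaults matter: with the month defaulted to January and a birth day of 15, an index date of 10 January falls just before the assumed birthday, so the age is one year lower than a plain year difference.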
addAge() works with any table in the CDM, e.g. the “condition_occurrence” and “drug_exposure” tables.
# Source: SQL [?? x 17]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
age condition_occurrence_id person_id condition_concept_id
<dbl> <dbl> <dbl> <dbl>
1 61 4483 263 4112343
2 36 4657 273 192671
3 3 4815 283 28060
4 18 5153 304 257012
5 9 5513 326 28060
6 30 5811 341 40481087
7 8 5977 351 40481087
8 17 6143 362 4113008
9 6 6309 370 372328
10 3 6641 392 260139
# ℹ more rows
# ℹ 13 more variables: condition_start_date <date>,
# condition_start_datetime <dttm>, condition_end_date <date>,
# condition_end_datetime <dttm>, condition_type_concept_id <dbl>,
# stop_reason <lgl>, provider_id <lgl>, visit_occurrence_id <dbl>,
# visit_detail_id <dbl>, condition_source_value <chr>,
# condition_source_concept_id <dbl>, condition_status_source_value <lgl>, …
There is also an option to create age groups.
cdm$cohort_interest %>%
  addAge(cdm, ageGroup = list(c(0, 19), c(20, 39), c(40, 59), c(60, 79), c(80, 150))) %>%
  relocate("age", "age_group")
# Source: SQL [?? x 6]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
age age_group cohort_definition_id subject_id cohort_start_date
<dbl> <chr> <int> <dbl> <date>
1 23 20 to 39 1 5 1991-12-12
2 38 20 to 39 1 38 2003-05-20
3 21 20 to 39 1 222 1973-08-24
4 53 40 to 59 1 236 2004-07-12
5 32 20 to 39 1 520 1983-08-30
6 1 0 to 19 1 720 1966-06-22
7 57 40 to 59 1 958 2014-07-15
8 14 0 to 19 1 962 1965-09-19
9 29 20 to 39 1 1223 1980-12-15
10 66 60 to 79 1 1314 1996-07-20
# ℹ more rows
# ℹ 1 more variable: cohort_end_date <date>
You can also name your groups.
For example, you can label the 20 to 29 age group “twenties (young people)” and the 30 to 150 age group “amazing people”.
cdm$cohort_interest %>%
  addAge(
    cdm,
    ageGroup = list(
      "my_age_group" = list(c(0, 19), "twenties (young people)" = c(20, 29), "amazing people" = c(30, 150)),
      "age_group" = list(c(0, 49), c(50, 150))
    )
  ) %>%
  relocate("age", "age_group", "my_age_group")
# Source: SQL [?? x 7]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
age age_group my_age_group cohort_definition_id subject_id
<dbl> <chr> <chr> <int> <dbl>
1 23 0 to 49 twenties (young people) 1 5
2 38 0 to 49 amazing people 1 38
3 21 0 to 49 twenties (young people) 1 222
4 53 50 to 150 amazing people 1 236
5 32 0 to 49 amazing people 1 520
6 1 0 to 49 0 to 19 1 720
7 57 50 to 150 amazing people 1 958
8 14 0 to 49 0 to 19 1 962
9 29 0 to 49 twenties (young people) 1 1223
10 66 50 to 150 amazing people 1 1314
# ℹ more rows
# ℹ 2 more variables: cohort_start_date <date>, cohort_end_date <date>
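The labelling seen in the two age group outputs can be sketched in base R. The helper `age_group_label()` is hypothetical: it assumes inclusive bounds, uses a band's name when one is given, and otherwise falls back to a "lower to upper" label. It is an illustration of the idea, not the package's code.

```r
# Hypothetical helper: label each age with the band it falls into.
# Named bands use their name; unnamed bands are labelled "lower to upper".
age_group_label <- function(age, groups) {
  band_names <- names(groups)
  if (is.null(band_names)) band_names <- rep("", length(groups))
  labels <- rep(NA_character_, length(age))
  for (i in seq_along(groups)) {
    band <- groups[[i]]
    label <- if (nzchar(band_names[i])) band_names[i] else paste(band[1], "to", band[2])
    in_band <- !is.na(age) & age >= band[1] & age <= band[2]
    labels[in_band] <- label
  }
  labels
}

# Ages of the first ten cohort entries printed in this demo.
age <- c(23, 38, 21, 53, 32, 1, 57, 14, 29, 66)

default_labels <- age_group_label(
  age, list(c(0, 19), c(20, 39), c(40, 59), c(60, 79), c(80, 150))
)
named_labels <- age_group_label(
  age,
  list(c(0, 19), "twenties (young people)" = c(20, 29), "amazing people" = c(30, 150))
)
print(data.frame(age, default_labels, named_labels))
```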
addPriorHistory() adds the prior history (in number of days) within the current observation period.
For example, you can filter your cohort to patients who have more than one year (365 days) of prior history.
cdm$condition_occurrence %>%
  addPriorHistory(
    cdm = cdm,
    indexDate = "condition_start_date",
    priorHistoryName = "prior_history_start"
  ) %>%
  relocate("prior_history_start") %>%
  filter(prior_history_start > 365)
# Source: SQL [?? x 17]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
prior_history_start condition_occurrence_id person_id condition_concept_id
<dbl> <dbl> <dbl> <dbl>
1 13175 4657 273 192671
2 1262 4815 283 28060
3 6658 5153 304 257012
4 3401 5513 326 28060
5 11022 5811 341 40481087
6 3009 5977 351 40481087
7 6282 6143 362 4113008
8 2509 6309 370 372328
9 1321 6641 392 260139
10 4910 6815 403 40481087
# ℹ more rows
# ℹ 13 more variables: condition_start_date <date>,
# condition_start_datetime <dttm>, condition_end_date <date>,
# condition_end_datetime <dttm>, condition_type_concept_id <dbl>,
# stop_reason <lgl>, provider_id <lgl>, visit_occurrence_id <dbl>,
# visit_detail_id <dbl>, condition_source_value <chr>,
# condition_source_concept_id <dbl>, condition_status_source_value <lgl>, …
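As a rough sketch of what "prior history in days" means, the base R snippet below counts days from the start of the observation period to the index date. The helper `prior_history_days()` is hypothetical, the records are made up, and returning `NA` when the index date falls outside the observation period is an assumption made for this illustration.

```r
# Hypothetical helper: days between observation period start and index date.
# Returns NA when the index date is outside the observation period (assumed).
prior_history_days <- function(index_date, obs_start, obs_end) {
  in_observation <- index_date >= obs_start & index_date <= obs_end
  days <- as.numeric(difftime(index_date, obs_start, units = "days"))
  ifelse(in_observation, days, NA_real_)
}

# Made-up records for three people.
records <- data.frame(
  person_id = c(1, 2, 3),
  condition_start_date = as.Date(c("1983-01-01", "2000-12-01", "1989-05-01")),
  observation_period_start_date = as.Date(c("1980-01-01", "2000-06-01", "1990-01-01")),
  observation_period_end_date = as.Date(rep("2010-12-31", 3))
)

records$prior_history <- prior_history_days(
  records$condition_start_date,
  records$observation_period_start_date,
  records$observation_period_end_date
)

# Keep records with more than 365 days of prior history.
long_history <- subset(records, prior_history > 365)
print(long_history)
```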
addDemographics() adds several characteristics, here age, sex and prior history, in a single call.
# Source: table<dbplyr_024> [?? x 7]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
cohort_definition_id subject_id cohort_start_date cohort_end_date age
<int> <dbl> <date> <date> <dbl>
1 1 5 1991-12-12 1991-12-12 23
2 1 38 2003-05-20 2003-05-20 38
3 1 222 1973-08-24 1973-08-24 21
4 1 236 2004-07-12 2004-07-12 53
5 1 720 1966-06-22 1966-06-22 1
6 1 958 2014-07-15 2014-07-15 57
7 1 1223 1980-12-15 1980-12-15 29
8 1 1314 1996-07-20 1996-07-20 66
9 1 1558 1987-02-17 1987-02-17 32
10 1 1659 1975-01-23 1975-01-23 5
sex prior_history
<chr> <dbl>
1 Male 8541
2 Female 13978
3 Female 8015
4 Female 19471
5 Female 662
6 Male 21008
7 Male 10874
8 Male 24262
9 Female 11982
10 Male 2138
# ℹ more rows
The cohort intersect functions work out the intersection of two different CDM tables.
For example, you have a cohort of patients who had an ankle sprain and a cohort of patients who took painkillers. You might want to analyse only the ankle sprain patients who took painkillers.
In PatientProfiles, we have a family of functions to help you calculate the intersection of two different cohorts.
addCohortIntersectFlag()
addCohortIntersectCount()
addCohortIntersectDate()
addCohortIntersectDays()
A medications table containing cohorts of patients who took “naproxen” or “acetaminophen”, created using the DrugUtilisation package:
# Source: table<main.medications> [?? x 4]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
cohort_definition_id subject_id cohort_start_date cohort_end_date
<int> <dbl> <date> <date>
1 1 1227 2017-09-11 2017-10-11
2 1 2526 2012-04-19 2012-05-03
3 1 2657 2016-01-16 2016-02-22
4 1 3901 2013-10-04 2013-10-25
5 1 4784 2018-08-28 2018-10-02
6 2 5 1974-05-11 1974-05-18
7 2 5 2003-09-10 2003-09-17
8 2 18 1983-05-31 1983-06-28
9 2 32 1956-03-20 1956-04-10
10 2 32 1991-03-22 1991-04-12
# ℹ more rows
For example, you might want an indicator of whether the patients in the cohort of interest took acetaminophen at any time before the index date.
cdm$cohort_interest %>%
  addCohortIntersectFlag(
    cdm = cdm,
    targetCohortTable = "medications",
    targetCohortId = 2,
    indexDate = "cohort_start_date",
    targetStartDate = "cohort_start_date",
    targetEndDate = "cohort_end_date",
    window = list(c(-Inf, -1))
  ) %>%
  print(width = Inf)
# Source: table<dbplyr_053> [?? x 5]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
cohort_definition_id subject_id cohort_start_date cohort_end_date
<int> <dbl> <date> <date>
1 1 5 1991-12-12 1991-12-12
2 1 38 2003-05-20 2003-05-20
3 1 222 1973-08-24 1973-08-24
4 1 236 2004-07-12 2004-07-12
5 1 520 1983-08-30 1983-08-30
6 1 720 1966-06-22 1966-06-22
7 1 820 2002-09-23 2002-09-23
8 1 958 2014-07-15 2014-07-15
9 1 1223 1980-12-15 1980-12-15
10 1 1314 1996-07-20 1996-07-20
acetaminophen_minf_to_m1
<dbl>
1 1
2 1
3 1
4 1
5 1
6 0
7 1
8 1
9 1
10 1
# ℹ more rows
For example, you may only want to look at incidence. Setting targetEndDate = NULL means only the target start date is considered, here within the 365 days before the index date.
cdm$cohort_interest %>%
  addCohortIntersectFlag(
    cdm = cdm,
    targetCohortTable = "medications",
    targetCohortId = 2,
    indexDate = "cohort_start_date",
    targetStartDate = "cohort_start_date",
    targetEndDate = NULL,
    window = list(c(-365, -1))
  ) %>%
  print(width = Inf)
# Source: table<dbplyr_059> [?? x 5]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
cohort_definition_id subject_id cohort_start_date cohort_end_date
<int> <dbl> <date> <date>
1 1 5 1991-12-12 1991-12-12
2 1 38 2003-05-20 2003-05-20
3 1 222 1973-08-24 1973-08-24
4 1 236 2004-07-12 2004-07-12
5 1 520 1983-08-30 1983-08-30
6 1 720 1966-06-22 1966-06-22
7 1 820 2002-09-23 2002-09-23
8 1 958 2014-07-15 2014-07-15
9 1 1223 1980-12-15 1980-12-15
10 1 1314 1996-07-20 1996-07-20
acetaminophen_m365_to_m1
<dbl>
1 0
2 0
3 0
4 0
5 1
6 0
7 0
8 1
9 0
10 0
# ℹ more rows
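The logic behind the two flag examples above can be sketched in base R. The helper `intersect_flag()` and its `start_only` argument are hypothetical and only mimic the idea of overlap versus incidence; they are not the PatientProfiles implementation. Subjects 5 and 720 use dates printed in this demo, while subject 99 is made up to show where the two modes differ.

```r
# Hypothetical helper, not the PatientProfiles implementation.
# Returns 1 if a subject has a target cohort entry inside the window
# (in days relative to the index date), otherwise 0.
# With start_only = TRUE only the target start date is checked (incidence).
intersect_flag <- function(subject_id, index_date, target, window,
                           start_only = FALSE) {
  vapply(seq_along(subject_id), function(i) {
    rows <- target[target$subject_id == subject_id[i], ]
    # Offsets of each target entry relative to the index date, in days.
    start_offset <- as.numeric(rows$cohort_start_date - index_date[i])
    end_offset <- if (start_only) {
      start_offset
    } else {
      as.numeric(rows$cohort_end_date - index_date[i])
    }
    as.numeric(any(start_offset <= window[2] & end_offset >= window[1]))
  }, numeric(1))
}

cohort <- data.frame(
  subject_id = c(5, 720, 99),
  cohort_start_date = as.Date(c("1991-12-12", "1966-06-22", "2010-06-01"))
)
medications <- data.frame(
  subject_id = c(5, 5, 99),
  cohort_start_date = as.Date(c("1974-05-11", "2003-09-10", "2009-04-01")),
  cohort_end_date = as.Date(c("1974-05-18", "2003-09-17", "2010-02-01"))
)

# Any exposure before the index date.
any_time_before <- intersect_flag(
  cohort$subject_id, cohort$cohort_start_date, medications, c(-Inf, -1)
)
# Exposure overlapping the year before the index date.
overlap_last_year <- intersect_flag(
  cohort$subject_id, cohort$cohort_start_date, medications, c(-365, -1)
)
# Exposure that started in the year before the index date.
started_last_year <- intersect_flag(
  cohort$subject_id, cohort$cohort_start_date, medications, c(-365, -1),
  start_only = TRUE
)
print(data.frame(cohort$subject_id, any_time_before, overlap_last_year, started_last_year))
```

Subject 99's exposure starts more than a year before the index date but is still ongoing within the last year, so it counts as overlap but not as a new start in that window.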
There is an option to specify multiple time windows.
cdm$cohort_interest %>%
  addCohortIntersectFlag(
    cdm = cdm,
    targetCohortTable = "medications",
    targetCohortId = 1,
    targetEndDate = NULL,
    window = list(c(-Inf, -366), c(-365, -31), c(-30, -1))
  ) %>%
  print(width = Inf)
# Source: table<dbplyr_071> [?? x 7]
# Database: DuckDB 0.7.1 [miked@Windows 10 x64:R 4.2.3/C:\Users\miked\AppData\Local\Temp\RtmpYlMMW9/ybidojas]
cohort_definition_id subject_id cohort_start_date cohort_end_date
<int> <dbl> <date> <date>
1 1 222 1973-08-24 1973-08-24
2 1 236 2004-07-12 2004-07-12
3 1 1223 1980-12-15 1980-12-15
4 1 1430 1991-09-16 1991-09-16
5 1 1659 1975-01-23 1975-01-23
6 1 1778 2003-06-06 2003-06-06
7 1 2270 1980-02-04 1980-02-04
8 1 2680 1985-09-17 1985-09-17
9 1 2703 1974-10-05 1974-10-05
10 1 2949 1980-10-01 1980-10-01
naproxen_minf_to_m366 naproxen_m30_to_m1 naproxen_m365_to_m31
<dbl> <dbl> <dbl>
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
# ℹ more rows
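The column names in these outputs follow a visible pattern: the cohort name, then the window with "m" for minus and "inf" for infinity. A small base R sketch of that pattern is below. The helper `window_label()` is hypothetical, and its handling of positive bounds is an assumption, since only negative windows appear in this demo.

```r
# Hypothetical helper reproducing the window suffix seen in the outputs,
# e.g. c(-Inf, -366) -> "minf_to_m366". Positive bounds are an assumption.
window_label <- function(window) {
  format_bound <- function(value) {
    if (is.infinite(value)) {
      return(if (value < 0) "minf" else "inf")
    }
    if (value < 0) paste0("m", abs(value)) else as.character(value)
  }
  paste0(format_bound(window[1]), "_to_", format_bound(window[2]))
}

windows <- list(c(-Inf, -366), c(-365, -31), c(-30, -1))
column_names <- paste0("naproxen_", vapply(windows, window_label, character(1)))
print(column_names)
```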
All the addCohortIntersect functions work in a similar way, and they can be used to return the intersection date, the number of days to it, or a count.
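For a single patient, the count, date and days variants can be sketched as follows. This is a base R illustration under stated assumptions: only target start dates are considered, the earliest matching date is the one returned, and the 1985 entry is made up. None of the variable names come from the package.

```r
# One patient's index date and the start dates of their target cohort entries.
# The first and last dates appear in this demo; the 1985 date is made up.
index_date <- as.Date("1991-12-12")
target_start <- as.Date(c("1974-05-11", "1985-03-01", "2003-09-10"))
window <- c(-Inf, -1)

# Offset of each entry from the index date, in days.
offset <- as.numeric(target_start - index_date)
in_window <- offset >= window[1] & offset <= window[2]

# Count: number of entries starting inside the window.
intersect_count <- sum(in_window)
# Date: earliest entry inside the window (assumed ordering), NA if none.
intersect_date <- if (any(in_window)) min(target_start[in_window]) else as.Date(NA)
# Days: distance from the index date to that entry (negative means before).
intersect_days <- as.numeric(intersect_date - index_date)

print(list(count = intersect_count, date = intersect_date, days = intersect_days))
```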
More information can be found at the website: https://darwin-eu-dev.github.io/PatientProfiles/
Thanks for listening!
Thanks to everyone who contributed to this package!