CDMConnector

Creating a reference to a database using the OMOP common data model

Installation

For a free browser-based RStudio instance go to https://posit.cloud

# Initial setup
install.packages("CDMConnector")
install.packages("duckdb")
install.packages("IncidencePrevalence")

Sys.setenv(EUNOMIA_DATA_FOLDER = "~/eunomia_data")

library(CDMConnector)
downloadEunomiaData()
con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- cdm_from_con(con)

cdm$person  

DBI::dbDisconnect(con, shutdown = T)

Interacting with an OMOP common data model from R

We already know what the structure of the OMOP CDM looks like. The CDMConnector package was made to help you to quickly create a reference to the OMOP CDM data as a whole.

Interacting with an OMOP common data model from R

First install CDMConnector from CRAN.

install.packages("CDMConnector")

Once we have it installed, we can then load it along with other R packages.

library(CDMConnector)
library(dplyr)

Creating a reference to the OMOP common data model

For this example, we’ll use the Eunomia example data contained in a file based database called duckdb. First we need to set an environment variable to tell R where to put the downloaded data.

usethis::edit_r_environ() # Open your .Renviron file

Add the following line to the file. Close it and restart R.

EUNOMIA_DATA_FOLDER='/Users/adamblack/eunomia_data'

Then download the example dataset.

CDMConnector::downloadEunomiaData()

Creating a reference to the OMOP common data model

1 Load necessary libraries

library(CDMConnector)

2 Connect to the database

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())

3 Create the cdm reference

cdm <- cdm_from_con(con, cdm_schema = "main")
cdm

# OMOP CDM reference (tbl_duckdb_connection)

Tables: person, observation_period, visit_occurrence, visit_detail, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, death, note, note_nlp, specimen, fact_relationship, location, care_site, provider, payer_plan_period, cost, drug_era, dose_era, condition_era, metadata, cdm_source, concept, vocabulary, domain, concept_class, concept_relationship, relationship, concept_synonym, concept_ancestor, source_to_concept_map, drug_strength

Creating a reference to the OMOP common data model

Once we have created the our reference to the overall OMOP CDM, we can reference specific tables using the “$” operator or [[““]].

cdm$observation_period %>% head(2)

# Source:   SQL [2 x 5]
# Database: DuckDB 0.8.0 [root@Darwin 21.6.0:R 4.2.2//var/folders/xx/01v98b6546ldnm1rg1_bvk000000gn/T//Rtmp84r9SJ/wxocjuxg]
  observation_period_id person_id observation_period_st…¹ observation_period_e…²
                  <int>     <int> <date>                  <date>                
1                     6         6 1963-12-31              2007-02-06            
2                    13        13 2009-04-26              2019-04-14            
# ℹ abbreviated names: ¹observation_period_start_date,
#   ²observation_period_end_date
# ℹ 1 more variable: period_type_concept_id <int>

cdm[["observation_period"]] %>% head(2)

# Source:   SQL [2 x 5]
# Database: DuckDB 0.8.0 [root@Darwin 21.6.0:R 4.2.2//var/folders/xx/01v98b6546ldnm1rg1_bvk000000gn/T//Rtmp84r9SJ/wxocjuxg]
  observation_period_id person_id observation_period_st…¹ observation_period_e…²
                  <int>     <int> <date>                  <date>                
1                     6         6 1963-12-31              2007-02-06            
2                    13        13 2009-04-26              2019-04-14            
# ℹ abbreviated names: ¹observation_period_start_date,
#   ²observation_period_end_date
# ℹ 1 more variable: period_type_concept_id <int>

View the SQL queries

cdm$observation_period %>% 
  head(2) %>% 
  show_query()

<SQL>
SELECT q01.*
FROM main.observation_period AS q01
LIMIT 2

%>% head(2) was used to only show the first 2 lines.

Creating a reference to the OMOP common data model

By default the cdm reference is created with all the tables from the cdm. But When we created our reference we could have also specified a subset of cdm tables that we want to read:

cdm <- cdm_from_con(con) %>% 
  cdm_select_tbl("person","observation_period")
cdm

# OMOP CDM reference (tbl_duckdb_connection)

Tables: person, observation_period

cdm$person %>% count()

# Source:   SQL [1 x 1]
# Database: DuckDB 0.8.0 [root@Darwin 21.6.0:R 4.2.2//var/folders/xx/01v98b6546ldnm1rg1_bvk000000gn/T//Rtmp84r9SJ/wxocjuxg]
      n
  <dbl>
1  2694

Creating a reference to the OMOP common data model

Moreover, we can also specify a write schema and the tables that we are interested in it when creating our reference. For example, if we wanted to create a reference to the person and observation period tables in the common data model along with cohort tables in a schema we have write access to, we could do this like so:

DBI::dbExecute(con, "create schema results")

[1] 0

cdm <- cdm_from_con(con, 
                    cdm_schema = "main",
                    write_schema = "results")

Database name

When we have a cdm object we can check which is the name of that database using:

cdmName(cdm)

[1] "Synthea synthetic health database"

In some cases we want to give a database a name that we want, this can be done at the connection stage:

cdm <- cdm_from_con(   
  con = con,    
  cdm_schema = "main",    
  write_schema = "main",
  cdm_name = "my_database" 
)

cdmName(cdm)

[1] "my_database"

Database snapshot

The database snapshot is a useful tool to get information on the characteristics of your database:

snapshot(cdm)

# A tibble: 10 × 2
   attribute              value                            
   <chr>                  <chr>                            
 1 cdm_source_name        Synthea synthetic health database
 2 cdm_version            v5.3.1                           
 3 cdm_holder             OHDSI Community                  
 4 cdm_release_date       2019-05-25                       
 5 vocabulary_version     v5.0 18-JAN-19                   
 6 person_cnt             2694                             
 7 observation_period_cnt 5343                             
 8 cdm_schema             main                             
 9 write_schema           main                             
10 cdm_name               my_database

In network studies this can be very useful to export the characteristics of each one of the databases:

readr::write_csv(snapshot(cdm), here::here(paste0("snapshot_", cdmName(cdm), ".csv")))

Install remaining packages

install.packages("rJava") # Java JDK needs to be installed.
remotes::install_github("OHDSI/CirceR")
remotes::install_github("OHDSI/SqlRender")
install.packages("CodelistGenerator")
install.packages("DrugExposureDiagnostics")

Test that Java works from R

# In some cases you have to manually load the java virtual machine
dyn.load("/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/server/libjvm.dylib")
SqlRender::translate("DATEADD(day, 1, column_name)", "sqlite")

[1] "CAST(STRFTIME('%s', DATETIME(column_name, 'unixepoch', (1)||' days')) AS REAL)"
attr(,"sqlDialect")
[1] "sqlite"