install.packages(c("DBI", "RPostgres", "dplyr", "dbplyr", "usethis", "here"))
6 Connect to the database
6.1 Getting started
To connect to databases we will use DBI package and CDMConnector, you can find more information about both packages in their websites:
DBI package website: https://dbi.r-dbi.org/
CDMConnector package website: https://darwin-eu.github.io/CDMConnector/
Connect to database (standard way) and set up the environment
6.2 Load libraries
The following libraries will be used in this chapter: DBI, RPostgres, dplyr, dbplyr, usethis and here. If you do not have them installed you can install them with the following command:
library(DBI)
library(RPostgres)
library(dplyr)
library(dbplyr)
library(usethis)
library(here)
6.3 Credentials and database details
To create a connection to a database we need some parameters:
host
: it is the IP of the computer that contains the database in our case it will be always the same and you can check it hereport
: port to connect, in our case it will be always the same and you can check it heredbname
: name of the database we want to connect, each database has a different name, link to the names of the databases hosted by our server: linkuser
: each individual has a user to connect to the database this is unique and nontransferable.password
: associated to each user.
To get a user and password or if you are not sure of what are the parameters of a certain database you can ask Hez.
6.4 Create a connection
There are several ways to create a connection as seen here and this depends on the Database Management System (DBMS) of your back-end.
In our case for the moment all our databases are in PostgreSQL (also refereed as Postgres) one of the most popular free dbms that exist. To connect to a Postgres we have to populate with the following information the connection details:
<- dbConnect(drv = Postgres(),
con dbname = "...",
host = "...",
port = "...",
user = "...",
password = "...")
To test that the connection that you created works you can test the following command:
tbl(con, sql("SELECT * FROM public.person limit 1"))
Creating CDM database
/home/runner/work/OxinferOnboarding/OxinferOnboarding/GiBleed_5.3.zip
# Source: SQL [1 x 18]
# Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1025-azure:R 4.4.1//tmp/Rtmp6cVQkN/file138a14ee0db.duckdb]
person_id gender_concept_id year_of_birth month_of_birth day_of_birth
<int> <int> <int> <int> <int>
1 6 8532 1963 12 31
# ℹ 13 more variables: birth_datetime <dttm>, race_concept_id <int>,
# ethnicity_concept_id <int>, location_id <int>, provider_id <int>,
# care_site_id <int>, person_source_value <chr>, gender_source_value <chr>,
# gender_source_concept_id <int>, race_source_value <chr>,
# race_source_concept_id <int>, ethnicity_source_value <chr>,
# ethnicity_source_concept_id <int>
Note that the result will be different as this one is from a synthetic database.
We can have multiple connections open at the same time, but it is recommended to not open more than one connection simultaneously, and close connections when we finish using them.
Once checked that we are able to connect to the database we can disconnect:
dbDisconnect(conn = con)
6.5 Database schemas
Databases are organised in schemas. Schemas are the internal structure of a database, they are like “folders”. In general when we work with an OMOP database we will have to schemas:
cdmSchema
this schema contains all the OMOP Standard tables. Typically we would only have reading permissions to this schema, because these tables are not meant to be edited.writeSchema
this schema is usually empty and is for the user to save their tables (usually cohorts) of intermediate results that you want to keep or reuse later. Typically you would have writing and reading permissions for this schema. In our database we can only read and edit the tables that we created, so we can not edit or read other people tables. It is a good practice to write your own tables with a starting prefix so it is easier to avoid name conflicts.
For our databases the cdmSchema =
public and the writeSchema =
results.
6.6 Create the cdm object
The cdm object is a structure on top of our database connection to access all the tables in a user-friendly way, see more information in the CDMConnector website
To create our first cdm object we need first a connection to our back-end:
<- dbConnect(drv = Postgres(),
con dbname = "...",
host = "...",
port = "...",
user = "...",
password = "...")
And then we can create the cdm object:
library(CDMConnector)
<- cdmFromCon(con = con, cdmSchema = "public", writeSchema = "results") cdm
The cdm object has a print that shows all the tables that you are connected to:
cdm
── # OMOP CDM reference (duckdb) of Synthea synthetic health database ──────────
• omop tables: person, observation_period, visit_occurrence, visit_detail,
condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
measurement, observation, death, note, note_nlp, specimen, fact_relationship,
location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
condition_era, metadata, cdm_source, concept, vocabulary, domain,
concept_class, concept_relationship, relationship, concept_synonym,
concept_ancestor, source_to_concept_map, drug_strength
• cohort tables: -
• achilles tables: -
• other tables: -
And you can easily access to one of this tables with:
$person cdm
# Source: table<main.person> [?? x 18]
# Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1025-azure:R 4.4.1//tmp/Rtmp6cVQkN/file138a5dde81b5.duckdb]
person_id gender_concept_id year_of_birth month_of_birth day_of_birth
<int> <int> <int> <int> <int>
1 6 8532 1963 12 31
2 123 8507 1950 4 12
3 129 8507 1974 10 7
4 16 8532 1971 10 13
5 65 8532 1967 3 31
6 74 8532 1972 1 5
7 42 8532 1909 11 2
8 187 8507 1945 7 23
9 18 8532 1965 11 17
10 111 8532 1975 5 2
# ℹ more rows
# ℹ 13 more variables: birth_datetime <dttm>, race_concept_id <int>,
# ethnicity_concept_id <int>, location_id <int>, provider_id <int>,
# care_site_id <int>, person_source_value <chr>, gender_source_value <chr>,
# gender_source_concept_id <int>, race_source_value <chr>,
# race_source_concept_id <int>, ethnicity_source_value <chr>,
# ethnicity_source_concept_id <int>
In the other tutorials you will learn more how to program and use the cdm object.
Once you finished using it you can close the connection of the cdm object with:
cdmDisconnect(cdm)
This is equivalent to do:
dbDisconnect(con)
6.7 Change password
The database administrator will give us a password it is extremely important that you change it after you connect to the database for first time. To change your password you must:
- open a connection to any of our databases:
<- dbConnect(drv = Postgres(),
con dbname = "...", # here you can connect to any of our databases
host = "...",
port = "...",
user = "...",
password = "...")
- Change the password, to do so we have to run the following command:
dbGetQuery(con, "ALTER USER xxxxx WITH PASSWORD 'xxxxxxxxxxx'")
Example:
dbGetQuery(con, "ALTER USER martics WITH PASSWORD '12345678'")
Make sure that you chose an strong password, see the University guide for more information about strong and safe passwords.
- Disconnect from the database:
dbDisconnect(con)
- Connect again with the new password to check that the change was done correctly:
<- dbConnect(drv = Postgres(),
con dbname = "...",
host = "...",
port = "...",
user = "...",
password = "new_password")
- Disconnect from the database again:
dbDisconnect(con)
*NOTE: even you connect to a particular database to change the password, username and password are unique for all databases, so changing it only once changes for any dbname
in our evironment.
6.8 Set your .Renviron
6.8.1 What are secrets?
Connection to the database it is a repetitive process that you will repeat every day, but at the same time you need to keep all your database credentials safely guarded. The R environ file will help use to do so.
.Renviron is a file linked to your R session to safely save secrets. It is loaded at the beginning of each session and secrets are kept till the session is terminated or restarted.
To access to a secret you can type:
Sys.getenv("MY_SECRET")
[1] ""
As we have just seen if a secret does not exist the output will be an empty string: ""
.
You can set a temporary secret using the following command:
Sys.setenv(MY_SECRET = "123456789")
Then now if you run the same command than before you will get the value that we have just set back:
Sys.getenv("MY_SECRET")
[1] "123456789"
This value can be assigned to a variable as we would do as usual:
<- "123456789"
x print(x)
[1] "123456789"
<- Sys.getenv("MY_SECRET")
y print(y)
[1] "123456789"
As you can see x
and y
have the same value but if you share the code in the second case the code would only work if you have the same secret stored in your environment. This is very useful to store the connection details so even if you share your code you would not be share sensitive information.
There are two type of secrets:
temporal (as we have just seen), these secrets only last for while the session is ongoing the moment that you restart or terminate it all the secrets will be gone, so it is not recommended to use temporal secrets.
permanent (we will see in the next step), these secrets are kept across different sessions and will always be in your environment unless you explicitly delete them. These secrets are stored in the .Renviron file. We will use permanent secrets to store our database credentials.
You can read more about secrets here.
*NOTE: you can see all the secrets of your R session running the following command:
Sys.getenv()
6.8.2 Set up your own .Renviron file
To open your .Renviron file you can run the following command:
edit_r_environ()
There you can write secrets that you want to store
= "123456789" MY_SECRET
It is like writing in the Sys.setenv()
function, but these secrets will be permanent and always be loaded in your session.
Secrets are loaded every time our session starts, so if we modify the .Renviron file and we want the new secrets to be loaded we should restart the R session (Control+Shift+F10).
Try writing your username, password, port and host there and create a connection with the following code:
<- dbConnect(drv = Postgres(),
con dbname = "...",
host = Sys.getenv("HOST"),
port = Sys.getenv("PORT"),
user = Sys.getenv("USER"),
password = Sys.getenv("PASSWORD"))
6.9 Final remark
YOU MUST NEVER INCLUDE CONNECTION DETAILS IN A SCRIPT THAT IS PUBLIC OR YOU SHARE WITH SOMEONE, EVEN IF YOU TRUST THAT PERSON OR IS IN THE SAME ORGANIZATION.