7  Github

7.1 Git, Github and Github desktop

Git is a powerful tool that helps developers manage and track changes in their projects. It allows multiple people to work on the same project simultaneously without causing conflicts or losing any work. Git keeps a record of every change made, which makes it easy to go back to previous versions if needed. This makes collaboration smoother and ensures that the project evolves in an organized and reliable way.

GitHub is a web platform built on top of Git where developers can host their projects and collaborate more effectively. It provides a space where people can share their code, discuss issues, and plan their work.

Github desktop is a user-friendly way to use git and github interface from an app. You can download it here: https://github.com/apps/desktop. We usually use github desktop to perform commits, push, pull and perform the different github operations. If you are very skilled you can also use git from the terminal.

If you are interested to know more you can read it here, you can follow this tutorial.

7.2 Use

We will use github for three different purposes:

  1. Develop study code in a collaborative way. Multiple users will contribute to the same repo. This repositories will be usually kept as private so we will have to give them access to the individuals that can edit it.

  2. Share study code for a network study.

  3. Develop an R package in a collaborative way.

If you don’t have a github account you can create one at https://github.com

7.3 The basis

7.3.1 What is a Repository?

A repository, often shortened to “repo” is a storage space where your project lives. It can contain files, folders, images, videos, spreadsheets, data sets, and anything else your project needs. A GitHub repository also includes a history of all changes made to the files within it, making it easy to track and manage your project’s development.

7.3.2 How to Clone a Repository

Cloning a repository means creating a local copy of a project that is stored on GitHub. This allows you to work on the project on your own machine. To clone a repository, you can use the following steps (Github desktop):

  1. Top left: Add -> Clone Repository…

  2. provide user or organization name and the name of the repo that you want to clone separated by “/”. Example: ‘oxford-pharmacoepi/OxinferOnboarding’.

  3. Clone

If your repo contains an R project you can open it from RStidio and edit there what you want.

7.3.3 What is a Branch?

A branch in GitHub is a parallel version of a repository. It diverges from the main working project to work on something without affecting the main branch. Branches are used to develop features, fix bugs, or experiment safely. For example it is useful if you are working collaboratively to make some changes to a repository without affecting the others and submitting your changes so someone can review them.

In general if the repo is for your use only you will always work on the “main” branch. You can switch or create a new branch using the branch tab in Github desktop.

7.3.4 What is a Commit?

A commit is a snapshot of your repository at a specific point in time. When you make changes to files in your repository, you can “commit” those changes, which records them in the repository’s history. Each commit has a unique identifier (a hash), a message describing the changes, and metadata like the author’s name and the timestamp.

To create a commit you just have to tick the files that you want to commit, add a message and click commit to create the commit.

Initially this commit will be only local, you could send it online using push. Then all your changes have been uploaded to the github platform.

7.3.5 What is to Push and to Pull?

  • Push: When you push changes, you send your committed changes from your local repository to a remote repository on GitHub. This makes your changes available to others.

  • Pull: Pulling is the process of fetching updates from a remote repository and merging them into your local repository. This ensures your local repository is up to date with the latest changes made by others.

7.3.6 What is a Pull Request?

A pull request (PR) is a way to propose changes to a repository. After you’ve made changes in a branch (not main), you can open a pull request to merge those changes into another branch, typically the main branch. Other team members can review, comment, and approve the changes before they are merged.

By understanding these fundamental concepts, you can effectively use GitHub to manage your projects and collaborate with others.

7.3.7 What is an Issue?

Issues on GitHub are used to track tasks, enhancements, and bugs for your projects. They are a great way to organize and prioritize your work. Each issue can include a title, a detailed description, labels, and comments from collaborators.

To create an issue:

  1. Go to the repository on GitHub.

  2. Click the “Issues” tab.

  3. Click the “New issue” button.

  4. Fill in the title and description, and submit the issue.

Issues can be assigned to team members, labeled with tags, and linked to pull requests, making them a powerful tool for project management.

7.3.8 How to Use the .gitignore File

The .gitignore file is used to specify files and directories that Git should ignore. This is useful for excluding files that are not meant to be tracked, such as temporary files, build outputs, and sensitive information.

To create and use a .gitignore file:

  1. Create a file named .gitignore in the root of your repository.

  2. Add the files and directories you want to ignore. For example:

# Ignore all .csv files
*.csv
   
# Ignore the Results directory
Results/
   
# Ignore sensitive files (in this case)
connectToDatabase.R

By using a .gitignore file, you can keep your repository clean and free of unnecessary files, ensuring that only relevant files are tracked and shared.

7.4 Github in Rstudio

You can configure your github account in your RStudio session so it is easier to use it. This is mainly recommended when you can not use github desktop (e.g. on the server).

You can learn how to configure git and github in Rstudio following this tutorial.

7.4.1 Set up your GITHUB_PAT

In general you can install an R package from github using devtools or remotes package:

remotes::install_github("darwin-eu/CDMConnector")

Sometimes we keep the development of some packages private, so we can include mentions of specific studies or keep the issues private. For example there exist a private version of CDMConnector in darwin-eu-dev, if you try to install it you would get the following error:

remotes::install_github("darwin-eu-dev/CDMConnector")
Using bundled GitHub PAT. Please add your own PAT using `gitcreds::gitcreds_set()`
Error: Failed to install 'CDMConnector' from GitHub:
  HTTP error 401.
  Bad credentials

  Rate limit remaining: 59/60
  Rate limit reset at: 2024-08-09 14:45:00 UTC

  

So you can install private packages (that you have access to) you need to set up the GITHUB_PAT. You can set it up with the following steps:

  1. Log in to github and go to the tokens section: https://github.com/settings/tokens.

  2. Select the option: Generate new token (classic) (it is in the top-right dropdown menu Generate new token -> Generate new token (classic)).

  3. Add a note so you know what is this token for and tick at least the following box: repo; and click on Generate token.

  4. Copy the token (it should be a combination of letters and numbers that starts with ghp_...).

  5. Set a new .Renviron variable called GITHUB_PAT.

usethis::edit_r_environ()
# write there
GITHUB_PAT = "ghp_..."
  1. Restart your R session.

Now you should be able to install the private repo (if you have access to it):

remotes::install_github("darwin-eu-dev/CDMConnector")

7.5 Final remarks

Important

It is very important that we do not commit to github: - personal information - study results - database credentials

Keep in mind that after you push some changes in a github repo, even if you delete them later those changes will remain in the commit history.