ENV 872 - GitHub Setup
Briefly, Git and GitHub are tools used in tandem to provide robust version control for data science and other coding projects. We’ll explain exactly what this means as the course progresses, but for now, you can think of GitHub as a kind of remote, cloud-based file system where we back up and access our course workspace files. Git is a locally installed application used to synchronize files between our local machine (or container) and GitHub. This can all be quite confusing at first, but you’ll understand it much better as we guide you through using these valuable tools.
A guide to using Git & GitHub with R is provided at the website https://happygitwithr.com, and we recommend you at least skim this site to see what’s there; it’s a thorough and invaluable resource. Below, however, we’ve streamlined the process needed to get you up and running.
Step 1: Register a GitHub account
To use GitHub, you’ll first need to create a free GitHub account, if you don’t have one already.
- Before creating an account, consider the tips on selecting a user name provided on the Happy Git with R site.
- Navigate to https://github.com, click the Sign Up button, and follow the instructions.
Pro tip:
Consider signing up for the Student Developer Pack (https://education.github.com/pack) for access to numerous additional perks!
Step 2: Fork the Class Repository
Your GitHub account allows you to create a number of different repositories, which are essentially project workspaces or “virtual folders” that hold all the files relevant to a particular coding project. You can certainly create your own new repositories from scratch, but here you’ll start off by “forking” an existing repository that we instructors have already created, a repository that already contains folders and files used in class exercises.
We call it “forking” instead of simply “copying”, because this forked copy retains a link to its origin such that if we make changes to the original repository (i.e. the one you are forking), you can propagate those changes to your forked repository. Thus, if we instructors add a file or make edits to an existing file, you can update your files with a simple set of Git commands.
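The “simple set of Git commands” mentioned above can be sketched as follows. This is a minimal illustration, not the course’s official procedure: it assumes you have already cloned your fork to a local folder, that the class repository’s default branch is named `main`, and that you use the conventional remote name `upstream` for the original repository. The helper names `upstream_sync_commands` and `run_sync` are hypothetical.

```python
import subprocess

# URL of the class repository, taken from this guide.
CLASS_REPO_URL = "https://github.com/ENV872/EDA_Spring2024"


def upstream_sync_commands(upstream_url=CLASS_REPO_URL, branch="main"):
    """Return the Git commands that bring class-repository changes into a fork.

    The branch name "main" and the remote name "upstream" are assumptions.
    """
    return [
        # Register the class repository as a second remote named "upstream".
        ["git", "remote", "add", "upstream", upstream_url],
        # Download new commits from the class repository; local files are untouched.
        ["git", "fetch", "upstream"],
        # Merge the fetched branch into the branch that is currently checked out.
        ["git", "merge", f"upstream/{branch}"],
    ]


def run_sync(repo_dir, **kwargs):
    """Run each command inside a local clone, stopping at the first failure."""
    for command in upstream_sync_commands(**kwargs):
        subprocess.run(command, cwd=repo_dir, check=True)
```

The `git remote add` step only needs to succeed once per local clone; Git reports an error if the `upstream` remote already exists, so later updates need only the fetch and merge steps.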
For clarity, we’ll call the repository you’ll be forking the class repository, and the resulting forked repository your personal repository.
- Be sure you are logged in to your personal GitHub account.
- Navigate to the GitHub page where the class repository is located: https://github.com/ENV872/EDA_Spring2024
- Find the button labeled “Fork” and click it.
- Accept all defaults on the page that comes up next, and click the Create fork button.
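If you want to confirm that your new repository really is linked to the class repository, GitHub’s REST API reports this link. The sketch below is optional and makes a few assumptions: that you kept the default repository name when forking, that the repository is public (so no token is needed), and that the helper names `fork_parent` and `fetch_repo_info` are hypothetical ones chosen for illustration.

```python
import json
import urllib.request

# "owner/name" of the class repository, taken from this guide.
CLASS_REPO = "ENV872/EDA_Spring2024"


def fork_parent(repo_info):
    """Return the parent's "owner/name" if repo_info describes a fork, else None.

    repo_info is the decoded JSON that GitHub returns for a single repository.
    """
    if not repo_info.get("fork"):
        return None
    return repo_info.get("parent", {}).get("full_name")


def fetch_repo_info(username, repo_name="EDA_Spring2024"):
    """Download repository details from the GitHub REST API (needs network access)."""
    url = f"https://api.github.com/repos/{username}/{repo_name}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

For example, `fork_parent(fetch_repo_info("your-user-name")) == CLASS_REPO` should hold once the fork exists, where `your-user-name` stands in for your own GitHub user name.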
At this point you have your own copy of all the course materials to date, an exact replica of the class repository since you’ve made no changes. You are welcome to browse these files in GitHub’s web interface. It’s even possible to edit files there; however, WE STRONGLY ADVISE YOU TO AVOID DOING THIS. Why? Keeping files synchronized across Git/GitHub copies can get quite confusing, and eliminating one place where files get modified makes things just a bit simpler. (Just trust us for now; this will become clearer in a bit…) In short, use GitHub’s web interface only to view files, never to modify or synchronize them. Resist the temptation to do so!
Step 3: Create your personal access token
While anyone can view your repository, nobody can alter it without proper credentials. You established credentials when you set up your GitHub account, and next we’ll have to authorize the Git software on your local machine (container or personal machine) so that it can upload new and modified files to your repository. Rather than giving the Git software our all-powerful GitHub password, however, we create a “personal access token” to which we can attach specific, scoped privileges and an expiration date, all to enhance security. (While our class files don’t really need to be too secure, other coding projects certainly do, so GitHub takes security quite seriously!)
Here are the steps to create a personal access token:
- Be sure you are logged in to your personal GitHub account.
- Navigate to this link: https://github.com/settings/tokens
Alternatively, you can find the setting via the following sequence while logged in to your GitHub account:
- Click the icon associated with your profile in the extreme upper right of the page to open the Profile menu.
- Select Settings from the dropdown, or just click on this link: https://github.com/settings/profile
- The very last entry on the list on the left-hand side is Developer settings. Click that.
- Then click on Personal access tokens from the left-hand list on that page, and then select Tokens (classic).
- From the Generate new token dropdown, select Generate new token (classic). You may be asked to authenticate again.
- In creating the token, use the following settings:
- Add a note that will help you identify what this token will be used for (in case you want to delete it). For example: “ENV 859 Class”.
- Set the expiration date to be sometime after the semester is over.
- For scopes: check the following boxes:
- repo
- workflow
- user
- Finally, click Generate token.
- GitHub displays the token only once, so copy it somewhere you’ll be able to find it again; you’ll need this string to link any new RStudio session to your GitHub account. (I find sending myself a Slack message with this token to be pretty good.)
Pro tip:
What do you do if you mess up in generating your token or forget what the token is? Just delete it and create a new one. There’s no limit to the number of tokens you can create.
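To check that a token works and carries the scopes listed above, you can send it to GitHub’s REST API. The sketch below is one possible way to do this, not a required course step. It relies on the `GET /user` endpoint and on the `X-OAuth-Scopes` response header that GitHub returns for classic tokens; the function names and the `REQUIRED_SCOPES` constant are hypothetical helpers written for this guide.

```python
import json
import urllib.request

# Scopes this guide asks you to tick when creating the token.
REQUIRED_SCOPES = {"repo", "workflow", "user"}


def build_user_request(token):
    """Build an authenticated request for the GitHub REST endpoint GET /user."""
    return urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def missing_scopes(scopes_header):
    """Return the required scopes that are absent from an X-OAuth-Scopes value."""
    granted = {s.strip() for s in (scopes_header or "").split(",") if s.strip()}
    return REQUIRED_SCOPES - granted


def check_token(token):
    """Ask GitHub who the token belongs to and which scopes are missing.

    Needs network access; urlopen raises an HTTPError if the token is rejected.
    """
    with urllib.request.urlopen(build_user_request(token)) as response:
        login = json.load(response)["login"]
        scopes = response.headers.get("X-OAuth-Scopes", "")
    return login, missing_scopes(scopes)
```

Calling `check_token` with your token should return your GitHub user name and an empty set if all three scopes were selected. Avoid pasting the token into a script you plan to commit; pass it in at run time instead.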
Your GitHub account and forked repository should now be good to go for the next stages of the course setup!