ENV 872 - Creating your RStudio project

ENV 872 - EDA | Spring 2024 | Instructors: Luana Lima, John Fay

The last step in getting set up for class exercises and assignments is to create a version-controlled RStudio project that is linked to your personal GitHub repository. As a reminder, for this to work, you’ll need: (1) a machine with R, RStudio, and Git installed; (2) a forked copy of the class GitHub repository; and (3) RStudio configured with your user name, user email, and personal access token. These steps are all described in previous documents.


“Clone” your forked GitHub repository as an RStudio Project

An RStudio project is essentially a folder that contains all the files related to your analysis: code, data (raw and processed), metadata, and anything else. It will likely also include some RStudio files, e.g., an “.Rproj” file that stores project-specific settings and an “.Rhistory” file that records your command history.

Your personal class repository, i.e., the one you forked from the overall class repository, already contains several of these files that we’ll use throughout this course. It will be the foundation of your RStudio project, and you will make a local copy of this remote repository through a process called cloning. RStudio handles cloning fairly seamlessly as part of setting up a new Project, via the steps below.

  • In RStudio, create a new project, either via File > New Project or from the project dropdown menu in the upper-right corner.

  • In the “New Project Wizard” that appears, select Version Control, then Git.

  • For the repository URL, copy and paste the URL of your forked repository (🚨 NOT the class repository).

  • Accept the default project directory name, but specify (or at least note) where the project folder will live on your local machine.

  • Click Create Project.

    ⚠️ Avoid putting your project in a web-synchronized folder, e.g., a OneDrive or Box folder: these services often add hidden “tracer” files of their own that can interfere with Git’s synchronization workflow and cause odd errors. Also avoid paths and project names that contain spaces or special characters; code can be quite sensitive to these things.

You may have guessed that by selecting “Version Control/Git” and then specifying the URL of the GitHub repository, you instructed RStudio to clone your remote repository to your local machine. This does more than just download the files; it establishes a link between our local RStudio project and our remote GitHub repository so that we can synchronize the two. This is why we give it the special name of “cloning” rather than just copying: the two are identical versions (for now) of the same material.
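For the curious, the same cloning step can be done at the terminal. The sketch below is a hedged, offline illustration: instead of your GitHub fork, it uses a throwaway local “bare” repository (hypothetically named remote.git) as the remote, so it runs without a network connection or a GitHub account. What it shows is the link described above: the clone records where it came from under the name origin.

```shell
#!/usr/bin/env bash
# Sketch: what cloning sets up. A local bare repository stands in for
# your GitHub fork so the demo runs offline; folder names are hypothetical.
set -euo pipefail

work="$(mktemp -d)"                          # scratch folder for the demo
git init --quiet --bare "$work/remote.git"   # plays the role of the fork on GitHub

# Terminal equivalent of New Project > Version Control > Git.
# With GitHub, the source would be your fork's URL instead (placeholder):
#   git clone https://github.com/YOUR-USERNAME/YOUR-FORK.git
git clone --quiet "$work/remote.git" "$work/local"

# The clone remembers its source under the name "origin";
# this stored link is what later makes push and pull possible.
git -C "$work/local" remote -v
```

Git will likely warn that you cloned an empty repository; that is expected here, since a real fork would already contain the course files.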

Introduction to Version Control

This local, cloned repository is where we will do all of our analysis. We can create, add, edit and delete files in this local workspace, and then periodically synchronize these changes up to the workspace’s remote counterpart located on GitHub.com. Doing so has the obvious advantage of creating a remote backup of our work, in case our local machine dies for whatever reason. It also has the benefit of allowing us to work from several machines if we want to: for example, if you have a work machine and a home machine, you can synchronize all your changes up to GitHub at the end of a session at the office, and then go home and pull all the updates to your home machine and continue working. Not great for the work-life balance, but nevertheless…
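To make the office/home scenario concrete, here is a hedged terminal sketch of that round trip. It assumes Git is available on the command line and substitutes a local bare repository (hypothetically named github.git) for GitHub.com, so both “machines” are simply two folders on one computer. The add, commit, push, and pull commands are roughly the operations that RStudio’s Git tab runs for you behind its buttons.

```shell
#!/usr/bin/env bash
# Sketch: keeping an "office" and a "home" copy in sync through one remote.
# A local bare repository stands in for GitHub so this runs offline;
# all folder and file names here are hypothetical.
set -euo pipefail

# Throwaway identity for this demo only; in class, Git uses the
# user name and email you configured during setup.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work="$(mktemp -d)"
# Stand-in for the remote on GitHub.com (--initial-branch needs Git 2.28+).
git init --quiet --bare --initial-branch=main "$work/github.git"

# At the office: clone, create a script, commit it, and push it up.
git clone --quiet "$work/github.git" "$work/office"
echo 'x <- 1:10' > "$work/office/analysis.R"
git -C "$work/office" add analysis.R
git -C "$work/office" commit --quiet -m "Start analysis script"
git -C "$work/office" push --quiet -u origin HEAD:main

# At home: clone, keep working, and push the new commit.
git clone --quiet "$work/github.git" "$work/home"
echo 'mean(x)' >> "$work/home/analysis.R"
git -C "$work/home" commit --quiet -am "Add summary statistic"
git -C "$work/home" push --quiet

# Back at the office: pull the changes made at home.
git -C "$work/office" pull --quiet --ff-only
cat "$work/office/analysis.R"
```

The first clone may print a warning that the repository is empty, since nothing has been pushed yet. The final cat shows both lines, the one written at the office and the one added at home.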

Yes, you could do this with a service like Box, Google Drive, OneDrive, etc., but Git and other version control set-ups go one step further: you record incremental changes, or “versions,” of your files along with messages describing them, so that you can later undo specific changes. So rather than the typical “MyReport.docx”, “MyReport_Copy1.docx”, “MyReport_Copy1_save2.docx” system that you may be familiar with, you just have “MyReport.docx” plus a ledger, accessible through Git, of all the changes you’ve made, allowing you to go back to any point in time and resurrect earlier versions. This can be quite useful as you learn to code!
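Here is a small, hedged sketch of that ledger at work. It uses a plain-text file (hypothetically MyReport.md rather than a .docx, so the changes are easy to read) in a throwaway repository. Three commits record three versions of one file; git log lists them with their messages, and git revert undoes one specific change by adding a new commit that reverses it, leaving the history intact. This is only one of several ways Git can restore earlier work.

```shell
#!/usr/bin/env bash
# Sketch: one file, several recorded versions, and undoing one change.
# File and repository names are hypothetical.
set -euo pipefail

# Throwaway identity for this demo only.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

repo="$(mktemp -d)"
git init --quiet "$repo"

# Version 1: start the report.
echo "Introduction" > "$repo/MyReport.md"
git -C "$repo" add MyReport.md
git -C "$repo" commit --quiet -m "Draft introduction"

# Version 2: add a section (same file name, no copies needed).
echo "Methods" >> "$repo/MyReport.md"
git -C "$repo" commit --quiet -am "Add methods section"

# Version 3: a change we will want to undo.
echo "Bad idea" >> "$repo/MyReport.md"
git -C "$repo" commit --quiet -am "Add a paragraph I will regret"

# The ledger: one line per commit, newest first.
git -C "$repo" log --oneline

# Undo only the most recent change by recording a commit that reverses it.
git -C "$repo" revert --no-edit HEAD
cat "$repo/MyReport.md"
```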

The version control workflow does take a bit of getting used to, as there are more steps than just saving a file, but we have time to get familiar with it. RStudio also builds many of the key Git commands directly into the application, making the process a bit easier to learn. In fact, you should now see a new tab called Git in the pane that holds your Environment tab. This is the control center for Git operations, though Git commands can also be issued at the terminal prompt.
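As a rough illustration of what the Git tab summarizes, the hedged sketch below creates a throwaway repository and asks git status --short for the state of each file. Ticking a file’s Staged box in the Git tab corresponds approximately to running git add on it, and the tab displays similar status letters (for example M for modified and ? for untracked). File names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: file states as Git reports them at the terminal.
# Repository and file names are hypothetical.
set -euo pipefail

# Throwaway identity for this demo only.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

repo="$(mktemp -d)"
git init --quiet "$repo"

# A file that is already committed (tracked).
echo 'x <- 1' > "$repo/tracked.R"
git -C "$repo" add tracked.R
git -C "$repo" commit --quiet -m "Add tracked.R"

echo 'y <- 2' >> "$repo/tracked.R"   # edit a committed file: reported as " M"
echo 'z <- 3' > "$repo/new.R"         # brand-new file: reported as "??"
git -C "$repo" status --short

# Stage the new file (like ticking "Staged"): now reported as "A ".
git -C "$repo" add new.R
git -C "$repo" status --short
```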

Assignment 1 gives you your first taste of both using RStudio and the Git/GitHub version control workflow…