Course Setup

ENV 872 - EDA   |   Spring 2024   |   Instructors: Luana Lima, John Fay

Selecting your computing environment

The first step in doing any data analysis is setting up a computing environment. In our case, this means gaining access to a machine that has R, RStudio, and Git installed (we’ll explain what each of these does in due time). You have a few options for how to achieve this:

  • Option 1:
    Simply log on to any Nicholas School machine. These machines have all the required software installed on them, and because NSOE oversees these machines, we can usually debug any issue you have. However, you need to be on campus at an available machine to use them, of course. And you’ll need to ensure your workspace is synchronized at the start and end of every session. You will also have to authenticate each new machine you work on: not difficult, but a bit of a nuisance.

  • Option 2:
    Create and use a Duke virtual machine. These require a bit of setup: you’ll need to create the virtual machine and then install the software yourself. However, once you do, the machine is fully under your control, and you can access it via remote desktop software from any other machine, on campus or off. Because these start as “clean machines”, they are less likely to encounter odd software issues or conflicts, and any that do occur can be easier to debug. These machines are set up by default to shut down each night, so you’ll have to be sure to save your work; alternatively, you can select the option to keep them running 24/7, in which case you can just pick up where you left off.

  • Option 3:
    Create and use a Duke RStudio container. A “container” is a web-based resource pre-configured to run all the software you need, but only that software. RStudio and Git will work and appear just as they would on a typical desktop machine, but there’s no real operating system outside those applications to interact with. The advantage of these containers, however, is that you can access them via a web browser from anywhere, and when you do, your session picks up right where you left off.

  • Option 4:
    And finally, you can use your own personal machine. Many students opt for this as it is overall the most convenient – if things go smoothly. You will have to ensure that the required software is installed (or updated, if already installed), but we give instructions for that. The one major drawback is that if you run into configuration issues or odd software errors, we may not be able to help since it's possible the error is related to some unique configuration on your machine.

So, which option should you choose? If previous classes are any indication, most students choose to work on a personal machine or virtual container. We had a few issues, but the good news is that it’s not all that hard to switch your choice if some problem arises.

What’s next?

Regardless of which option you decide on, you’ll still need to do some level of configuration. In this section we provide instructions – written and recorded – on how to install and configure everything needed for this course: