Getting Started with Python

Name: Your_name
Class: ENV 872 - Environmental Data Analytics
Date: Spring 2024

Instructions

  • Edit the markdown cell above to change Your_name to your name
  • Complete the code cells below as instructed
  • When complete, select "Save And Export Notebook As.." from the "File" menu and then save the notebook as an HTML file.
  • Submit the HTML file to Sakai.

1. Working with variables and values

  1. In the code cell below, create and assign values to the following 4 variables:
     Variable   Value
     TreeID     101
     Species    Oak
     Height     15.5
     Planted    False
In [1]:
#Create the four variables
  1. In the 4 code cells below, display the type of each variable (one per code cell)
In [2]:
#Show the type of the TreeID variable
Out[2]:
int
In [3]:
#Show the type of the Species variable
Out[3]:
str
In [4]:
#Show the type of the Height variable
Out[4]:
float
In [5]:
#Show the type of the Planted variable
Out[5]:
bool
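The two steps above can be sketched as follows. This is a minimal illustration rather than the required answer: it assumes the Species value is stored as a string, and it uses print() so that all four types appear from a single cell, whereas the assignment asks for one type() call per cell.

```python
# Assign the four values from the table to variables
TreeID = 101        # whole number, stored as an int
Species = "Oak"     # text, stored as a str (the quotes are required)
Height = 15.5       # decimal number, stored as a float
Planted = False     # logical value, stored as a bool (capital F, no quotes)

# type() reports the class of the object each variable refers to
print(type(TreeID))
print(type(Species))
print(type(Height))
print(type(Planted))
```

In a notebook, leaving a bare expression such as type(TreeID) on the last line of a cell displays the result as that cell's output, which is why the outputs above show int, str, float and bool without a print() call.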
  1. Below are two code cells. The first (provided) creates a list of metrics for Tree "A". In the second code cell, combine the four variables created in Step 1 into a single list object named tree_B
In [6]:
#Create a list describing Tree "A"
tree_A = [104,"Elm",12.1,True]
In [7]:
#Create a list describing Tree "B" -- using the variables created above
  1. Compare the heights of Tree "A" and Tree "B" by first extracting the 3rd item from each list into its own variable and then using a comparison operator to evaluate whether Tree "A" is taller than Tree "B".
In [8]:
#Extract the height from the list for Tree "A"
In [9]:
#Extract the height from the list for Tree "B"
In [10]:
#Evaluate whether Tree "A" is taller than Tree "B"
Out[10]:
False
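The list steps above rely on zero-based indexing: the 3rd item of a list sits at index 2. A minimal sketch, in which height_A, height_B and A_is_taller are illustrative names that the assignment does not prescribe:

```python
# Values for Tree "B", repeated here so the sketch runs on its own
TreeID = 101
Species = "Oak"
Height = 15.5
Planted = False

# Tree "A" as provided, and Tree "B" built from the four variables
tree_A = [104, "Elm", 12.1, True]
tree_B = [TreeID, Species, Height, Planted]

# Python counts from 0, so the 3rd item is at index 2
height_A = tree_A[2]
height_B = tree_B[2]

# A comparison operator returns a bool
A_is_taller = height_A > height_B
print(A_is_taller)  # False, because 12.1 is not greater than 15.5
```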

2. Working with data using NumPy and Pandas

2a. Importing the packages

In [11]:
#Import the numpy package, calling it "np"
import numpy as np
In [12]:
#Import the pandas package, calling it "pd"

2b. Creating and using Numpy arrays

In [13]:
#Create a vector of tree height values, in meters
height_meters = np.array([2.1, 3.2, 5.6, 2.2, 3.1])
In [14]:
#Compute the mean height in meters
np.mean(height_meters)
Out[14]:
3.2400000000000007
In [15]:
#Convert the values to cm
In [16]:
#Compute the median height in cm
Out[16]:
310.0

► There are 39.37 inches in a meter. What is the median height of trees, in inches?

In [17]:
#Convert array values from cm to inches
In [18]:
#Compute the median of the heights in inches
Out[18]:
122.047
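The unit conversions in this section work because NumPy applies arithmetic to every element of an array at once. A minimal sketch, assuming the factor of 39.37 inches per meter given above; height_cm and height_inches are illustrative names, and the inch conversion here starts from the meter values rather than from centimeters:

```python
import numpy as np

# Tree heights in meters, as defined earlier in this section
height_meters = np.array([2.1, 3.2, 5.6, 2.2, 3.1])

# Multiplying an array by a scalar scales every element (no loop needed)
height_cm = height_meters * 100
median_cm = np.median(height_cm)

# 39.37 inches per meter, applied to the original meter values
height_inches = height_meters * 39.37
median_inches = np.median(height_inches)

print(median_cm)      # median height in cm
print(median_inches)  # median height in inches, approximately 122.047
```

The median is the middle value of the sorted heights (3.1 m), so converting before or after taking the median gives the same answer up to floating-point rounding.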

2c. Working with dataframes in Pandas

In [19]:
#Read the NTL-LTER dataset into a dataframe object
NTL_LTER = pd.read_csv("./data/Processed_KEY/NTL-LTER_Lake_ChemistryPhysics_PeterPaul_Processed.csv")
In [20]:
#View the first 5 records
Out[20]:
   lakeid   lakename  year4  daynum  month  sampledate  depth  temperature_C  dissolvedOxygen  irradianceWater  irradianceDeck  comments
0       L  Paul Lake   1984     148      5  1984-05-27   0.00           14.5              9.5           1750.0          1620.0       NaN
1       L  Paul Lake   1984     148      5  1984-05-27   0.25            NaN              NaN           1550.0          1620.0       NaN
2       L  Paul Lake   1984     148      5  1984-05-27   0.50            NaN              NaN           1150.0          1620.0       NaN
3       L  Paul Lake   1984     148      5  1984-05-27   0.75            NaN              NaN            975.0          1620.0       NaN
4       L  Paul Lake   1984     148      5  1984-05-27   1.00           14.5              8.8            870.0          1620.0       NaN

Examine characteristics of our dataframe...

In [21]:
#Display the data type of our NTL_LTER object
Out[21]:
pandas.core.frame.DataFrame
In [22]:
#Reveal the column names 
Out[22]:
Index(['lakeid', 'lakename', 'year4', 'daynum', 'month', 'sampledate', 'depth',
       'temperature_C', 'dissolvedOxygen', 'irradianceWater', 'irradianceDeck',
       'comments'],
      dtype='object')
In [23]:
#Reveal the structure of our dataframe
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 12 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   lakeid           21613 non-null  object 
 1   lakename         21613 non-null  object 
 2   year4            21613 non-null  int64  
 3   daynum           21613 non-null  int64  
 4   month            21613 non-null  int64  
 5   sampledate       21613 non-null  object 
 6   depth            21613 non-null  float64
 7   temperature_C    19442 non-null  float64
 8   dissolvedOxygen  19342 non-null  float64
 9   irradianceWater  15451 non-null  float64
 10  irradianceDeck   14626 non-null  float64
 11  comments         244 non-null    object 
dtypes: float64(5), int64(3), object(4)
memory usage: 2.0+ MB
In [24]:
#Reveal the dimensions of our dataframe
Out[24]:
(21613, 12)
In [25]:
#Use the `value_counts()` function to reveal how many records correspond to each unique value in the `lakename` column.
Out[25]:
lakename
Peter Lake    11288
Paul Lake     10325
Name: count, dtype: int64
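The inspection steps above can be tried on a small stand-in dataset. The sketch below is hedged accordingly: the CSV text copies only the five records displayed earlier and is read from memory with io.StringIO, so the counts and dimensions it reports differ from those of the full file. The names csv_text, df and lake_counts are illustrative.

```python
import io
import pandas as pd

# Stand-in for the real CSV file: the five records shown above (empty fields become NaN)
csv_text = """lakeid,lakename,year4,daynum,month,sampledate,depth,temperature_C,dissolvedOxygen,irradianceWater,irradianceDeck,comments
L,Paul Lake,1984,148,5,1984-05-27,0.00,14.5,9.5,1750.0,1620.0,
L,Paul Lake,1984,148,5,1984-05-27,0.25,,,1550.0,1620.0,
L,Paul Lake,1984,148,5,1984-05-27,0.50,,,1150.0,1620.0,
L,Paul Lake,1984,148,5,1984-05-27,0.75,,,975.0,1620.0,
L,Paul Lake,1984,148,5,1984-05-27,1.00,14.5,8.8,870.0,1620.0,
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())     # first 5 records
print(type(df))      # the class of the object
print(df.columns)    # column names
df.info()            # per-column non-null counts and dtypes
print(df.shape)      # (number of rows, number of columns)

# Count how many records carry each unique lake name
lake_counts = df["lakename"].value_counts()
print(lake_counts)
```

Note that columns and shape are attributes (no parentheses), while head(), info() and value_counts() are methods and must be called.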

Check our date column and set it to a datetime object

In [26]:
#Reveal datatype of the sampledate column
Out[26]:
dtype('O')
In [27]:
#Change it to a proper datetime object
In [28]:
#Reveal datatype of the sampledate column after conversion
Out[28]:
dtype('<M8[ns]')
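One way to perform the conversion above is pandas' to_datetime function, sketched here on a small stand-in dataframe. The dates come from the records shown earlier plus one hypothetical extra date; the name df is illustrative, and other approaches (such as parsing dates while reading the CSV) would also work.

```python
import pandas as pd

# Stand-in dataframe with dates stored as text, as they are after read_csv
df = pd.DataFrame({"sampledate": ["1984-05-27", "1984-05-27", "1984-06-03"]})

# Before conversion the column holds plain strings
print(df["sampledate"].dtype)

# Parse the strings and overwrite the column with true datetime values
df["sampledate"] = pd.to_datetime(df["sampledate"])

# After conversion the column has a datetime64 dtype
print(df["sampledate"].dtype)

# Datetime columns expose date parts through the .dt accessor
print(df["sampledate"].dt.month.tolist())  # [5, 5, 6]
```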

3. Plotting with Plotnine/ggplot

In [29]:
#Import plotnine (install it first if needed)
try:
    from plotnine import *
except ImportError:
    #plotnine is not installed yet: install it, then retry the import
    !pip install plotnine
    from plotnine import *
In [30]:
#Create a bar plot of temperature by lakename
In [31]:
#Create a histogram of temperature values using 15 bins
/opt/conda/lib/python3.9/site-packages/plotnine/layer.py:284: PlotnineWarning: stat_bin : Removed 2171 rows containing non-finite values.
  • Create a plot of irradiance of water vs depth, adding a regression line (geom_smooth) and scaling the y values from 0 to 2500.
In [32]:
#Create one more plot of your choosing
/opt/conda/lib/python3.9/site-packages/plotnine/layer.py:364: PlotnineWarning: geom_point : Removed 6163 rows containing missing values.
/opt/conda/lib/python3.9/site-packages/plotnine/layer.py:364: PlotnineWarning: geom_smooth : Removed 46 rows containing missing values.