## DATA ANALYSIS

## R Resources

R is an open-source statistical programming language that you can use to manipulate data and run statistical analyses in an efficient and reproducible way.

Visit this page for some resources on how to download and install R and RStudio, and how to get started using them.

__Useful R Packages__

The R user community is vibrant and constantly developing and improving open-source "packages" that help simplify your data analysis efforts! Below are a few resources and "cheat sheets" for some very useful R packages. Additional cheat sheets published by RStudio can be found on this website.

dplyr/tidyr: data wrangling and transformation

dtplyr: dplyr syntax wrappers for the data.table package (which provides some of R's fastest data processing tools; same cheat sheet as dplyr)

lubridate: date and time wrangling

ggplot2: data visualization

Shiny: interactive visualization

viridis: expressive and colorblind-friendly palettes

foreach and doParallel: intuitive parallel processing

geoknife: processing of large, gridded datasets according to their overlap with landscape features (e.g. summarizing watershed data)

dataRetrieval: import USGS and EPA water data into R

## Other Resources

__Useful Python Libraries__

numpy: basic numerical computing (vectorized, unlike base Python)

pandas: data frame operations

scipy: scientific computing

Scikit-Learn: machine learning

Matplotlib: publishable and highly customizable visualization

Seaborn: out-of-the-box plots for common plotting needs

Bokeh: interactive visualization
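
As a small illustration of what "vectorized" means for numpy, the sketch below converts a few flow readings from cubic feet per second to cubic meters per second, first with a plain Python loop and then with a single array operation. The readings and variable names are made up for this example and do not come from any real dataset.

```python
import numpy as np

# Hypothetical streamflow readings in cubic feet per second (illustrative values only).
flow_cfs = [12.0, 15.5, 9.8, 20.1]

# Conversion factor: 1 cubic foot is about 0.0283168 cubic meters.
CFS_TO_CMS = 0.0283168

# Base Python: the multiplication is applied one element at a time.
flow_cms_loop = [q * CFS_TO_CMS for q in flow_cfs]

# numpy: the same multiplication is applied to the whole array at once.
flow_cms_vec = np.array(flow_cfs) * CFS_TO_CMS

# Check that both approaches agree (up to floating-point tolerance).
same = np.allclose(flow_cms_loop, flow_cms_vec)
print(same)
```

For four values the difference is negligible, but on large arrays the vectorized form is typically both shorter to write and considerably faster than an explicit loop.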

__Unix-like command line__

__Version control__

For more information on how to use R for more complex data visualization efforts, peruse