Data Science with R and Python
This workshop was designed for second-year MPP students at UCR. It introduces data science as a discipline in its own right and teaches the fundamentals of conducting a data analysis project with R and Python.
Workshop Learning Objectives
If you make an honest effort to solve each problem in this class, I promise that you will be able to do the following by the end of the workshop:
- Prepare clean data for analysis
- Identify patterns and visualize data
- Use statistical methods to test hypotheses
- Report your results in a reproducible manner
Modules in this tutorial
- Intro to Data Science
  What is Data Science? Data science is an emerging field that combines important concepts from statistics, computer science, and substantive areas of focus.
  Figure: Drew Conway’s Venn diagram of data science.
- Data Cleaning
  We’ll be using RStudio for our workshop because it provides convenient support for both R and Python. It also provides an awesome visual Markdown editor that lets you see your Markdown syntax compiled in real time.
- Data Exploration and Hypothesis Generation
  The exploratory data analysis (EDA) process is important because it helps us learn more about what the data measures and what typical values are, and it helps us identify errors. EDA is also very helpful for identifying patterns in the data, finding relationships between variables, and ultimately generating testable hypotheses.
- Data Analysis and Reporting Results
  With a cleaned (tidy) dataset ready to go and with some detailed exploratory data analysis out of the way, we’re ready to test hypotheses. Rather than focusing on the details of a specific programming language, this session will help you build your statistical modeling skills.
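
To give a feel for how the modules fit together, here is a minimal Python sketch of the clean, explore, test, and report steps. It is an illustration only, not the workshop's own material: the toy data, the function names (`clean`, `summarize`, `permutation_test`), and the choice of a permutation test are all assumptions made for this example. It uses only the Python standard library so it runs without installing anything.

```python
import csv
import io
import random
import statistics

# Hypothetical raw data: two groups, one missing score, stray whitespace.
RAW = """group,score
a, 72
a,75
a,
a,71
a,74
b,80
b, 83
b,79
b,85
b,82
"""


def clean(raw_text):
    """Parse CSV text, strip whitespace, and drop rows with a missing score."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        score = row["score"].strip()
        if score:  # skip rows where the score is blank
            rows.append({"group": row["group"].strip(), "score": float(score)})
    return rows


def summarize(rows):
    """Return the mean score per group (a first exploratory look)."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(row["score"])
    return {name: statistics.mean(vals) for name, vals in groups.items()}


def permutation_test(rows, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    The fixed seed makes the result reproducible from run to run.
    """
    rng = random.Random(seed)
    a = [r["score"] for r in rows if r["group"] == "a"]
    b = [r["score"] for r in rows if r["group"] == "b"]
    observed = statistics.mean(b) - statistics.mean(a)
    pooled = a + b
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = statistics.mean(pooled[len(a):]) - statistics.mean(pooled[:len(a)])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_resamples


rows = clean(RAW)
means = summarize(rows)
observed, p_value = permutation_test(rows)
print(f"group means: {means}")
print(f"observed difference: {observed:.2f}, p-value: {p_value:.3f}")
```

The permutation test asks how often a random relabeling of the scores produces a gap between group means at least as large as the one observed. A small p-value suggests the gap is unlikely to be due to chance alone.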