Summary and Schedule
Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with social sciences data in R.
This is an introduction to R designed for participants with no programming experience. These lessons can be taught as a half-day, full-day, or two-day workshop (see Instructor Notes for suggested lesson plans). They start with some basic information about R syntax and the RStudio interface, then move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.
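As a small taste of the topics listed above, the following sketch imports a CSV file, inspects the resulting data frame, converts a column to a factor, adds and removes columns, and calculates a summary statistic. The file contents and the column names (`channel`, `views`, `category`) are made up for illustration and are not part of the lesson dataset.

```r
# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("channel,views,category",
    "a,120,news",
    "b,45,comedy",
    "c,300,news"),
  csv_path
)

# Import the CSV file into a data frame.
videos <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the structure of the data frame (column names and types).
str(videos)

# Convert a character column to a factor and look at its levels.
videos$category <- factor(videos$category)
levels(videos$category)

# Add a new column derived from an existing one, then remove a column.
videos$views_thousands <- videos$views / 1000
videos$channel <- NULL

# Calculate a summary statistic from the data frame.
mean_views <- mean(videos$views)
mean_views
```

The lessons themselves work through each of these steps in much more detail, and later episodes use the dplyr package for the same kind of data wrangling.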
Getting Started
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools.
To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
Prerequisites
This lesson requires a working copy of R and RStudio. To most effectively use these materials, please make sure to install everything before working through this lesson.
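One way to check your installation before the workshop is sketched below. It prints the R version and reports which packages are not yet installed. The package list is an assumption taken from the episode titles in the schedule (dplyr, tidyr, ggplot2); the Setup page remains the authoritative list of what to install.

```r
# Show which version of R is running.
print(R.version.string)

# Packages named in the episode titles; this list is an assumption and may
# not be complete. Check the Setup page for the full requirements.
required <- c("dplyr", "tidyr", "ggplot2")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report any packages that still need to be installed.
not_installed <- required[!installed]
if (length(not_installed) > 0) {
  message("Not yet installed: ", paste(not_installed, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```

Any package reported as missing can be installed with `install.packages()`, for example `install.packages("dplyr")`.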
About the example data
We will be working with a dataset of 200 YouTube posts. Here is some background to the dataset.
- An initial dataset of 314 posts was returned by the YouTube API in response to the following query: “Query: clicks south africa* hair (ad OR advertisement) -click” covering videos posted during the period 2020 to 2023.
- The dataset was prepared using a spreadsheet and OpenRefine to identify missing values, change variable names to “snake case”, change tags and topics to lowercase, and use semicolons to separate tags and topics.
- Owing to the ambiguity of the keyword “clicks”, many irrelevant results were returned.
- The dataset was reviewed, and any posts that were not related to the controversy or did not relate to issues of body politics and racism were excluded.
- This resulted in a dataset of 200 posts selected for computational analysis.
In case you need a refresher, here are some details about how the YouTube API derives the variables in this dataset.
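The dataset itself was prepared in a spreadsheet and OpenRefine, not in R. For readers curious what equivalent preparation steps might look like in R, here is a minimal sketch using base R only. The two-row sample, its column names, and its values are invented for illustration and do not come from the lesson dataset.

```r
# Hypothetical two-row sample; names and values are invented for illustration.
posts <- data.frame(
  "Video Title" = c("Hair advert response", "Store protest coverage"),
  "Tags" = c("Hair;Advert;Clicks", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Change variable names to snake case: lowercase, with spaces as underscores.
names(posts) <- gsub(" ", "_", tolower(names(posts)))

# Change tags to lowercase (missing values stay missing).
posts$tags <- tolower(posts$tags)

# Identify missing values by counting them in each column.
missing_counts <- colSums(is.na(posts))
missing_counts

# Split the semicolon-separated tags of the first post into a vector.
first_tags <- strsplit(posts$tags[1], ";", fixed = TRUE)[[1]]
first_tags
```

Storing several tags in one cell, separated by semicolons, keeps one row per post; splitting them as shown is useful when you later want to count or filter by individual tags.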
If you are teaching this lesson in a workshop, please see the Instructor Notes for helpful tips.
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. Before we Start | How to find your way around RStudio? How to interact with R? How to manage your environment? How to install packages? |
Duration: 00h 40m | 2. Introduction to R | What data types are available in R? What is an object? How can values be initially assigned to variables of different data types? What arithmetic and logical operators can be used? How can subsets be extracted from vectors? How does R treat missing values? How can we deal with missing values in R? |
Duration: 02h 00m | 3. Starting with Data | What is a data.frame? How can I read a complete CSV file into R? How can I get basic summary information about my dataset? How are dates represented in R and how can I change the format? |
Duration: 03h 20m | 4. Data Wrangling with dplyr | How can I select specific rows and/or columns from a data frame? How can I combine multiple commands into a single command? How can I create new columns or remove existing columns from a data frame? |
Duration: 04h 00m | 5. Data Wrangling with tidyr | How can I reformat a data frame to meet my needs? |
Duration: 04h 40m | 6. Data Visualisation with ggplot2 | What are the components of a ggplot? How do I create scatterplots, boxplots, and barplots? How can I change the aesthetics (e.g. colour, transparency) of my plot? How can I create multiple plots at once? |
Duration: 06h 35m | 7. Getting started with R Markdown (Optional) | What is R Markdown? How can I integrate my R code with text and plots? How can I convert .Rmd files to .html? |
Duration: 07h 20m | 8. Processing JSON data (Optional) | What is JSON format? How can I convert JSON to an R data frame? How can I convert an array of JSON records into a table? |
Duration: 08h 05m | 9. Text as data (Optional) | What are the most important concepts for automating text analysis? How can I prepare text data for analysis? How can I visualise frequency, collocation and concordance for a corpus of textual data? |
Duration: 08h 50m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Data Sets
Download the data zip file and unzip it to your Desktop
Software Setup
Details
Setup for different systems can be presented in dropdown menus via a solution tag. They will join to this discussion block, so you can give a general overview of the software used in this lesson here and fill out the individual operating systems (and potentially add more, e.g. online setup) in the solutions blocks.
Use PuTTY
Use Terminal.app
Use Terminal