Before we Start


  • Use RStudio to write and run R programs.
  • Use install.packages() to install packages (libraries).
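
As a minimal sketch of the installation step, the snippet below installs a package only when it is missing and then loads it. The choice of dplyr is just an example taken from later in this lesson, and the install needs an internet connection and a configured CRAN mirror.

```r
# Install the package once per machine, and only if it is not already present.
# "dplyr" is used purely as an example package name.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load the package in every new R session that uses it.
library(dplyr)
```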

Introduction to R


  • Access individual values by location using [].
  • Access arbitrary sets of data using [c(...)].
  • Use logical operations and logical vectors to access subsets of data.
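
The three access patterns above can be sketched with a small vector. The vector name and its values are invented for illustration.

```r
# Hypothetical example data: number of rooms in five households.
rooms <- c(1, 3, 2, 5, 4)

# Access a single value by its position (R counts from 1).
second <- rooms[2]

# Access an arbitrary set of positions with c(...).
picked <- rooms[c(1, 3, 5)]

# A logical vector keeps only the positions where the condition is TRUE.
is_large <- rooms > 2
large <- rooms[is_large]

# Conditions can be combined with & (and) and | (or).
mid_sized <- rooms[rooms > 2 & rooms < 5]
```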

Starting with Data


  • Use read_csv() to read tabular data into R.
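
A small sketch of reading a CSV file, assuming the readr package (version 2.0 or later) is installed. To stay self-contained it first writes a tiny temporary file; the column names and values are hypothetical.

```r
library(readr)

# Write a tiny CSV file to a temporary location so the example is self-contained.
# Column names and values are invented for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("village,members",
             "A,3",
             "B,7"),
           csv_path)

# read_csv() returns a tibble and guesses the column types.
# show_col_types = FALSE only silences the column-type message.
households <- read_csv(csv_path, show_col_types = FALSE)
```

In practice the first argument would be the path to your own data file.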

Data Wrangling with dplyr


  • Use the dplyr package to manipulate data frames.
  • Use select() to choose variables from a data frame.
  • Use filter() to choose data based on values.
  • Use group_by() and summarize() to work with subsets of data.
  • Use mutate() to create new variables.
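
The verbs above are often chained into one pipeline. The sketch below assumes dplyr is installed and uses a made-up data frame; the column names are hypothetical.

```r
library(dplyr)

# Hypothetical example data: four households in two villages.
households <- tibble(
  village = c("A", "A", "B", "B"),
  members = c(3, 7, 4, 6),
  rooms   = c(1, 2, 2, 3)
)

result <- households %>%
  select(village, members, rooms) %>%              # choose variables
  filter(members > 3) %>%                          # keep rows by value
  mutate(people_per_room = members / rooms) %>%    # create a new variable
  group_by(village) %>%                            # define the subsets
  summarize(mean_people_per_room = mean(people_per_room))  # one row per group
```

Here result has one row per village, because summarize() collapses each group to a single value.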

Data Wrangling with tidyr


  • Use the tidyr package to change the layout of data frames.
  • Use pivot_wider() to go from long to wide format.
  • Use pivot_longer() to go from wide to long format.
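
A round trip between the two layouts can be sketched as follows, assuming tidyr is installed. The data frame and its column names are invented for illustration.

```r
library(tidyr)

# Hypothetical long-format data: one row per household and item.
long <- data.frame(
  household = c(1, 1, 2, 2),
  item      = c("radio", "bicycle", "radio", "bicycle"),
  owned     = c(TRUE, FALSE, TRUE, TRUE)
)

# Long to wide: each distinct item becomes its own column.
wide <- pivot_wider(long, names_from = item, values_from = owned)

# Wide to long: the item columns are gathered back into name/value pairs.
back <- pivot_longer(wide,
                     cols = c(radio, bicycle),
                     names_to = "item",
                     values_to = "owned")
```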

Data Visualisation with ggplot2


  • ggplot2 is a flexible and useful tool for creating plots in R.
  • The data set and coordinate system can be defined using the ggplot() function.
  • Additional layers, including geoms, are added using the + operator.
  • Boxplots are useful for visualizing the distribution of a continuous variable.
  • Barplots are useful for visualizing categorical data.
  • Faceting allows you to generate multiple plots based on a categorical variable.
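
The layering idea can be sketched with two small plots, assuming ggplot2 is installed. The data frame is made up and its column names are hypothetical.

```r
library(ggplot2)

# Hypothetical example data: eight households in two villages.
households <- data.frame(
  village   = rep(c("A", "B"), each = 4),
  wall_type = rep(c("brick", "mud"), times = 4),
  rooms     = c(1, 2, 2, 3, 1, 1, 3, 4)
)

# ggplot() sets the data and the mappings; a geom layer is added with +.
# Boxplot: distribution of a continuous variable within each category.
box <- ggplot(data = households, aes(x = wall_type, y = rooms)) +
  geom_boxplot()

# Barplot: counts of a categorical variable, with one panel per village.
bars <- ggplot(data = households, aes(x = wall_type)) +
  geom_bar() +
  facet_wrap(~ village)
```

Printing box or bars at the console (or calling print() on them in a script) draws the plot.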

Getting started with R Markdown (Optional)


  • R Markdown is a useful language for creating reproducible documents that combine text and executable R code.
  • Specify chunk options to control the formatting of the output document.
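
Chunk options can be written in an individual chunk header, for example {r, echo=FALSE}, or set once as defaults for the whole document. The sketch below shows the second approach using knitr, the package that runs R Markdown chunks; the particular option values are only an example.

```r
# Typically placed in a setup chunk near the top of the .Rmd file.
# These become the defaults for every later chunk; a chunk header
# such as {r, echo=TRUE} can still override them for one chunk.
knitr::opts_chunk$set(
  echo      = FALSE,  # hide the R code, show only its output
  message   = FALSE,  # suppress messages, e.g. from library() calls
  warning   = FALSE,  # suppress warnings in the rendered document
  fig.width = 6       # default figure width in inches
)
```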

Processing JSON data (Optional)


  • JSON is a popular format for transferring data and is used by many web-based APIs.
  • The complex structure of a JSON document means that it cannot easily be ‘flattened’ into tabular data.
  • We can use R code to extract values of interest and place them in a CSV file.
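
As a rough sketch of extracting values from nested JSON, the example below uses the jsonlite package, which is an assumption here since the key points do not name a package. The JSON text and field names are invented for illustration.

```r
library(jsonlite)

# Hypothetical JSON: each record holds a nested list of items,
# so it does not map directly onto rows and columns.
json_text <- '[
  {"id": 1, "village": "A", "items": ["radio", "bicycle"]},
  {"id": 2, "village": "B", "items": ["radio"]}
]'

# simplifyVector = FALSE keeps the nested structure as plain R lists.
records <- fromJSON(json_text, simplifyVector = FALSE)

# Pull out only the values of interest, one per record.
flat <- data.frame(
  id      = vapply(records, function(r) r$id, numeric(1)),
  village = vapply(records, function(r) r$village, character(1)),
  n_items = vapply(records, function(r) length(r$items), integer(1))
)

# Save the flattened values as a CSV file (a temporary file in this sketch).
csv_path <- tempfile(fileext = ".csv")
write.csv(flat, csv_path, row.names = FALSE)
```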

Text as data (Optional): Using Word Frequencies to Analyse Text


  • Use the quanteda package to analyse text data.
  • Use corpus(), tokens(), dfm(), dfm_remove() and stopword lists to prepare text for analysis.
  • Use textstat_frequency() to investigate the most frequently used tokens or features in a dfm.
  • Plot frequencies using ggplot and the quanteda function textplot_wordcloud().
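
The preparation steps above can be sketched on three invented sentences. This assumes quanteda version 3 or later, where textstat_frequency() is provided by the companion package quanteda.textstats and textplot_wordcloud() by quanteda.textplots. The sketch stops at the frequency table and leaves plotting aside.

```r
library(quanteda)
library(quanteda.textstats)  # provides textstat_frequency() in quanteda >= 3

# Hypothetical example texts.
texts <- c(
  doc1 = "Farmers grow maize and beans.",
  doc2 = "Maize is the main crop.",
  doc3 = "Maize needs rain."
)

corp  <- corpus(texts)                        # build a corpus from the texts
toks  <- tokens(corp, remove_punct = TRUE)    # split into tokens, drop punctuation
dfmat <- dfm(toks)                            # document-feature matrix (lowercased by default)
dfmat <- dfm_remove(dfmat, stopwords("en"))   # remove English stopwords

# Most frequent features across all documents.
freq <- textstat_frequency(dfmat, n = 3)
```

The freq data frame has a feature column and a frequency column, which can be passed to ggplot for a bar chart.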