Before we Start
- Use RStudio to write and run R programs.
- Use install.packages() to install packages (libraries).
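As a minimal sketch of the install-then-load pattern, the snippet below installs a package only when it is missing and then loads it. The package name and the CRAN mirror URL are illustrative choices, not requirements of the lesson.

```r
# Example package name, chosen only for illustration
pkg <- "dplyr"

# Install once per machine; skipped when the package is already present.
# The repos argument names a CRAN mirror so the call also works in scripts.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Load the package in every new R session before using its functions
library(pkg, character.only = TRUE)
```

Installing is a one-off step, whereas library() has to be called again each time R is restarted.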
Introduction to R
- Access individual values by location using [].
- Access arbitrary sets of data using [c(...)].
- Use logical operations and logical vectors to access subsets of data.
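The three ways of subsetting listed above can be sketched with a small vector. The values and variable names here are made up for illustration.

```r
# Made-up example values
ages <- c(23, 35, 41, 18, 62)

# One value by location (R counts positions from 1)
second <- ages[2]

# An arbitrary set of positions, given as a vector built with c()
picked <- ages[c(1, 3, 5)]

# A logical vector: TRUE where the condition holds, FALSE elsewhere
over_30 <- ages > 30
older <- ages[over_30]

# Logical operations can be combined with & (and) and | (or)
middle <- ages[ages > 30 & ages < 60]
```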
Starting with Data
- Use read_csv() from the readr package to read tabular data into R.
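To keep the following sketch self-contained, it first writes a tiny CSV file to a temporary location and then reads it back. The column names and values are invented for illustration; in practice the path would point at your own data file.

```r
library(readr)

# Write a small made-up CSV to a temporary file so the example can run anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("village,household_size,rooms",
    "A,3,1",
    "B,7,2"),
  csv_path
)

# read_csv() returns a tibble and guesses a type for each column
households <- read_csv(csv_path, show_col_types = FALSE)
```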
Data Wrangling with dplyr
- Use the dplyr package to manipulate dataframes.
- Use select() to choose variables from a dataframe.
- Use filter() to choose data based on values.
- Use group_by() and summarize() to work with subsets of data.
- Use mutate() to create new variables.
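The verbs above can be sketched on a small made-up table. The data frame and its column names are hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical household data
households <- tibble(
  village = c("A", "A", "B", "B", "B"),
  members = c(3, 7, 4, 6, 10),
  rooms   = c(1, 2, 2, 3, 5)
)

# select(): keep only some columns
small <- households %>% select(village, members)

# filter(): keep only rows that meet a condition
large <- households %>% filter(members > 5)

# mutate(): add a new column computed from existing ones
with_ratio <- households %>% mutate(members_per_room = members / rooms)

# group_by() + summarize(): one summary row per group
by_village <- households %>%
  group_by(village) %>%
  summarize(mean_members = mean(members), n_households = n())
```

Each verb returns a new data frame and leaves the input unchanged, which is why the results are assigned to new names.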
Data Wrangling with tidyr
- Use the tidyr package to change the layout of data frames.
- Use pivot_wider() to go from long to wide format.
- Use pivot_longer() to go from wide to long format.
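A round trip between the two layouts can be sketched as follows, using a made-up table of item counts per village (all names and values are illustrative).

```r
library(tidyr)

# Long format: one row per village-item combination (made-up data)
long <- data.frame(
  village = c("A", "A", "B", "B"),
  item    = c("radio", "bicycle", "radio", "bicycle"),
  count   = c(2, 1, 0, 3)
)

# Long to wide: each distinct item becomes its own column
wide <- pivot_wider(long, names_from = item, values_from = count)

# Wide to long: gather the item columns back into name/value pairs
back_to_long <- pivot_longer(
  wide,
  cols      = c(radio, bicycle),
  names_to  = "item",
  values_to = "count"
)
```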
Data Visualisation with ggplot2
- ggplot2 is a flexible and useful tool for creating plots in R.
- The data set and coordinate system can be defined using the ggplot function.
- Additional layers, including geoms, are added using the + operator.
- Boxplots are useful for visualizing the distribution of a continuous variable.
- Barplots are useful for visualizing categorical data.
- Faceting allows you to generate multiple plots based on a categorical variable.
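The points above can be sketched with a small made-up data frame. The column names (village, wall_type, members) are hypothetical and only serve the example.

```r
library(ggplot2)

# Hypothetical household data
households <- data.frame(
  village   = c("A", "A", "A", "B", "B", "B"),
  wall_type = c("brick", "mud", "mud", "brick", "brick", "mud"),
  members   = c(3, 7, 5, 4, 6, 10)
)

# ggplot() sets the data and the aesthetic mapping; + adds a geom layer
box <- ggplot(data = households, aes(x = village, y = members)) +
  geom_boxplot()

# A bar plot counts the rows in each category
bars <- ggplot(data = households, aes(x = wall_type)) +
  geom_bar()

# Faceting draws one panel per level of a categorical variable
faceted <- bars + facet_wrap(~ village)

# Printing a plot object draws it
print(box)
```

Storing a plot in a variable makes it possible to build it up in steps, as done here when the facets are added to the existing bar plot.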
Getting started with R Markdown (Optional)
- R Markdown is a useful language for creating reproducible documents that combine text and executable R code.
- Specify chunk options to control the formatting of the output document.
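As a brief sketch, chunk options can be given defaults for the whole document with knitr, typically in a setup chunk near the top of the file. The particular option values below are illustrative choices.

```r
library(knitr)

# Document-wide defaults, applied to every chunk unless overridden:
#   echo = FALSE     hide the R code, show only its output
#   message = FALSE  suppress messages such as package start-up notes
#   fig.width = 6    figure width in inches
opts_chunk$set(echo = FALSE, message = FALSE, fig.width = 6)

# A single chunk can override a default in its own header, for example:
#   {r summary-table, echo=TRUE}
```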
Processing JSON data (Optional)
- JSON is a popular format for transferring data and is used by a great many web-based APIs.
- The complex structure of a JSON document means that it cannot easily be ‘flattened’ into tabular data.
- We can use R code to extract values of interest and place them in a csv file.
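One possible way to extract values from nested JSON is sketched below with the jsonlite package, which is an assumption here since the key points do not name a package. The JSON text and its field names are invented for illustration.

```r
library(jsonlite)

# Made-up JSON: each record holds a nested object and a variable-length array
json_text <- '[
  {"id": 1, "respondent": {"village": "A", "items": ["radio", "bicycle"]}},
  {"id": 2, "respondent": {"village": "B", "items": []}}
]'

# simplifyVector = FALSE keeps the nested structure as plain R lists
records <- fromJSON(json_text, simplifyVector = FALSE)

# Pull out only the values of interest, one row per record
flat <- data.frame(
  id      = sapply(records, function(r) r$id),
  village = sapply(records, function(r) r$respondent$village),
  n_items = sapply(records, function(r) length(r$respondent$items))
)

# Write the flattened table to a csv file (a temporary path in this sketch)
out_path <- tempfile(fileext = ".csv")
write.csv(flat, out_path, row.names = FALSE)
```

The variable-length items array is reduced to a count here, which is one way of fitting nested data into a single table row.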
Text as data (Optional)
Using Word Frequencies to Analyse Text
- Use the quanteda package to analyse text data.
- Use corpus(), tokens(), dfm(), dfm_remove() and stopword lists to prepare text for analysis.
- Use textstat_frequency to investigate the most frequently used tokens or features in a dfm.
- Plot frequencies using ggplot and the quanteda function textplot_wordcloud.
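The preparation and counting steps above can be sketched on two made-up sentences. This assumes quanteda version 3 or later, where textstat_frequency() is provided by the companion package quanteda.textstats and textplot_wordcloud() by quanteda.textplots.

```r
library(quanteda)
library(quanteda.textstats)  # assumed: home of textstat_frequency() in quanteda >= 3
library(ggplot2)

# Two made-up documents
texts <- c(
  doc1 = "The farmers grow maize and the farmers keep goats.",
  doc2 = "Maize prices rise when farmers sell maize at the market."
)

corp  <- corpus(texts)                        # collection of documents
toks  <- tokens(corp, remove_punct = TRUE)    # split into words, drop punctuation
dfmat <- dfm(toks)                            # document-feature matrix (lowercased)
dfmat <- dfm_remove(dfmat, stopwords("en"))   # drop common English stopwords

# The five most frequent features across all documents
freq <- textstat_frequency(dfmat, n = 5)

# Bar chart of the frequencies, most frequent feature at the top
freq_plot <- ggplot(freq, aes(x = reorder(feature, frequency), y = frequency)) +
  geom_col() +
  coord_flip() +
  labs(x = "feature")
print(freq_plot)

# A word cloud of the same dfm (uncomment to draw; needs quanteda.textplots):
# quanteda.textplots::textplot_wordcloud(dfmat, min_count = 1)
```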