Computational approaches in discourse analysis III

Marion Walton

Outline

What is discourse?
Computational linguistics overview
Discourse analysis & Corpus linguistics
Automating content analysis in Media studies
Multimodal discourses

Quiz

Multiple Choice Questions
Take-home, Randomized

Question types:

Based on readings and lectures
Use Wordtree and Voyant tools to answer questions about Mandela speech and Clicks Corpus.
Apply concepts from readings to examples from Clicks corpus

Starting points - Concordance

Compare “controversial advert” to “racist advert” in the Clicks Video Descriptions in Voyant Tools - with Stop List
Compare “controversial” to “racist” in the Clicks Comments (Random sample) in Voyant Tools - with Stop List
Compare concordances of “beautiful” and “ugly” and “dark” and “light” in the Clicks Comments (Random sample) in Voyant Tools - with Stop List
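
A concordance (keyword in context) view like the one Voyant Tools produces can be sketched in a few lines. This is a minimal illustration, not Voyant's implementation: the function name `kwic`, the `window` parameter and the sample text are all assumptions made up for this example.

```python
import re

def kwic(text, keyword, window=4):
    """Return (left context, keyword, right context) for every hit of keyword."""
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented sample text for illustration (not taken from the Clicks corpus).
sample = "That advert was racist. I do not think the advert was racist at all."
for left, kw, right in kwic(sample, "racist", window=3):
    print(f"{left:>20} | {kw} | {right}")
```

Reading the left and right contexts side by side is what makes it possible to compare how two terms, such as “controversial” and “racist”, are used.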

Lecture 3 - Machine learning and AI

ML & Clicks Video Descriptions

Framing

“Framing essentially involves selection and salience. To frame is to select some aspects of a perceived reality and make them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described.” (Entman, 1993:52)

Machine learning and frame analysis

Co-occurrence of words interpreted as a frame using statistical techniques like cluster analysis.
Cluster analysis - organising items into groups, or clusters, on the basis of how closely associated they are.
Co-occurrences of words graphically visualized as networks of words
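
The co-occurrence counts that feed a cluster analysis or a word network can be sketched as follows. This assumes the simplest possible definition (two words co-occur if they appear in the same document); the function name `cooccurrence` and the three sample descriptions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count how often each unordered word pair appears in the same document."""
    pairs = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # unique words, fixed order
        pairs.update(combinations(words, 2))
    return pairs

# Invented video descriptions for illustration only.
docs = [
    "controversial advert sparks protest",
    "racist advert sparks outrage",
    "protest over racist advert",
]
edges = cooccurrence(docs)
for (a, b), n in edges.most_common(3):
    print(a, b, n)
```

Each counted pair can be read as a weighted edge in a network of words, or as an association score for grouping words into clusters.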

ML & Clicks Comments

Comments - Collocations

Top Collocations: Clicks Comments (n=35)
	collocation	count
2	black people	135
1	black women	106
3	white people	102
7	natural hair	48
9	black woman	45
4	fine flat	41
145	black hair	39
13	flat hair	35
5	dry damaged	28
15	can use	28
6	face cream	25
8	go back	25
27	black person	24
21	straight hair	22
49	can get	21
22	aneeza gold	20
76	white women	20
64	damaged hair	17
10	political party	16
11	thank ninja	16
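
A table like this one can be approximated by counting adjacent word pairs after a stop list has been applied. The sketch below is an assumption about the general technique, not the procedure used to produce the table; the stop list, the function name `collocations` and the sample comments are invented.

```python
from collections import Counter
import re

# Illustrative stop list; real stop lists are much longer.
STOP = {"the", "a", "is", "of", "and", "to", "for", "my"}

def collocations(comments, stop=STOP):
    """Count adjacent word pairs after removing stop words from each comment."""
    counts = Counter()
    for comment in comments:
        tokens = [t for t in re.findall(r"[a-z']+", comment.lower())
                  if t not in stop]
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts

# Invented comments for illustration only.
comments = [
    "Natural hair is beautiful",
    "I love my natural hair",
    "The natural hair of black women",
]
top = collocations(comments).most_common(1)
print(top)
```

Because stop words are removed before pairing, a collocation can join two words that were separated by a stop word in the original comment.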

n-grams

Contiguous sequence of “n” terms - often used in predictive text, spell-checking, language modeling, etc. Also useful for discourse analysis.

          feature frequency rank docfreq group
48         racist       271   48     217   all
60         racism       208   60     167   all
392        racial        38  388      34   all
586      a_racist        27  566      27   all
607     of_racism        26  595      23   all
656     racism_is        25  622      25   all
665       racists        24  657      23   all
744    not_racist        22  729      22   all
938     racism_in        18  909      18   all
957     be_racist        18  909      15   all
1159    is_racist        15 1092      14   all
1259   racist_and        14 1165      13   all
1294   the_racist        13 1268      13   all
1436    is_racism        12 1402      10   all
1498   are_racist        12 1402      11   all
1713 being_racist        11 1551      11   all
1714   racism_and        11 1551      10   all
1982    to_racism         9 1936       8   all
2029 about_racism         9 1936       9   all
2046   the_racism         9 1936       8   all

                      feature frequency  rank docfreq group
744                not_racist        22   729      22   all
3732           was_not_racist         5  3642       5   all
4389               not_racism         5  3642       5   all
4469            is_not_racist         5  3642       5   all
6452            not_racist_it         3  6234       3   all
6457        ad_was_not_racist         3  6234       3   all
10014       was_not_racist_it         2  9552       2   all
10015       not_racist_it_was         2  9552       2   all
15642          not_racism_the         2  9552       2   all
15758          but_not_racist         2  9552       2   all
16573      totally_not_racist         2  9552       2   all
16574          not_racist_its         2  9552       2   all
16603   is_totally_not_racist         2  9552       2   all
18236         not_racist_they         2  9552       2   all
18334          its_not_racist         2  9552       2   all
19044           not_racist_at         2  9552       2   all
19046       not_racist_at_all         2  9552       2   all
19155           not_racist_to         2  9552       2   all
61441     truthful_not_racist         1 21181       1   all
61444 was_truthful_not_racist         1 21181       1   all
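
The features in the tables above join the words of each n-gram with underscores and report both total frequency and document frequency. A minimal sketch of how such counts could be built is shown here; the function names and sample comments are hypothetical, and the tool that produced the tables may tokenise differently.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-word sequences, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def feature_table(docs, sizes=(1, 2, 3)):
    """Return {feature: (frequency, docfreq)} over all documents."""
    freq, docfreq = Counter(), Counter()
    for doc in docs:
        tokens = doc.lower().split()
        feats = [g for n in sizes for g in ngrams(tokens, n)]
        freq.update(feats)          # every occurrence counts
        docfreq.update(set(feats))  # each document counts once
    return {f: (freq[f], docfreq[f]) for f in freq}

# Invented comments for illustration only.
docs = ["it was not racist", "not racist not racist at all", "that is racist"]
table = feature_table(docs)
print(table["not_racist"])
```

Searching the resulting features for a string such as `not_racist` gives the kind of filtered view shown in the second table.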

Topic Modeling - Comments

      Topic 1  Topic 2     Topic 3     Topic 4   Topic 5  Topic 6  Topic 7 
 [1,] "clicks" "🤣"        "black"     "racist"  "just"   "can"    "hair"  
 [2,] "eff"    "go"        "people"    "us"      "like"   "one"    "white" 
 [3,] "must"   "know"      "think"     "country" "get"    "use"    "ad"    
 [4,] "right"  "say"       "women"     "time"    "need"   "also"   "even"  
 [5,] "take"   "u"         "😂"        "people"  "want"   "yes"    "look"  
 [6,] "malema" "see"       "racism"    "eff"     "love"   "good"   "make"  
 [7,] "thing"  "said"      "person"    "never"   "way"    "skin"   "fine"  
 [8,] "matter" "really"    "beautiful" "racism"  "going"  "face"   "saying"
 [9,] "stupid" "something" "white"     "now"     "well"   "please" "flat"  
[10,] "stores" "back"      "many"      "race"    "better" "thank"  "racist"

Topic Modeling

In NLP, topic modeling applies unsupervised learning to a corpus to produce a summary set of terms representing the collection’s primary topics.

Topic Modeling

Used to identify topics present in a corpus.

The LDA (Latent Dirichlet Allocation) algorithm identifies co-occurrence patterns of words and the latent structure of the text.

LDA assumes:

each doc is a mixture of topics
each topic has a characteristic word distribution
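
These two assumptions can be written out in the standard LDA notation. The symbols are conventional rather than taken from the slides: $K$ is the number of topics, $\theta_d$ the topic mixture of document $d$, $\phi_k$ the word distribution of topic $k$, and $\alpha$, $\beta$ the Dirichlet hyperparameters.

```latex
\begin{align}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), &
\phi_k &\sim \mathrm{Dirichlet}(\beta) \\
z_{d,n} &\sim \mathrm{Categorical}(\theta_d), &
w_{d,n} &\sim \mathrm{Categorical}(\phi_{z_{d,n}}) \\
p(w \mid d) &= \sum_{k=1}^{K} \theta_{d,k}\,\phi_{k,w}
\end{align}
```

The last line expresses the mixture idea: the probability of seeing word $w$ in document $d$ is the topic word probabilities weighted by how much of each topic the document contains.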

Why Use Topic Modeling

Exploratory work on large dataset
Summarise key themes
Reduces complexity and size of dataset (dimensionality reduction)
Faster Information Retrieval - find by themes not keyword matches

Goals of ML

The goal of machine learning is to:

make accurate predictions.
use large datasets
use complex models which recognise nonlinear relationships between several variables.

Overview - Automated Content Analysis

Boumans & Trilling, 2016

Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: an overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8-23. https://doi.org/10.1080/21670811.2015.1096598

Approaches

Simple automation

Dictionary approaches (e.g. sentiment analysis)

specifies explicit rules
best for coding manifest data
sentiment is contextual - dictionaries do not always travel well across contexts
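
A dictionary approach can be sketched as a lookup of explicit word scores. The lexicon below is a hypothetical mini-dictionary invented for this example, and `sentiment` is an assumed function name; real sentiment dictionaries are far larger and often weighted.

```python
import re

# Hypothetical mini-lexicon, invented for illustration.
LEXICON = {"beautiful": 1, "love": 1, "good": 1,
           "ugly": -1, "stupid": -1, "damaged": -1}

def sentiment(text, lexicon=LEXICON):
    """Sum the lexicon score of every token; unknown words score 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(t, 0) for t in tokens)

print(sentiment("I love my beautiful natural hair"))   # 2
print(sentiment("Shampoo for dry and damaged hair"))   # -1
```

The second result shows the context problem: “damaged” describes a hair type in a product label, yet the explicit rule scores it as negative sentiment.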

“Sentiment” in Clicks transcripts

Supervised machine learning

Model "learns" from decisions by human coders.

Makes more efficient use of human work.
Classifiers can be re-used and allow for faster response (e.g., Hopkins and King 2010; Jurka et al. 2013).
Inter-cultural generalisability?
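
How a model “learns” from human coding decisions can be illustrated with a very small Naive Bayes classifier. Naive Bayes is chosen here only because it is short enough to show in full; the slides do not say which classifier was used. The labels (`critical`, `defensive`) and the coded comments are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labelled):
    """labelled: list of (text, label) pairs coded by humans."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)  # prior
        for w in text.lower().split():
            if w in vocab:  # ignore words never seen in training
                # add-one smoothing avoids log(0) for unseen word/label pairs
                score += math.log((word_counts[label][w] + 1)
                                  / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented, hand-coded examples for illustration only.
coded = [
    ("the advert is racist", "critical"),
    ("this is racism plain and simple", "critical"),
    ("the advert is not racist at all", "defensive"),
    ("people are overreacting it was not racist", "defensive"),
]
model = train(coded)
print(predict(model, "it was not racist"))
```

The trained model can then be re-used on uncoded comments, which is where the efficiency gain over manual coding comes from. Whether it generalises to other cultural contexts depends on the examples it was trained on.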

Unsupervised machine learning

Unsupervised machine learning helps to describe discourses, frames or topics in an open way - doesn’t impose prior assumptions
Similar to qualitative methods
Difficult to audit
Can replicate bias

Strengths

Flexibility & scale
Classification tasks
Reality-based tasks with right and wrong answers

Weaknesses

Culturally constructed categories
Potential linguistic/culture/race/gender biases
Unintended uses
Variability & difficulties with auditing

Conclusion

CDA & CL

Interaction and synergy (Baker et al.)

Corpus linguistics can expand the scope of discourse analysis:

Techniques provided a “map” of the corpus, pinpointed areas for subsequent close analysis
Found examples and quantified them through absolute and relative frequencies
Lexical patterns, keywords, clusters, collocates revealed novel patterns of use.
“it can reinforce, refute or revise a researcher’s intuition and show them why and how much their suspicions were grounded” (Partington 2003:12)
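
The difference between absolute and relative frequencies can be shown with a short calculation. The counts and corpus sizes below are hypothetical numbers chosen for illustration, and `relative_frequency` is an assumed helper name.

```python
def relative_frequency(count, corpus_size, per=1000):
    """Normalise an absolute count to occurrences per `per` tokens."""
    return count * per / corpus_size

# Hypothetical figures: the same word in two corpora of different sizes.
large = relative_frequency(50, 10_000)  # 50 hits in 10,000 tokens
small = relative_frequency(30, 2_000)   # 30 hits in 2,000 tokens
print(large, small)
```

The smaller corpus has fewer hits in absolute terms but a higher relative frequency, which is why normalised figures are needed when comparing corpora of different sizes.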

Natural Language Processing

Sentiment analysis - major limitations
Topic modeling less useful for this small, focused (& reflexive) corpus
Sociograms particularly useful as they recorded contours of discourse practices

Importance of Critical Discourse Analysis

Key to address Northern/Anglocentric biases in tools
Important skillset for auditing and adapting tools and training datasets
