Correspondence Analysis of Mexican Discourses
Correspondence Analysis
Correspondence analysis is a multivariate statistical technique that summarizes a set of categorical data in a low-dimensional (usually two-dimensional) form. It is the analogue of Principal Component Analysis for categorical data.
Correspondence analysis is usually applied to contingency tables. In this post, we will apply it to a frequency matrix: the term-document matrix from a bag-of-words representation.
The analysis can be done by row or by column. Below is an implementation of correspondence analysis, where row and column analysis are done at the same time.
```r
correspondence <- function(ct, ind) {
  # Parameters
  # ct : contingency table (or frequency table), as a numeric matrix
  # ind: indices of the singular vectors to keep, e.g. 2:3
  #      (the first one is trivial and is omitted)
  ct <- as.matrix(ct)
  n <- sum(ct)

  # Correspondence matrix (relative frequencies)
  F_fisher <- ct / n

  # Row and column masses
  rtot <- rowSums(ct) / n
  ctot <- colSums(ct) / n
  Dr <- diag(rtot)
  Dc <- diag(ctot)

  # Standardized matrix Dr^(-1/2) F Dc^(-1/2);
  # sqrt() is elementwise, which is valid because the inverses are diagonal
  Z <- sqrt(solve(Dr)) %*% F_fisher %*% sqrt(solve(Dc))

  # Eigenvalues and eigenvectors are obtained with SVD
  dvalsing <- svd(Z)

  # Two-dimensional representation
  # Row analysis
  Cr <- sqrt(solve(Dr)) %*% Z %*% dvalsing$v[, ind]
  # Column analysis
  Cc <- sqrt(solve(Dc)) %*% t(Z) %*% dvalsing$u[, ind]

  return(list("Cr" = Cr, "Cc" = Cc))
}
```
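To see why the first singular vector is skipped, here is a small check on a made-up 3 x 3 table (the counts and variable names are illustrative assumptions, not data from this post). Because the matrix is not centered before the SVD, its first singular value should be 1 and the matching singular vectors should be just the square roots of the row and column masses, so that dimension says nothing about the association between rows and columns.

```r
# Toy contingency table (made-up counts, for illustration only)
ct <- matrix(c(10,  4,  6,
                3, 12,  5,
                7,  2, 11),
             nrow = 3, byrow = TRUE)

P    <- ct / sum(ct)   # correspondence matrix
rtot <- rowSums(P)     # row masses
ctot <- colSums(P)     # column masses

# Same standardization as in correspondence(): Dr^(-1/2) P Dc^(-1/2)
Z <- diag(1 / sqrt(rtot)) %*% P %*% diag(1 / sqrt(ctot))
s <- svd(Z)

first_value <- s$d[1]  # trivial singular value

# Singular vectors are defined up to sign, so compare absolute values
u_is_trivial <- isTRUE(all.equal(abs(s$u[, 1]), sqrt(rtot)))
v_is_trivial <- isTRUE(all.equal(abs(s$v[, 1]), sqrt(ctot)))

print(first_value)
print(c(u_is_trivial, v_is_trivial))
```

This is why a two-dimensional map would typically be requested with something like `ind = 2:3` rather than `ind = 1:2`.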
Mexican discourses
In this post we will analyze the discourses (speeches) of Mexican politicians, in particular candidates for the presidency of Mexico. We have 11 discourses in total:
- Roberto Madrazo Pintado (PRI 2006)
- Andres Manuel Lopez Obrador (PRD 2006) (PRD 2012) (MORENA 2018)
- Enrique Peña Nieto (PRI 2012 before and after being elected)
- Josefina Vazquez Mota (PAN 2012)
- Felipe Calderon (PAN 2006)
- Ricardo Anaya Cortes (PAN 2018)
- Jose Antonio Meade Kuribreña (PRI 2018)
- Margarita Ester Zavala Gomez del Campo (Independiente 2018)
Our objective is to find patterns in the two-dimensional representation of the discourses that reflect the actual political context in Mexico.
Putting it all together
We will use the bag-of-words representation for the discourses. The 500 most frequent words will be chosen for the analysis, so our final term-document matrix will be an 11 x 500 matrix.
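A minimal base-R sketch of this step, using three made-up English sentences in place of the speeches. The post does not show its preprocessing, so the tokenization rule, the lack of stop-word removal, and the names `docs`, `top_k` and `tdm` are all assumptions made for illustration.

```r
# Made-up documents standing in for the speeches (illustration only)
docs <- c(a = "work for the people and the country",
          b = "the country needs security and work",
          c = "security for the people")
top_k <- 4  # the post keeps 500 terms; 4 is enough for this toy corpus

# Lower-case, split on anything that is not a letter, drop empty tokens
tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"),
                 function(x) x[x != ""])

# Rank terms by total frequency across all documents and keep the top_k
freq  <- sort(table(unlist(tokens)), decreasing = TRUE)
vocab <- names(freq)[seq_len(min(top_k, length(freq)))]

# Count each vocabulary term per document:
# one row per document, one column per term
tdm <- t(sapply(tokens, function(x) table(factor(x, levels = vocab))))
print(tdm)
```

With real Spanish text the tokenization pattern would need to keep accented letters, and very common function words would probably be removed first so that they do not dominate the 500 columns.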
Next, we see the results of the correspondence analysis applied to our term-document matrix:
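A map of this kind could be produced roughly as follows. This is only a sketch: the counts are randomly generated stand-ins for the real 11 x 500 matrix, the labels are hypothetical, and the row coordinates are recomputed inline with the same formula as `Cr` in `correspondence()` so that the snippet runs on its own.

```r
# Toy 4 x 6 document-term counts (randomly generated, illustration only)
set.seed(1)
tdm <- matrix(rpois(24, lambda = 5) + 1, nrow = 4,
              dimnames = list(paste0("speech_", 1:4),
                              paste0("term_", 1:6)))

# Row coordinates, following the same steps as Cr in correspondence()
P   <- tdm / sum(tdm)
Dr  <- diag(1 / sqrt(rowSums(P)))
Dc  <- diag(1 / sqrt(colSums(P)))
Z   <- Dr %*% P %*% Dc
s   <- svd(Z)
ind <- 2:3                        # skip the trivial first dimension
Cr  <- Dr %*% Z %*% s$v[, ind]

# Scatter plot of the speeches, labelled by name
plot(Cr, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(Cr, labels = rownames(tdm))
abline(h = 0, v = 0, lty = 2)     # the origin is the average profile
```

Points near the origin have a word profile close to the average of all speeches, while points far from it use a more distinctive vocabulary.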
Insights
We can see that Ricardo Anaya and Roberto Madrazo are the furthest from the rest. In this context, that means they use words in their discourses that the other candidates don’t use frequently.
The three discourses from Andres Manuel are close to each other, as expected. Margarita Zavala is also close to Josefina Vazquez Mota. That makes sense, as their campaigns are based on the idea of a woman in the presidency, so it’s logical that they use similar words in their discourses.
Another interesting insight is the closeness between Felipe Calderon and Margarita Zavala. It turns out that the team that helped Zavala in her campaign were former collaborators of Felipe Calderon, so maybe she was advised in the same way as Calderon was. Check this news story here.
The final insight is the closeness between Margarita Zavala and Jose Antonio Meade. Recently, Zavala resigned from her candidacy and, surprisingly, Jorge Camacho (former chief of Zavala's campaign) has announced that he intends to vote for Meade. Perhaps he intends to vote for the candidate with the most similar ideas, and that would explain the closeness in our analysis. Check this news story here.
Final thoughts
Correspondence analysis has proven to be useful in finding patterns in frequency matrices. We saw how some of the political news can be reflected in a discourse analysis. For future work, we can apply multidimensional scaling (MDS) to the term frequency matrix to obtain “data points” and train a classifier! But correspondence analysis is good for an initial representation.
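As a rough idea of what the MDS step might look like, here is a sketch with classical MDS from base R (`cmdscale`) on a made-up 4 x 4 count matrix. The Euclidean distance between relative-frequency profiles is an assumed choice; the post does not say which distance or MDS variant would be used.

```r
# Toy document-term counts (made up for illustration)
tdm <- matrix(c(5, 1, 0, 2,
                4, 2, 1, 1,
                0, 3, 6, 1,
                1, 4, 5, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("speech_", 1:4),
                              paste0("term_", 1:4)))

profiles <- tdm / rowSums(tdm)  # relative term frequencies per speech
d   <- dist(profiles)           # Euclidean distance (an assumed choice)
mds <- cmdscale(d, k = 2)       # classical MDS: one 2-D point per speech
print(mds)
```

Each row of `mds` is a two-dimensional point that could then be fed to a classifier as features.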
Discourses
Discourses obtained from animalpolitico.com