Linguistic Analysis of the State of the Union Addresses

This weekend I harvested 231 State of the Union addresses (up to 2017) and ran them through some NLP analysis.

Here are the unigram TF-IDF values, generated with this code of mine, with each address scored in the context of all the others (full output – sotu-1-gram). Each file is named in “YYYYMMDD-Name” format.
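For readers who want to see roughly what a TF-IDF pass involves, here is a minimal sketch in plain Python. It is not the code linked above. The tokenizer, the `tf * log(N / df)` weighting, the function names, and the two-document toy corpus are all assumptions made for illustration. Setting `n=2` gives bigrams instead of unigrams.

```python
import math
import re
from collections import Counter


def ngrams(text, n=1):
    """Lowercase the text, pull out word tokens, and return space-joined n-grams."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs, n=1):
    """Return {doc_name: {term: score}} with score = tf * log(N / df).

    This is one common TF-IDF variant (an assumption here); other tools
    smooth or normalize differently.
    """
    counts = {name: Counter(ngrams(text, n)) for name, text in docs.items()}

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())

    total_docs = len(docs)
    scores = {}
    for name, term_counts in counts.items():
        length = sum(term_counts.values()) or 1
        scores[name] = {
            term: (count / length) * math.log(total_docs / df[term])
            for term, count in term_counts.items()
        }
    return scores


# Hypothetical toy corpus; the keys only imitate the "YYYYMMDD-Name" file naming.
docs = {
    "18000101-Alpha": "the union of the states is the object of our care",
    "19000101-Beta": "the time for small thinking is over",
}

unigram_scores = tf_idf(docs, n=1)
bigram_scores = tf_idf(docs, n=2)
```

Terms that occur in every document (such as "the" in the toy corpus) score zero under this weighting, which is why TF-IDF tends to surface the vocabulary that sets one address apart from the rest.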

And here are the bigrams (full output – sotu-2-gram):

The lexical diversity is shown in the following graph:
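The post does not say how lexical diversity was calculated. A common choice is the type-token ratio (unique words divided by total words), and the sketch below assumes that measure. The function name and tokenizer are illustrative only.

```python
import re


def lexical_diversity(text):
    """Type-token ratio: distinct word forms divided by total word count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# "the" repeats, so 4 distinct words out of 5 tokens.
ratio = lexical_diversity("the cat and the dog")
```

The type-token ratio tends to fall as a text gets longer, so comparing addresses of very different lengths with it needs some care.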

 

The reading level has steadily declined, as shown in this graph:
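The reading-level metric behind the graph is not named in the post. One widely used measure is the Flesch-Kincaid grade level, and the sketch below assumes it. The syllable counter is a crude vowel-group heuristic and the function names are hypothetical, so the numbers will not match a dictionary-based tool exactly.

```python
import re


def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a floor of one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade: 0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


simple = flesch_kincaid_grade("The cat sat. The dog ran.")
dense = flesch_kincaid_grade(
    "Constitutional deliberations necessitate considerable legislative accommodation."
)
```

Short sentences built from short words score low and long sentences built from polysyllabic words score high, so a falling trend on this measure reflects shorter sentences, shorter words, or both.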

 

~

And here are two excellent sites with their own analyses: http://www.presidency.ucsb.edu/sou.php and http://stateoftheunion.onetwothree.net/
