Linguistic Analysis of the State of the Union Addresses

This weekend I harvested 231 State of the Union addresses, up through 2017, and ran them through some NLP analysis.

Here are the unigram TF-IDF values, generated with this code of mine and computed for each address in the context of all the others (full output – sotu-1-gram). Each file is named in “YYYYMMDD-Name” format.

And here are the bigrams (full output – sotu-2-gram):

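For readers who want to try something similar, here is a minimal sketch of n-gram TF-IDF in plain Python. It is not the code linked above: the tokenizer, the function names `ngrams` and `tf_idf`, and the exact weighting (relative term frequency times the natural log of inverse document frequency) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def ngrams(text, n=1):
    """Lowercase the text, pull out word tokens, and return space-joined n-grams."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs, n=1):
    """docs maps a file name to its text; returns {file name: {term: score}}.

    Assumed weighting: (term count / total terms in the address)
    times log(number of addresses / number of addresses containing the term).
    """
    counts = {name: Counter(ngrams(text, n)) for name, text in docs.items()}

    # Document frequency: in how many addresses does each term appear?
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    scores = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values())
        scores[name] = {
            term: (freq / total) * math.log(len(docs) / doc_freq[term])
            for term, freq in term_counts.items()
        }
    return scores
```

Calling `tf_idf(docs, n=1)` gives unigram scores and `tf_idf(docs, n=2)` gives bigram scores. With this weighting, a term that appears in every address scores zero, so the highest-scoring terms are the ones distinctive to a single address.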
The lexical diversity is shown in the following graph:


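Lexical diversity can be measured in several ways, and the exact measure behind the graph is not stated here. One common choice is the type-token ratio, the number of distinct words divided by the total number of words. A minimal sketch, assuming that measure and a simple regex tokenizer (the name `lexical_diversity` is just illustrative):

```python
import re


def lexical_diversity(text):
    """Type-token ratio: distinct word tokens divided by total word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

The raw ratio tends to fall as a text gets longer, so comparing addresses of very different lengths may call for a fixed-size sample of each.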
The reading level has steadily declined, as shown in this graph:



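The reading-level measure used for the graph is not named here. As one possibility, the Flesch-Kincaid grade level combines average sentence length with average syllables per word. The sketch below assumes that formula and uses a rough vowel-group heuristic for syllables, so its numbers are approximate; both function names are illustrative.

```python
import re


def count_syllables(word):
    """Rough heuristic: count runs of consecutive vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Longer sentences and longer words both push the score up, so a decline over time would reflect shorter sentences, shorter words, or both.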
And here are two excellent sites with their own analyses: &