Visualization of Led Zeppelin Lyrics with R

What can be known about Led Zeppelin lyrics from the standpoint of a computer geek?

First, with the power of Perl and persistence, I collected and properly named every song that has lyrics.

Then I found or crafted some R code to process these into a few graphics.
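The collecting was done in Perl. Once each song sits in its own text file, loading them into R can be as simple as the sketch below. The directory layout, the file-name pattern (album number, track number, title) and the `read_lyrics()` helper are hypothetical and only for illustration. They are not the naming scheme actually used here.

```r
# Load one plain-text lyrics file per song into a named character vector.
# Assumes a hypothetical layout like lyrics/01-01-Song_Title.txt
# (album number, track number, title); sorting the names keeps release order.
read_lyrics <- function(dir) {
  files <- sort(list.files(dir, pattern = "\\.txt$", full.names = TRUE))
  # Join each file's lines into a single string per song
  texts <- vapply(files, function(f) paste(readLines(f, warn = FALSE), collapse = " "),
                  character(1))
  names(texts) <- sub("\\.txt$", "", basename(files))
  texts
}

# Usage with two throwaway files in a temporary directory
tmp <- file.path(tempdir(), "lyrics")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("first line", "second line"), file.path(tmp, "01-01-Song_A.txt"))
writeLines("only line", file.path(tmp, "01-02-Song_B.txt"))
songs <- read_lyrics(tmp)
songs
```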

Here is the code for the first graph:


And here is that “word cloud” graphic, showing the most frequently sung words.  And what is the most frequent word for Led Zeppelin?  It’s “baby”, of course!
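At its core, a word cloud is a word-frequency table. Here is a minimal base-R sketch that assumes simple whitespace tokenization and a tiny hand-made stop-word list. The `tokenize()` helper and the sample text are hypothetical stand-ins, not the code or data behind the graphic.

```r
# Hypothetical sample text standing in for the real lyric files
lyrics <- c(
  "baby baby come home",
  "oh baby the night is long",
  "hey hey the road goes on"
)

# Hypothetical helper: lowercase, turn anything but letters/apostrophes into
# spaces, split on whitespace and drop empty strings
tokenize <- function(text) {
  text <- gsub("[^a-z' ]", " ", tolower(text))
  words <- unlist(strsplit(text, "\\s+"))
  words[words != ""]
}

stop_words <- c("the", "is", "oh", "on")  # tiny illustrative stop-word list
words <- tokenize(lyrics)
words <- words[!words %in% stop_words]

# Frequency table, most common word first
freq <- sort(table(words), decreasing = TRUE)
head(freq, 3)

# With the wordcloud package installed, the same table could be drawn with:
# wordcloud::wordcloud(names(freq), as.integer(freq), min.freq = 1)
```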





Here are two emotional sentiment charts.  And what is “emotional sentiment”?  Well, basically it is the assignment of a score to a word based on whether it is positive, neutral or negative.  How is this known?  A bunch of people (university students, I think) tediously categorized huge word lists according to whether each word made them feel good, bad or indifferent.  These scored word lists are what the following charts are built from.
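Here is a base-R sketch of lexicon scoring that makes the idea concrete. The six-word `lexicon` is invented. Real lexicons such as Bing or AFINN contain thousands of rated words. The `score_words()` helper is hypothetical.

```r
# Invented miniature lexicon: +1 positive, -1 negative
lexicon <- c(love = 1, good = 1, shining = 1,
             cry = -1, bad = -1, leave = -1)

# Hypothetical helper: look up each word, treating unknown words as neutral (0)
score_words <- function(words, lexicon) {
  scores <- lexicon[words]          # NA for words not in the lexicon
  scores[is.na(scores)] <- 0
  unname(scores)
}

words <- c("baby", "i", "love", "you", "but", "i", "cry")
s <- score_words(words, lexicon)
s
# Tally of negative, neutral and positive words
table(factor(sign(s), levels = c(-1, 0, 1),
             labels = c("negative", "neutral", "positive")))
```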

Here is the code to generate the first graph below:


The first shows the average positive, neutral and negative sentiment for each song, ordered by album release and track position.


We can see that every album has positive and negative moods.
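One plausible way to get per-song averages like these is to compute, for each song, the share of its words scored positive, neutral and negative. The sketch assumes that per-word scores of +1, 0 and -1 have already been computed. The song names and scores are placeholders.

```r
# Hypothetical per-word scores (+1 positive, 0 neutral, -1 negative),
# with songs already listed in release and track order
song_scores <- list(
  "Song A" = c(1, 0, 0, -1, 1),
  "Song B" = c(0, 0, -1, -1),
  "Song C" = c(1, 1, 0)
)

# For each song, the fraction of its words in each sentiment class
shares <- t(sapply(song_scores, function(s) {
  c(positive = mean(s > 0), neutral = mean(s == 0), negative = mean(s < 0))
}))
round(shares, 2)

# Grouped bars in song order, one bar per sentiment class
barplot(t(shares), beside = TRUE, legend.text = colnames(shares),
        main = "Sentiment share per song")
```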



This bit of code, building on the previous program, shows that the most positive song is “Bron-Y-Aur Stomp” on III. (The other, identical peak on Presence turned out to be an accidental duplicate! Oops.)  The most negative song is “No Quarter” from Houses of the Holy.
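Finding the extremes takes only `which.max()` and `which.min()`, and the duplicate mishap suggests checking for repeated lyrics first. Here is a minimal sketch with made-up titles and scores that uses base R's `duplicated()`:

```r
# Hypothetical per-song summary; "Song D" repeats Song A's lyrics by accident
songs <- data.frame(
  title      = c("Song A", "Song B", "Song C", "Song D"),
  lyrics     = c("lyrics a", "lyrics b", "lyrics c", "lyrics a"),
  mean_score = c(0.8, -0.6, 0.1, 0.8),
  stringsAsFactors = FALSE
)

# TRUE for any row whose lyrics repeat an earlier row
dupes <- duplicated(songs$lyrics)
songs$title[dupes]                            # "Song D"

clean <- songs[!dupes, ]
clean$title[which.max(clean$mean_score)]      # most positive: "Song A"
clean$title[which.min(clean$mean_score)]      # most negative: "Song B"
```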


The second is a line chart showing the amount of positive (red) and negative (blue) emotional sentiment for each song.

We can see that Robert Plant sings slightly more positive than negative words throughout his time with the band.
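A chart like this can be sketched in base graphics, along with two numbers that might back up a “slightly more positive” reading. The counts are invented. The two summaries (the mean surplus of positive words, and the fraction of songs where positive words outnumber negative ones) are illustrative choices, not necessarily what the chart is based on.

```r
# Hypothetical counts of positive and negative words per song, in release order
pos <- c(12, 8, 15, 10, 9, 14)
neg <- c(10, 9, 11, 8, 9, 12)
idx <- seq_along(pos)

# Red line for positive, blue for negative, as in the chart described above
plot(idx, pos, type = "l", col = "red", ylim = range(pos, neg),
     xlab = "Song (release order)", ylab = "Sentiment words")
lines(idx, neg, col = "blue")

mean(pos - neg)   # average surplus of positive words per song: 1.5
mean(pos > neg)   # fraction of songs where positive words win: 4 of 6
```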




Here is the code for that second graph:


We can inspect the relative amounts of different emotional categories too!  These are anger, anticipation, disgust, fear, joy, sadness, surprise and trust.

Anticipation, joy and trust stand out as the dominant feelings in Led Zeppelin songs, according to this chart.
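Emotion lexicons such as the NRC Emotion Lexicon map each word to zero or more of these eight categories. The totals therefore come from joining the lyric words against the lexicon and counting. Here is a base-R sketch with an invented miniature lexicon:

```r
# Invented word-to-emotion pairs; one word may map to several emotions
emo_lex <- data.frame(
  word    = c("love", "love", "wait", "trust", "cry", "cry", "soon"),
  emotion = c("joy", "trust", "anticipation", "trust", "sadness", "fear", "anticipation"),
  stringsAsFactors = FALSE
)

emotions <- c("anger", "anticipation", "disgust", "fear",
              "joy", "sadness", "surprise", "trust")

words <- c("love", "you", "soon", "wait", "trust", "me", "love")

# Inner join: each occurrence of a word picks up all of its emotion rows,
# so a word sung twice counts twice; unmatched words drop out
hits <- merge(data.frame(word = words, stringsAsFactors = FALSE), emo_lex, by = "word")

# Count per emotion, keeping all eight categories (zeros included)
counts <- table(factor(hits$emotion, levels = emotions))
sort(counts, decreasing = TRUE)
```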






Finally, I wanted to see the basic stats for each song from my fathom program.  Laid out as a table, these show that the lyrics have very high and skewed readability scores.  This might be because songs are generally not punctuated, so sentence-based complexity measures are not meaningful.  Anyway, here is an example of this output:
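The Flesch-Kincaid grade level, one common readability measure, gives a rough illustration of why unpunctuated text inflates these scores. Whether fathom reports exactly this measure is an assumption.

$$
\text{FKGL} = 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59
$$

Take a hypothetical 200-word lyric averaging 1.2 syllables per word. With no punctuation, the whole song counts as one sentence:

$$
0.39 \times 200 + 11.8 \times 1.2 - 15.59 = 78 + 14.16 - 15.59 = 76.57
$$

The same words split into 25 sentences of 8 words give:

$$
0.39 \times 8 + 11.8 \times 1.2 - 15.59 = 3.12 + 14.16 - 15.59 = 1.69
$$

The words-per-sentence term dominates, which is consistent with very high, skewed scores for lyrics.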

But I am curious about the words per song over time.  So I whipped up a bit more R:

We can see that Robert wrote slightly more words per song over his career with Zeppelin.  Notice that the last album, Coda, is full of shorter songs from earlier days.  This pulls down the trend line at the right end.
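A straight-line fit is one simple way to draw such a trend. Refitting without the final album shows how a handful of short songs at the end can drag the line down. The word counts and the `coda` flag are invented for illustration, and the actual graph may use a different smoother.

```r
# Hypothetical words-per-song counts in release order; the last two
# songs stand in for a final album of older, shorter material
d <- data.frame(
  pos   = 1:9,
  words = c(150, 180, 160, 210, 190, 230, 220, 140, 130),
  coda  = c(rep(FALSE, 7), TRUE, TRUE)
)

# Linear trend over all songs, and again without the final album
fit_all <- lm(words ~ pos, data = d)
fit_pre <- lm(words ~ pos, data = d, subset = !coda)
coef(fit_all)[["pos"]]   # -1 for this made-up data
coef(fit_pre)[["pos"]]   # about 12.14 once the short songs are excluded

plot(d$pos, d$words, xlab = "Song (release order)", ylab = "Words per song")
abline(fit_all, col = "red")
abline(fit_pre, col = "grey40", lty = 2)
```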



Since we are here, what is the song with the most words?
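Answering that only requires a word count per song. Here is a minimal sketch with placeholder lyrics that splits on whitespace:

```r
# Placeholder lyrics, one named element per song
lyrics <- c(
  "Song A" = "one two three four",
  "Song B" = "one two",
  "Song C" = "one two three four five six"
)

# Count whitespace-separated words in each song (names are kept)
n_words <- lengths(strsplit(lyrics, "\\s+"))
sort(n_words, decreasing = TRUE)
names(which.max(n_words))   # "Song C"
```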


What about the richness of the language (“lexical diversity”) used?  Does that change over time?
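A common first measure is the type-token ratio, which divides the number of distinct words by the total number of words. This ratio tends to fall as texts get longer, so it offers only a rough comparison between songs of very different lengths. Here is a base-R sketch using a hypothetical `ttr()` helper:

```r
# Hypothetical helper: type-token ratio = distinct words / total words
ttr <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[words != ""]   # drop empty strings from leading separators
  length(unique(words)) / length(words)
}

ttr("Baby baby baby oh baby")   # 2 distinct / 5 total = 0.4
ttr("come home come home now")  # 3 distinct / 5 total = 0.6

# For a named vector of song lyrics, sapply(lyrics, ttr) gives one value per song
```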