Bach Chorale Diversity

What is the relative statistical diversity of Bach chorale harmonisations?

Ever since I stumbled upon this Bach Choral Harmony Data Set, I’ve been looking for ways to analyze it!

So today I set out to do just that, using the Shannon diversity index and this handy Perl code:

First off (above) is the standard Perl preamble, followed by loading the modules we will use. Next, the data file is read in and parsed into an index of the interesting bits:

This routine, read_bach(), reads a raw comma-separated value file of note, chord and bass structure line by line, and counts how many times each note group is seen in each tune.
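To make the counting step concrete, here is a minimal sketch of that kind of parser. It is not the read_bach() from this post: the function name count_note_groups, the column layout (tune id first, then an event number, then twelve YES/NO pitch-class flags) and the sample rows are all assumptions made up for illustration.

```perl
use 5.014;    # the /r substitution flag needs Perl 5.14 or later
use strict;
use warnings;

# Hypothetical stand-in for a read_bach()-style parser.
# ASSUMPTION: field 0 is the tune id and fields 2..13 are the twelve
# YES/NO pitch-class flags; adjust the slice to match your own file.
sub count_note_groups {
    my (@lines) = @_;
    my %index;    # tune id => { note group => times seen }
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        # Split on commas and trim surrounding whitespace from each field
        my @fields = map { s/^\s+|\s+$//gr } split /,/, $line;
        my $tune = $fields[0];
        # Encode the twelve flags as a bit string such as 100010010000
        my $group = join '', map { $_ eq 'YES' ? 1 : 0 } @fields[ 2 .. 13 ];
        $index{$tune}{$group}++;
    }
    return \%index;
}

# Made-up sample rows in the assumed layout (not real chorale data)
my @sample = (
    'tune_a,1,YES,NO,NO,NO,YES,NO,NO,YES,NO,NO,NO,NO,C,3,C_M',
    'tune_a,2,YES,NO,NO,NO,YES,NO,NO,YES,NO,NO,NO,NO,C,3,C_M',
    'tune_a,3,NO,NO,YES,NO,NO,NO,NO,YES,NO,NO,NO,YES,G,3,G_M',
    'tune_b,1,YES,NO,NO,NO,YES,NO,NO,YES,NO,NO,NO,NO,C,3,C_M',
);

my $index = count_note_groups(@sample);

# Print one line per tune and note group: tune, group, count
for my $tune ( sort keys %$index ) {
    for my $group ( sort keys %{ $index->{$tune} } ) {
        print "$tune $group $index->{$tune}{$group}\n";
    }
}
```

Keying the inner hash on a bit string keeps each note group a plain scalar, so the counts can be handed straight to whatever does the statistics.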

After reading and parsing the Bach data, all that remains is to compute the Shannon diversity index for each tune.

The code above takes the values of the index for each tune, that is, the number of times each note group was seen in that tune. These counts are handed to the stats module to compute the diversity.
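For anyone who wants to see what such a stats module is doing, here is a minimal hand-rolled sketch. It assumes the usual definitions: the index is minus the sum of p * ln(p) over the proportions p, and the evenness is the index divided by the natural log of the number of distinct groups. The function name shannon is made up, and this is not the module's actual implementation.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: Shannon index and evenness from a list of counts.
# ASSUMPTION: natural logarithm, and evenness defined as H / ln(S).
sub shannon {
    my (@counts) = grep { $_ > 0 } @_;    # zero counts contribute nothing
    return ( 0, 0 ) unless @counts;
    my $total = sum(@counts);
    my $h     = 0;
    for my $n (@counts) {
        my $p = $n / $total;              # proportional frequency
        $h -= $p * log($p);
    }
    # With a single group the evenness is undefined; report 0 here
    my $evenness = @counts > 1 ? $h / log( scalar @counts ) : 0;
    return ( $h, $evenness );
}

# Four equally common groups: index = ln(4), evenness = 1
my ( $h, $e ) = shannon( 2, 2, 2, 2 );
printf "index = %.4f, evenness = %.4f\n", $h, $e;
```

Feeding it the counts of one tune gives that tune's index; feeding it the counts pooled over all tunes gives the index for the whole set.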

And what are the earth-shattering conclusions?

Well, the complete set of 60 tunes is more diverse than any individual tune, which makes perfect sense. For the whole set, the index is 4.53 and the evenness is 0.78.
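As a quick sanity check on those two numbers, and assuming the evenness is the usual Pielou form (the index divided by the natural log of the number of distinct note groups S), the pair implies a value for S:

```latex
H' = -\sum_{i=1}^{S} p_i \ln p_i ,
\qquad
E = \frac{H'}{\ln S}
\quad\Longrightarrow\quad
S = \exp\!\left(\frac{H'}{E}\right)
  = \exp\!\left(\frac{4.53}{0.78}\right)
  \approx \exp(5.81)
  \approx 333
```

Given the rounding in 4.53 and 0.78 this is only approximate, but it agrees with the more than 300 distinct note groups that show up in the rank abundance curve later in this post.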

The summary statistics for the individual tunes' indices are these:

Ok. My inner geek is satisfied for the moment…

… Of course, the inner geek is never satisfied without visualization. After searching a bit, I found the Whittaker rank abundance curve. Naturally I had to make one with this Bach data.

As before, we invoke Perl and read in the Bach raw data file as a parsed index of how many times each note group has been seen:

Next up, instead of calculating the statistical diversity, we compute the proportional frequency of each note group, that is, its count divided by the total count:

And output those proportions as a reverse-sorted list (largest first) for R to consume:
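The two steps above can be sketched roughly like this. The counts in %seen are made up, and rank_abundance is a hypothetical name, not the code used for the plots in this post.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Made-up counts of how often each note group was seen (illustration only)
my %seen = (
    '100010010000' => 6,
    '001000010001' => 3,
    '100001000100' => 1,
);

# Hypothetical helper: turn counts into proportional frequencies,
# sorted from most to least abundant.
sub rank_abundance {
    my ($counts) = @_;
    my $total = sum( values %$counts ) or return;    # nothing to rank
    return sort { $b <=> $a } map { $_ / $total } values %$counts;
}

my @proportions = rank_abundance( \%seen );

# One value per line, most abundant first
printf "%.4f\n", $_ for @proportions;
```

Because the list is already sorted, the line number of each value is its rank, which is all a rank abundance plot needs on the horizontal axis.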

That is, run this in the shell (with your own jsbach data file, of course):

Then open up R and:

And voilà:

Again, earth-shattering!

We can see that Bach definitely uses a lot of note groups (i.e. chords and arpeggios): more than 300! But the vast majority of occurrences come from the 50 most common groups.

Here is the density plot of all Bach chord groups:

And here is an animated version of the densities of all the 60 chorales: