Inspecting the English Premier League Player Stats with R

Being a soccer person and a programmer, I wanted to inspect player statistics for myself. I finally found an excellent site that covers many leagues and focuses on player stats. Seeing that there was no download link, I resolved to tediously copy and paste every player's records for last season, in the defensive, offensive, passing and summary categories, into four files (epl-player-stats-defensive-2015-16, epl-player-stats-offensive-2015-16, epl-player-stats-passing-2015-16 and epl-player-stats-summary-2015-16). Oof! All along the way, I thought about writing a little web-page scraper… But in the end, I had my raw data. Here is the head of the defensive file:

So I pressed on with the next task: turning this into something that R can read without any pain.

Enter Perl:

Nothing tricky at all. Just read in the file (three lines per row) and tab-separate the fields. With that in place, I run this on the command line:
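The same three-lines-per-row idea can also be sketched in R. This is not the Perl script, just a minimal illustration of the technique. The function name `three_lines_to_tsv`, the made-up sample lines, and the assumption that cells within a pasted line are already tab-separated are all hypothetical.

```r
# Hypothetical helper: collapse every three raw lines into one tab-separated row.
three_lines_to_tsv <- function(lines, lines_per_row = 3) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                    # drop blank lines
  stopifnot(length(lines) %% lines_per_row == 0)   # every record must be complete
  groups <- split(lines, ceiling(seq_along(lines) / lines_per_row))
  vapply(groups, function(g) {
    # Assumption: cells inside one pasted line are separated by tabs.
    fields <- unlist(strsplit(g, "\t", fixed = TRUE))
    paste(trimws(fields), collapse = "\t")
  }, character(1), USE.NAMES = FALSE)
}

# Made-up sample input, not real player records.
raw <- c("Player One", "Arsenal, 27, FW", "30\t2500\t12",
         "Player Two", "Chelsea, 24, M(C)", "28\t2300\t3")
rows <- three_lines_to_tsv(raw)
writeLines(rows)
```

For a real file, the input would come from `readLines()` and the result would go to `writeLines(rows, "some-output-file")`, with file names of your choosing.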

With those R-friendly processed files, I can now open R, import and explore the data.  First, the importing:
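A minimal sketch of such an import, assuming the processed files are tab-separated with a header row. The column names and the toy rows below are assumptions for illustration; a temporary file stands in for a real processed file so the snippet runs on its own.

```r
# Toy stand-in for one processed file (made-up rows, assumed column names).
tsv <- tempfile(fileext = ".tsv")
writeLines(c("Player\tTeam\tAge\tPosition\tTackles\tInter",
             "Player One\tArsenal\t27\tFW\t0.5\t0.2",
             "Player Two\tChelsea\t24\tM(C)\t2.1\t1.8"), tsv)

# read.delim() defaults to tab separators and a header row.
defense <- read.delim(tsv, stringsAsFactors = FALSE)

str(defense)    # check that numeric columns were parsed as numbers
head(defense)   # first few rows
```

The same call, pointed at each of the four processed files in turn, would give one data frame per category.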

Next, the exploring:
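A first pass at exploring usually means checking dimensions, summaries and counts. The sketch below uses synthetic data with assumed column names (`Position`, `Age`, `Tackles`), so the numbers it prints say nothing about the real league.

```r
# Synthetic stand-in for the imported defensive data (not real player stats).
set.seed(1)
defense <- data.frame(
  Position = sample(c("GK", "D", "M", "FW"), 40, replace = TRUE),
  Age      = sample(18:36, 40, replace = TRUE),
  Tackles  = round(runif(40, 0, 4), 1)
)

dim(defense)               # rows and columns
summary(defense$Age)       # min, quartiles, mean, max
table(defense$Position)    # players per position
colSums(is.na(defense))    # missing values per column
```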

With SQL statements:

Better-than-average forwards:
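One way to phrase "better than average" in SQL is a subquery on the forwards' mean. The sketch below assumes the `sqldf` package for running SQL against a data frame and falls back to base R if it is not installed. The data frame, its column names and the choice of goals as the measure are assumptions for illustration.

```r
# Made-up players; "FW" as the forward label is an assumption.
players <- data.frame(
  Player   = c("Forward A", "Forward B", "Forward C", "Defender D"),
  Position = c("FW", "FW", "FW", "D(C)"),
  Goals    = c(12, 3, 6, 1),
  stringsAsFactors = FALSE
)

query <- "SELECT Player, Goals FROM players
          WHERE Position = 'FW'
            AND Goals > (SELECT AVG(Goals) FROM players WHERE Position = 'FW')
          ORDER BY Goals DESC"

if (requireNamespace("sqldf", quietly = TRUE)) {
  suppressPackageStartupMessages(library(sqldf))
  better_fw <- sqldf(query)            # runs the SQL against the data frame
} else {
  # Base R equivalent of the query above.
  fw <- players[players$Position == "FW", c("Player", "Goals")]
  better_fw <- fw[fw$Goals > mean(fw$Goals), ]
  better_fw <- better_fw[order(-better_fw$Goals), ]
}
better_fw
```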

Goals by field position:
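Totalling goals per position is a plain group-by. A minimal base R sketch with made-up rows and assumed column names:

```r
# Made-up rows, not real player stats.
players <- data.frame(
  Position = c("FW", "FW", "M", "M", "D", "GK"),
  Goals    = c(12, 6, 4, 2, 1, 0)
)

# Sum goals within each position, then sort from most to fewest.
goals_by_pos <- aggregate(Goals ~ Position, data = players, FUN = sum)
goals_by_pos <- goals_by_pos[order(-goals_by_pos$Goals), ]
goals_by_pos
```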

And here is a nice way of seeing goals by field position:
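One simple way to see such totals is a bar chart. This sketch uses base graphics and made-up rows; the choice of `barplot()` is an assumption about what a "nice way" might look like, not necessarily the plot used here.

```r
# Made-up rows, not real player stats.
players <- data.frame(
  Position = c("FW", "FW", "M", "M", "D", "GK"),
  Goals    = c(12, 6, 4, 2, 1, 0)
)

# Named vector of total goals per position, largest first.
totals <- sort(tapply(players$Goals, players$Position, sum), decreasing = TRUE)

barplot(totals,
        xlab = "Field position",
        ylab = "Goals",
        main = "Goals by field position (toy data)")
```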

OK. Let's try to spot any strong relationships:

[Figure: scatterplot matrix]
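A scatterplot matrix takes a single call to `pairs()` in base R. The sketch below uses synthetic data with assumed column names. The link between tackles and interceptions is built in by construction, so it only shows what a roughly linear panel looks like.

```r
# Synthetic data; the Tackles/Interceptions relationship is deliberate.
set.seed(42)
n <- 60
tackles <- runif(n, 0, 4)
defense <- data.frame(
  Tackles       = tackles,
  Interceptions = 0.8 * tackles + rnorm(n, sd = 0.5),
  Clearances    = runif(n, 0, 8),
  Blocks        = runif(n, 0, 1)
)

# One panel per pair of columns, each with a smoothed trend line.
pairs(defense, panel = panel.smooth)

# Numeric companion to the plot: pairwise correlations.
round(cor(defense), 2)
```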

Hmm. Tackles x Interceptions looks like a sort-of linear relationship.  So does Clearances x Blocks.


OK. Time for some visualization. Here is average passes per game (AvgP) by pass completion percentage (Pass):

[Figure: passes]
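A plot like this can be sketched with the formula interface of `plot()`. The column names `AvgP` and `Pass` follow the text. The values are synthetic, and putting `Pass` on the x-axis is an assumption.

```r
# Synthetic values, not real passing stats.
set.seed(7)
passing <- data.frame(
  Pass = runif(50, 60, 95),   # pass completion percentage
  AvgP = runif(50, 10, 80)    # average passes per game
)

plot(AvgP ~ Pass, data = passing,
     xlab = "Pass completion (%)",
     ylab = "Average passes per game",
     main = "AvgP by Pass (toy data)")
```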

What about ages by field position?

[Figure: age whiskers]
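Box-and-whisker plots per group are what `boxplot()` with a formula produces. A minimal sketch on synthetic ages, with assumed position labels:

```r
# Synthetic ages, not real player data.
set.seed(3)
players <- data.frame(
  Position = factor(sample(c("GK", "D", "M", "FW"), 80, replace = TRUE)),
  Age      = sample(18:36, 80, replace = TRUE)
)

# One box per position; boxplot() invisibly returns the plotted statistics.
bp <- boxplot(Age ~ Position, data = players,
              xlab = "Field position", ylab = "Age",
              main = "Age by field position (toy data)")
bp$names   # group labels, in plotting order
```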


There should be a relationship between yellow cards and fouls:

[Figure: fouls x yellows]
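A scatterplot with a fitted line is one way to check for such a relationship. In this sketch the data are synthetic and yellow cards are generated from fouls, so the positive slope is by construction. The column names `Fouls` and `Yellows` are assumptions.

```r
# Synthetic data: each foul has an assumed 15% chance of a yellow card.
set.seed(11)
n <- 200
fouls <- sample(5:60, n, replace = TRUE)
cards <- data.frame(
  Fouls   = fouls,
  Yellows = rbinom(n, size = fouls, prob = 0.15)
)

plot(Yellows ~ Fouls, data = cards,
     xlab = "Fouls", ylab = "Yellow cards",
     main = "Yellow cards by fouls (toy data)")

fit <- lm(Yellows ~ Fouls, data = cards)   # simple linear fit
abline(fit, col = "red")                   # overlay the fitted line
coef(fit)
```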

We could go on and on, slicing, dicing and visualizing, but these are the tools I reach for first when exploring data.