I am more comfortable with R, but I want to switch to Python. Python is a more robust programming language, and I think it will be the future of data analysis, so I wanted to get some practice using it. My project this week was to take the game log from Albert Pujols’ 2009 MVP season, add a column for his wOBA, graph it as a scatterplot, and then add two moving averages of wOBA. My hope was to learn more about pandas and plotly, and also to see how wOBA can change over a season.
My first step was to create the wOBA data. Here is the wOBA formula:
wOBA = (0.690×uBB + 0.722×HBP + 0.888×1B + 1.271×2B + 1.616×3B + 2.101×HR) / (AB + BB - IBB + SF + HBP)
You can read more about it here.
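To make the formula concrete, here is a minimal sketch that applies it to a single made-up game line. The function name `woba`, its argument names, and the example stat line are hypothetical and only for illustration; the weights are the ones in the formula above.

```python
def woba(ubb, hbp, singles, doubles, triples, hr, ab, bb, ibb, sf):
    """wOBA for one stat line, using the weights from the formula above."""
    numerator = (0.690 * ubb + 0.722 * hbp + 0.888 * singles
                 + 1.271 * doubles + 1.616 * triples + 2.101 * hr)
    denominator = ab + bb - ibb + sf + hbp
    if denominator == 0:
        # No qualifying plate appearances, so wOBA is undefined
        return float("nan")
    return numerator / denominator

# Hypothetical game: 4 AB with a single and a home run, plus one unintentional walk
game = woba(ubb=1, hbp=0, singles=1, doubles=0, triples=0, hr=1,
            ab=4, bb=1, ibb=0, sf=0)
print(round(game, 4))  # 3.679 / 5 = 0.7358
```

A single game can produce a value far above anything sustainable over a season, which is why the per-game dots in a scatterplot are so spread out.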
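    # The game log only has total hits, so derive singles by removing extra-base hits
    singles = pujols['H'] - pujols['2B'] - pujols['3B'] - pujols['HR']
    pujols['1B'] = singles

    # Per-game wOBA, using PA as the denominator
    wOBA = (0.707*pujols['BB'] + 0.737*pujols['HBP'] + 0.985*pujols['1B']
            + 1.258*pujols['2B'] + 1.585*pujols['3B'] + 2.023*pujols['HR']) / pujols['PA']
    # Alternative denominator from the formula above:
    # (pujols['AB'] + pujols['BB'] - pujols['IBB'] + pujols['SF'] + pujols['HBP'])

    # Append the column to the data frame; the plotting code reads pujols['wOBA']
    pujols['wOBA'] = wOBA

    # Mean of the per-game values, used later as the season-average line
    wOBA.mean()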
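I had to create a 1B variable because the game log data I got only had total hits (H), which lumps singles together with doubles, triples, and home runs. So I subtracted all extra-base hits from the hits number to create the 1B column. I then did the calculation and appended the wOBA column to the data frame. Note that the code uses different weights than the formula shown above, counts all walks (BB) rather than only unintentional walks, and divides by plate appearances (PA); the formula's denominator is left in as a comment. The next task was to graph it.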
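The singles step and the per-game calculation can be sketched on a tiny made-up game log. The rows below are invented for illustration and are not Pujols' real numbers; the column names and weights are the ones used in the code above, and the `weights` dictionary is just one possible way to organize them.

```python
import pandas as pd

# Made-up game log rows (not real data); column names follow the post
log = pd.DataFrame({
    "H":   [2, 0, 3],
    "2B":  [1, 0, 0],
    "3B":  [0, 0, 0],
    "HR":  [0, 0, 2],
    "BB":  [1, 0, 0],
    "HBP": [0, 0, 0],
    "PA":  [5, 4, 4],
})

# Singles are whatever is left after removing extra-base hits
log["1B"] = log["H"] - log["2B"] - log["3B"] - log["HR"]

# Weights copied from the post's code, keyed by column name
weights = {"BB": 0.707, "HBP": 0.737, "1B": 0.985,
           "2B": 1.258, "3B": 1.585, "HR": 2.023}

# Weighted sum of events per game, divided by plate appearances
weighted = sum(w * log[col] for col, w in weights.items())
log["wOBA"] = weighted / log["PA"]

print(log[["1B", "wOBA"]])
```

Keeping the weights in one dictionary makes it easy to swap in a different season's values without touching the arithmetic.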
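    import plotly.graph_objs as go
    # `pt` is the plotly plotting module imported earlier (that import is not shown in this post)

    # Moving averages over the last 50 and 10 rows; each row is one game, not one calendar day
    pujols['50MA'] = pujols['wOBA'].rolling(window=50).mean()
    pujols['10MA'] = pujols['wOBA'].rolling(window=10).mean()

    # Per-game wOBA as markers, the two moving averages as lines
    trace1 = go.Scatter(x=pujols['Date'], y=pujols['wOBA'], mode='markers', name='wOBA')
    ma50 = go.Scatter(x=pujols['Date'], y=pujols['50MA'], name='50-Day Moving Average')
    ma10 = go.Scatter(x=pujols['Date'], y=pujols['10MA'], name='10-Day Moving Average')

    # Horizontal line at the season average; it needs explicit start and end x values
    layout = {
        'shapes': [{
            'type': 'line',
            'x0': 'Apr 6', 'y0': 0.4740692261904762,
            'x1': 'Oct 3', 'y1': 0.4740692261904762,
            'line': {'color': 'rgb(0, 0, 0)', 'width': 1, 'dash': 'dashdot'},
        }]
    }

    # `data` has to be defined before the figure that uses it
    data = [trace1, ma50, ma10]
    fig = {'data': data, 'layout': layout}
    pt.iplot(fig, filename='scatter-mode')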
I decided to use plotly to graph the data. I’m not sure why, other than that it has a good reputation and I wanted to try it. I created the 10-game and 50-game moving averages and appended them to the data frame as well. I then created the three objects to plot: wOBA as a scatter plot and the two moving averages as lines. I also created a horizontal line to show the season average of his per-game wOBA. The horizontal line was interesting because I had to give it a start and an end on the x-axis; I couldn’t just supply a y-coordinate. The nice thing was that I could input the dates exactly as they were read, with no need to convert or deal with date variable types. Thank god.
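As a small variation on the horizontal line, the sketch below computes the season average and the start and end dates from the data instead of typing them in by hand. The data frame `games` and its values are made up for illustration; the shape dictionary uses the same keys as the layout in the code above.

```python
import pandas as pd

# Made-up values; dates are kept as plain strings, as in the post
games = pd.DataFrame({
    "Date": ["Apr 6", "Apr 7", "Apr 8", "Apr 10"],
    "wOBA": [0.5, 0.0, 0.75, 0.25],
})

# Mean of the per-game values, converted to a plain Python float
season_avg = float(games["wOBA"].mean())

# Line from the first game date to the last, at the season average
layout = {
    "shapes": [{
        "type": "line",
        "x0": games["Date"].iloc[0], "y0": season_avg,
        "x1": games["Date"].iloc[-1], "y1": season_avg,
        "line": {"color": "rgb(0, 0, 0)", "width": 1, "dash": "dashdot"},
    }]
}
print(layout["shapes"][0]["y0"])  # 0.375
```

This `layout` can be passed to the figure in the same way as the hand-written one. As far as I know, plotly shapes also accept `'xref': 'paper'` with `x0` of 0 and `x1` of 1 to span the full plot width without naming any dates, though I did not try that here.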
Here’s the graph:
The blue dots are his wOBA for each game he played in the 2009 season. The orange line is the 50-game moving average and the green line is the 10-game moving average (the legend says "Day", but the windows count games). I also added a black dashed line to show the season average.
I wanted to look at the moving averages because I thought they could tell a bit of a story about the season he had. Did he have any slumps? Any hot streaks? I thought the 10-game moving average would indicate that, and to some extent it does.
We can see that July was a below-average month for Pujols. He also finished the season strong, with an above-average stretch from the end of August into early September.
I was somewhat surprised at how quickly it averages out. The 50-game moving average doesn’t deviate far from his season average. I had not expected the two averages to converge so quickly.
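One way to see why the longer window stays so close to the season average is a synthetic example. The sketch below uses an invented, perfectly regular "season" of alternating hot and cold streaks, not Pujols' data, and compares how far each moving average swings.

```python
import pandas as pd

# Synthetic streaky season: 10 hot games (0.6) then 10 cold games (0.3), repeated 8 times
values = ([0.6] * 10 + [0.3] * 10) * 8
season = pd.Series(values)

ma10 = season.rolling(window=10).mean()
ma50 = season.rolling(window=50).mean()

# The first window-1 entries are NaN because the window is not yet full
print(ma10.isna().sum(), ma50.isna().sum())  # 9 49

# Spread between the highest and lowest value of each moving average
spread10 = ma10.max() - ma10.min()
spread50 = ma50.max() - ma50.min()
print(round(spread10, 2), round(spread50, 2))  # 0.3 0.06
```

In this toy case a 10-game window can sit entirely inside one streak, so it swings the full 0.3 between hot and cold, while a 50-game window always spans several streaks that mostly cancel out. Real game logs are noisier, but the same averaging effect applies.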
Anyways, this was some great practice with Python and Plotly. And it was interesting to see how quickly his average wOBA consolidated around his season average.