Why R is Better Than Excel for Fantasy Football (and most other) Data Analysis
30Many articles have been written on why R is better than Excel for data analysis. In this post, I will summarize the reasons why R is advantageous in most data analysis circumstances, with a focus on fantasy football analysis. Although this post focuses on the benefits of using R for fantasy football analysis, the benefits of R extend to many other domains (e.g., finance, time series, machine learning, social sciences; for a list of different domains of R packages, see here).
What is R?
R is a software package and language for statistical analysis and graphics.
When/Why R is Better than Excel
- Data Manipulation. R allows you to manipulate (e.g., subset, recode, merge) data quickly. Some R packages have been designed specifically for these purposes, e.g., plyr. Typically, a majority of the time spent on an analysis project is spent before the analysis—preparing the data. R is much more adept and efficient in data preparation than Excel. Fantasy data scraped from websites often require many steps in data processing to be ready for analysis, so R is ideal.
- Easier Automation. R uses a scripting language rather than a GUI, so it’s much easier to automate things in R than in Excel. This can save you loads of time, especially when you plan to re-run the same analysis multiple times (e.g., every new fantasy season).
- Faster Computation. Because of the automation provided by R scripts, many operations are much faster to perform in R than Excel.
- It Reads Any Type of Data. R can basically read any type of data (.txt, .csv, .dat, etc.). There are also R packages specifically designed to read JSON, SPSS, Excel, SAS, and STATA files. You can also scrape data from websites and execute SQL queries. Scraping websites can be useful for downloading fantasy projections from ESPN and other websites for data analysis.
- Easier Project Organization. In Excel, projects are often organized in different tabs of the same file. This can make the Excel file slow, clunky, and difficult to navigate. It is easier to keep a project organized when dealing with R scripts because different tasks or sub-projects can be stored in separate files stored in the same folder and linked together in the same project with RStudio. For an example folder structure for R projects, see here.
- It Supports Larger Data Sets. Excel has restrictions for how large your data can be. Even if if your data don’t exceed this maximum size, Excel can become slow with large data sets (especially after you add tabs, formulas, and references). R supports larger data, and can support big data with packages such as Hadoop.
- Reproducibility. R has features that make it much easier to reproduce the findings of your analysis, which is important for detecting errors. First, it’s easy to add comments to your scripts to make it clear what you’re doing. Commenting your code is crucial, and can serve as a translation for someone else looking at your code, or as a reminder of what you did 6 months ago! It is difficult to document steps you’ve done in Excel. Second, data and analysis are separated in R, allowing you to see the logical progression for data analysis in the R code. In Excel, however, data and formulas are together, and it can be difficult to follow the data analyst’s train of logic. Third, you can use version control with git a) to track (and revert) changes you make over time and b) to share your scripts with others to collaborate on projects as a community. Having more people examining your work can help find and fix errors and make other improvemnts. Excel files are binary files, so you can’t track changes to Excel files. The github site hosting the R scripts for this site is located here. Feel free to use the scripts and suggest improvements!
- Accuracy. Researchers have shown that Excel and other spreadsheets show important inaccuracies for basic analyses like linear regression. R was specifically designed for statistical analysis, so it is more precise and accurate for data analysis.
- Easier to Find and Fix Errors. Because R uses scripting rather than clicking, and allows comments and version control, one can see a history of the actions taken to achieve the result. This makes it easier to find and troubleshoot errors. In Excel, however, errors can be hidden in formulas in cells that can be difficult to find. Spreadsheet errors have led to widely-publicized mistakes, including disastrous financial losses, faulty government policies, and the wrong drugs being given to cancer patients. Humans make mistakes and mistakes in data analysis are inevitable, whether with spreadsheets or with R code. The bottom line is that it’s easier to find and fix these mistakes in R than it is in Excel, making it more likely that you’re getting an accurate result in R.
- It’s Free. Enough said.
- It’s Open Source. Unlike Excel and other proprietary software used for data analysis, R is not a black box. You can examine the code for any function or computation you perform. You can even modify and improve these functions by changing the code.
- Advanced Statistics. R has many more (and more advanced) statistics capabilities than Excel does. They also tend to be faster and more flexible. Part of the advanced capabilities of R owes to the fact that R is open source and many users have contributed packages for performing specialized functions. For example, this fantasy football draft optimizer uses the Rglpk package to find your optimal starting lineup of players that maximizes the team’s projected points while minimizing its downside risk.
- State-of-the-Art Graphics. R has advanced graphics capabilities (see here for examples and code for how to create them). You can create beautiful graphics using the base R package, or with the lattice or ggplot packages. People like to digest and understand statistics visually, and R provides a better tool for creating pretty visualizations than Excel does.
- It Runs on Many Platforms. You can use R on Windows, Mac, Linux, and Unix.
- Anyone (Including You) Can Contribute Packages to the Community to Improve its Functionality. In the chance there isn’t an R package that does what you need to do, you can write a function to perform the task and can contribute it as a package to the community for others to use and improve. The number of R packages contributed to the community is increasing at a rapid rate. Chances are, if there’s an analysis you need to do, an R package exists to do it.
For these reasons,
- R is considered the top language for data science. Also see here.
- R usage is increasing exponentially.
- The number of R packages is increasing exponentially. Also see here.
- More data scientists use R than any other software for predictive modeling competitions (including 50% of winners) on Kaggle.
- An increasing number of jobs require experience with R.
- R skills attract the highest IT salaries.
- Many companies use R.
- Microsoft (the owner of Excel) bought Revolution Analytics (the leading business software implementation of R for data analysis).
When to Use Excel
- Data Entry. I don’t necessarily recommend Excel for data entry, but it does have some nice features including data validation. Data validation ensures that the entered data conform to user-specified requirements (e.g., must be integer between 0-5), and can reduce data entry errors. The problem with saving data in Excel (.xls/.xlsx) format is that these formats don’t cooperate well with other software, so if you plan to do any data manipulation or analysis with other software (e.g., R), it gets tricky. You might be able to save your data as comma-separated values (.csv), a more widely used format that can be read by most other data software. In most cases, however, using a simple text editor or relational database management software is best for data entry.
- Basic Calculations. Excel can be faster for very simple calculations (e.g., descriptive statistics) or data manipulation (e.g., basic sort, filter, pivot table). Anything beyond simple calculations and data manipulation, however, will likely be better in R.
- Viewing/Scrolling Data at a Glance. Excel can be handy for a quick visual overview of your data. Eyeballing and scrolling through the rows and columns gives you a hands-on way to quickly explore your data visually. There are ways to see your data in R, as well, but a spreadsheet-like view of your data is not presented by default.
- Data Presentation and Summarization. Excel can be helpful for presenting data that need to be summarized in an aesthetically-pleasing tabular form. For more advanced graphical summaries, however, R is likely better.
How to Download R
You can download R here: http://cran.r-project.org/. I strongly recommend the RStudio text editor for working with R scripts: http://www.rstudio.com/ide/download/.
How to Get Data from Excel into R
Although there are R packages for importing Excel data directly (see here, here, and here), it’s generally best to export the Excel data to a .csv file and then import the .csv file in R using the read.csv() function. Here’s the code for importing a .csv file in R and storing it in an object called ‘mydata’:
mydata <- read.csv("C:/excelData.csv")
Where Can I Learn How to Use R?
- Use this intro to R: http://www.statmethods.net/
- Here’s a guide to R for Excel users (and another one).
- Check out these resources for getting started with R: http://www.ats.ucla.edu/stat/
r/ - Watch these videos from Coursera: http://blog.revolutionanalytics.com/2012/12/coursera-videos.html
- Post to the R mailing list or forums if you have questions
- Read other blogs on R-bloggers
- Read this site!
Summary
In summary, R has many benefits over Excel and other data analysis software. If you’re using Excel for data analysis, give R a try. You’ll be thankful you did.
Examples Using R for Analyzing NFL Data
Win Your Fantasy Football Draft with This Draft Optimizer
Expected Points by Position Rank in Fantasy Football
Who Has the Best Fantasy Football Projections: ESPN, CBS, NFL.com, or FantasyPros?
Turning Advanced Statistics Into Fantasy Football Analysis
New York Times Uses R to Investigate NFL Draft Picks
Further Reading
http://www.michaelmilton.net/2010/01/26/when-to-use-excel-when-to-use-r/
http://www.burns-stat.com/first-step-towards-r-spreadsheets/
http://www.burns-stat.com/spreadsheet-r-vector/
http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction/
http://blog.revolutionanalytics.com/2013/04/more-reasons-not-to-use-excel-for-modeling.html
http://blog.revolutionanalytics.com/2013/02/did-an-excel-error-bring-down-the-london-whale.html
http://robjhyndman.com/hyndsight/rvsexcel/
http://andrewgelman.com/2013/04/17/data-problems-coding-errors-what-can-be-done/
http://andrewgelman.com/2013/04/17/excel-bashing/
http://r-dir.com/blog/2013/11/r-vs-excel-for-data-analysis.html
http://www.quantumforest.com/2013/12/excel-fanaticism-and-r/
http://christophergandrud.blogspot.com/2013/04/reinhart-rogoff-everyone-makes-coding.html
http://r4stats.com/articles/popularity/
[…] You may read the rest at Fantasy Football Analytics. […]
I actually think that for anything with more than 5 rows of data nothing is better than a quick plot to eyeball problems. So I’d say, Excel loses on this one as well.
Cheers
Rob
Thanks for great post;)
Hey thanks for the link to my site. Also, great article. You’re thorough, encouraging & list additional resources. I’m biased towards R, but after using it for a couple of years I actually prefer it to Excel for just about everything. The only thing I use Excel for regularly these days is pivot tables, and the only reason I still use Excel for that is to make this work easy to share with others. Anyway, nicely done.
Thanks for the kind words, Chris. I tend to agree with your preference for using R over Excel for most things. I had to throw the Excel crowd a bone so they would be receptive to my suggestions for when to use R. We’re swimming against the current because so many people rely on Excel. The tides appear to be shifting though…
[…] The Fantasy Football Analytics blog shares these 14 reasons why R is better than Excel for data analysis: […]
Very good article and aptly summed up! R is sure much better for statistical analysis than Excel because I agree wholeheartedly to your statements! Only Excel is the most widely used tool in the business environment securely! That’s why I think it is the best to integrate R into Excel! 🙂
Good point, Günter. Integrating R into Excel is a good first step. I suspect as people become more familiar with R and as R becomes more widely used in undergraduate training, businesses will gravitate toward R because it will save them money and is less error-prone. In fact, it’s already happening. See here:
http://www.revolutionanalytics.com/companies-using-r
http://blog.revolutionanalytics.com/2014/05/companies-using-r-in-2014.html
R has a long way to go before it unseats Excel, but the trend is positive!
There is also another important issue: accuracy. Please check here http://www.jstatsoft.org/v34/i04 the problems one may run into when using Excel.
Thanks for the link, Alejandro. I added accuracy to the list of reasons!
I think that most important reason, which is not stated explicitly, but is hinted at is that with R the data and the analysis are separated. The biggest problem with Excel is that it is very difficult the follow the train of logic that leads to a particular result. You him to keep going back through formulas and you cant be sure on which cells each type of formula is used so you have no assurance that the analysis is correct or even consistent.
I think that Excel is a marvellous tool, but not for serious data analysis.
Excellent points, David. I added these observations to the section on Reproducibility. Thanks!
The problem with R is that it is command line interface and that is a huge hurdle for many Excel users to jump. Users don’t want to have to write a script to look at data they just want to click through a GUI, preferably a familiar one to get the results they want. I’ve used R for years and almost every time when someone asks how something is done, they suddenly loose interest when they see code.
As Excel is so widely used I agree with Gunter that integrating R into Excel can be a good compromise, the user has a nice familiar interface, gets some powerful tools and hopefully the motivation to learn more about the benefits of R.
George
Hey George,
Agreed that the command line interface is one of the biggest hurdles for R in overtaking Excel. As more undergraduate classes teach and use R for statistics, I imagine that its adoption will increase. It just may take some generational turnover for younger, R-trained employees to speed up the process of moving to R. Bottom line: we should place an emphasis on learning R in college statistics coursework.
-Isaac
Definitely agree with pretty much everything said.
Just wanted to add that as a data analyst, often you don’t get to choose the format of the data you get (you just get it from someone else who has already decided on .csv, .tsv, .xlsx, etc.).
One package I’ve found very useful for interfacing between excel files and the R environment is openxlsx (on CRAN: http://cran.r-project.org/web/packages/openxlsx/index.html).
Stefan
Good point, Stefan. I see that as another advantage of R—it can read many more data types than Excel can (including Excel files).
Any soccer fans? I used R for some fantasy football analysis on the English Premier League too – http://thecomeonman.blogspot.in/2014/08/fantasy-football-1-team-composition.html
Isaac,
Congratulation for an intersting article especially for the well detailed collection of links. I would be intersted to know if when you mention Excel you also includes VBA capabilities. If VBA is included I would be tempted to disagree with some of your points. Also it is well known that R has some memory limitations.
One point is clear : R is free.
But the costs of training staff are not small.
Excel and R are designated for diffrent usage. People able to write code can do that in any language: R, VBA, SAS, Python. Large companies are very reluctant to use open source applications. I think that a more appropriate comparaison would be between R and Python.
Regards,
Madalin
Hey Madalin,
Thanks for your thoughts. I wasn’t considering VBA capabilities in Excel, but I suspect most of the points would stand. Packages have been written for R that extend R to include many other languages, including VBA, C++, Python, MATLAB, etc., so R can essentially do everything Excel can and more.
I agree that Excel and R were designed for different uses—however, many people use Excel in situations when R would work better. Yes, R has memory limits, but its memory limits exceed those of Excel. Plus, there are enterprise distributions of R that vastly improve memory capacity (http://www.revolutionanalytics.com/revolution-r-enterprise).
Re: cost of training—Excel also required training costs before wider adoption. I suspect as people become more familiar with R and as R becomes more widely used in undergraduate training, businesses will gravitate toward R because it will save them money and is less error-prone. In fact, it’s already happening (including many large companies):
http://www.revolutionanalytics.com/companies-using-r
http://blog.revolutionanalytics.com/2014/05/companies-using-r-in-2014.html
As more undergraduate classes teach and use R for statistics, I imagine that its adoption will increase. It may take some generational turnover for younger, R-trained employees to speed up the process of moving to R. If we place an emphasis on learning R in college statistics coursework, it won’t be as much of a training cost for companies, and recently graduating employees would push to use R.
Unlike VBA and Python, R is a specialized programming language for statistical analysis. Python is great, and some people even use it for statistics. But, R was specifically designed for statistical analysis, so it has some of the most advanced statistical capabilities of any stats software (especially compared to common stats software used in industry like SAS, SPSS, and Stata).
Just some my thoughts. Thanks for your thoughts!
-Isaac
Hi Issac! Great note. Made my day. I bookmarked this page quick! I am transitioning to R as we speak. Any chance you can help em with something as basic as getting box office results from IMDB to a spreadsheet weekly? If so pls ping me a cikokwu @ gmail.
Hi Chuka, it looks like R-bloggers has some articles on IMDB:
http://www.r-bloggers.com/?s=imdb
For example:
http://www.r-bloggers.com/rvest-easy-web-scraping-with-r/
http://www.r-bloggers.com/migrating-table-oriented-web-scraping-code-to-rvest-wxpath-css-selector-examples/
http://www.r-bloggers.com/using-sentiment-analysis-to-predict-ratings-of-popular-tv-series/
http://www.r-bloggers.com/top-250-movies-at-imdb/
http://www.r-bloggers.com/predicting-movie-ratings-with-imdb-data-and-r/
Hope that helps!
-Isaac
Isaac, I put the task to you to help me really appreciate R because I really want to, and believe that R has a great deal of powerful features Excel may not, but I have yet to find many. I know Excel pretty much inside and out, and have yet to find tools from R that I can’t recreate with Excel with far less time that it takes to write code. In fact, this is especially interesting to me because I LOVE fantasy football and stats.
With any data analysis I do, I often rely on this sage lesson: “data scientists are ‘lazy’ – in a good way.” That is, we want to find the quickest way to the answer we need (and potentially that others need too). More often for me, it has been with Excel, but I want to be lured to R more.
It’s worth noting that I’m perfectly familiar with R and its interface, but maybe I’m just not using it to it’s full potential. So could you help me with more specifics about what I can’t do with Excel using 1) VBA (especially considering that you can often-times just simply “record” a reproducible code), 2) the Solver linear optimization feature included within Excel (which is excellent when you’re playing auction/salary cap leagues in FF), and 3) Pivot Tables? In addition to all of that, 4) why isn’t it better to be able to see the actual data table that you can with Excel? I just find it so much easier to build out new columns of data using formulas I’ve created from the fantasy stats data, using the Tables feature in Excel, which is only a pain in R.
I’ve used Excel in every which way with the World Bank’s World Development Indicators (http://datacatalog.worldbank.org/) an 80MB Excel file which comprises of 352,000 rows by 60 columns of information all through Excel, and it’s done miracles for my work in economics. And that is not a tiny little dataset.
Your assessment that Excel is better for the following seems terrible disingenuous:
1. Data Entry – this is like saying use Word instead, in which case you need not compare Excel to R
2. Basic Calculations – you and I know there are a lot more than just basic calculations (index/match, vlookup, sumifs, net present value, internal rate of return, you name it, not to mention functions you can build yourself with VBA and APIs you can also download/apply into Excel (e.g., Bloomberg Terminal).
3. Viewing/Scrolling Data at a Glance – That’s a big bonus if at times eye-balling might even be just as meaningful, especially when you have just 1 minute to make a draft selection in fantasy football and just need to eyeball who you might pick in the later rounds.
4. Data Presentation and Summarization – one can even argue R is better but they fulfill different needs.
Again, I write this less to compare, and more to be talked into R for the features I may not have yet found.
Cheers,
Daniel
Hi Daniel,
Thanks for your thoughts! I’m sorry you feel that analysis was “terrible disingenuous.” The post wasn’t meant to be a “general comparison” between R and Excel. Rather, it was meant to be a comparison of the tools used most commonly in the context of fantasy football data analysis. The principles may or may not apply to other domains of application. We agree that R and Excel are not always comparable because they can serve different functions. To the extent that R and Excel can be used for the same purposes, there are many reasons to prefer R (see reasons and cited evidence in the article above).
But don’t take my word for it! The fact remains that R is considered the top language for data science, and more data scientists use R than any other software for predictive modeling competitions on Kaggle (including 50% of winners; http://rconvert.com/conversion-switch-to-r-data/r-usage-exceeds-sas-sps/).
Excel is good software, and has unique tools that R does not have. Nevertheless, to the extent that R and Excel can be used for the same purposes, data scientists generally prefer R—and for good reasons.
Thanks,
Isaac
Hi Isaac,
Firstly, thanks for the thoughtful response! My challenge is often that I know Excel a great deal better than R since I use it so much more frequently at work, so I trust that I’ve not leveraged the full potential R has to offer. In fact, in the context of fantasy football, I think one area where R would be immensely helpful that I don’t think Excel could offer (to my better knowledge) is the ability to read webpage scripts. Something I’ve not been able to do is collect week-to-week stats on players’ and teams’ performance. I’ve tried collecting and compiling with Excel, but it has proven too cumbersome. The reason I want this is to help me understand performance volatility/consistency and against any given team. That is, do I want maybe a deeper ball player with more upside (but more downside) or do I want a consistent player with limited upside but who won’t surprise me week-to-week (e.g., AJ Green vs. Doug Baldwin come to mind).
I wonder if you have any thoughts or advice on that, because this I think would be a great project for me to take on to better learn R and get the stats I need. That in mind also, would you have any recommendations on which website might be best for me to leverage to compile something like this using R? I suppose your website is a good place to start if you have that in one place, but in case it doesn’t. (I just happened upon this blog after I got into a debate with a friend about Excel vs. R.)
Thanks Isaac!
Daniel
Hi Daniel,
Yes, we scrape all seasonal and weekly projections via R. For more info, see here:
https://fantasyfootballanalytics.net/2014/06/scraping-fantasy-football-projections.html
The scripts aren’t yet updated for the current season–stay tuned!
-Isaac
I’ve always done my daily NFL projections and solver in Excel. This season I would like to move everything into R. It was a great way for me to learn Excel last year, and I hope it’ll be a great way to learn R this season. Do you have any posts/resources for doing this all in R (e.g. creating NFL projections by combining various stats, running each projection through a solver one by one, and outputting each lineup into a list)?
Check out this post and the associated R code:
https://fantasyfootballanalytics.net/2013/08/optimization-simulation.html
well compiled article,great.
Not sure if this is just my computer or not but I feel like the snake draft app is down? The rest of your site works wonderfully
Hi Samson,
The apps are down at the moment due to heavy traffic during peak hours of peak draft season. We’re hoping to generate revenue to cover increased server capacity. In the meantime we recommend a) downloading the projections to use offline, and b) using the apps during non-peak hours.
Hope that helps,
Isaac