I am starting a new thing on the blog, an interview series of sorts, where I reach out to people I have always wanted to have a conversation with. I don’t expect to hear back from Barack Obama or Derek Jeter, but there are a lot of interesting people out there with good stories to tell. I hope to pick their brains and learn some stuff about life. I am excited to share their stories with you.
Baseball-Reference is the best source for baseball statistics on the internet. It has stats on every player, full box scores going back to 1919, and an amazing tool called Play Index where you can search for anything that has ever happened in baseball. You could find the all-time leaders in triples by a switch hitter taller than six feet who played for the San Diego Padres between 1976 and 1999. Or the top five leaders in sacrifice flies by infielders who are still alive and played before 1980. I don’t know why you would search for those things, but you could.
As a proud baseball nerd, Baseball-Reference is one of my favorite things about the internet. So much information is available in an instant. I try not to take it for granted. Sean Forman is the man behind it all, and I’ve always admired his work and the dedication he puts into the site.
I am pleased to welcome Sean to the blog.
What is your background, and how did Baseball-Reference begin?
Well, I grew up in Iowa, and I’ve always been into statistics. My dad was a football coach, and in junior high I would keep track of charts for him and compile the statistics after the games. It’s always been an interest of mine. And then in graduate school, right at the start of the internet, I got into web design. I had the Baseball Encyclopedia text edition and thought it would translate well to the web. You could look at Rogers Hornsby’s stats and then go quickly onto his teammates and look at their stats. So I thought the web would be a much more useful format than a large reference book.
At the time, there was a database out there with baseball stats called The Baseball Archive, and I used that to build the initial site. We launched in March 2000.
I’m the type of guy who can get lost on the site for hours on any given night. Like you, I’ve always been drawn to numbers. And I think the reason I’ve always been drawn to baseball is how much numbers and math intertwine with the game.
Oh yeah. Numbers have always been interesting to me, and it’s just grown over time. I didn’t grow up reading Bill James, but every Sunday I would look at the newspaper and check out the batting leaders and the pitching leaders. This was in the early 1980s, when newspapers were really the only place you could get the box scores or stats for players.
I also have an interest in presenting data in useful ways. So part of it is analyzing the numbers, but part of it is also wanting to create as useful an interface as I can.
How many employees do you have?
Across all the sites, we have five, including myself.
How much of the site is automated, and how much do you update manually?
It is very automated. For instance, today I haven’t done anything to update the site. The stats come in each morning and are parsed in an automated way, and then the pages all run automatically. If everything works right, I get a text message that says it’s updated. Because we are a small workforce, we aren’t watching the games, we aren’t entering the data ourselves. We just write programs that can do it.
There are a few things we do manually, often when we’re dealing with historical datasets. In those cases, there is going to be a certain amount of clean-up and correction. But usually the data we work with has already been produced by someone else, either a person who really enjoys baseball or a company that does it. I think one of the things we do well is merge a wide variety of data sources into a cohesive whole.
How many visits does Baseball-Reference get each day?
We’re over a quarter million per day. On a good day, we’ll get 300,000 or even 400,000. Our biggest day of the year is usually the trade deadline.
What are the stats you look at to evaluate a player?
WAR is a stat that we have been very involved in producing and generating, and I think it gives the best overall view of player value.
For hitters, I also look at on-base percentage, OPS+, things like that. On the pitching side, I tend to stick with ERA, WHIP, and strikeouts per nine innings. I know what FIP is, and we just added it to the site this year, but I prefer the others.
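For readers who want the arithmetic behind the traditional stats mentioned above, here is a minimal Python sketch using the standard definitions of on-base percentage, ERA, WHIP, and strikeouts per nine innings. The stat lines are made up for illustration, and the class and function names are my own, not anything from Baseball-Reference’s code. OPS+, FIP, and WAR are left out because they depend on league and park adjustments that don’t fit in a few lines.

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    """A hypothetical season line for one pitcher."""
    outs: int          # outs recorded; innings pitched = outs / 3
    hits: int
    walks: int
    strikeouts: int
    earned_runs: int

    @property
    def innings(self) -> float:
        return self.outs / 3


def era(p: PitchingLine) -> float:
    """Earned runs allowed per nine innings."""
    return 9 * p.earned_runs / p.innings


def whip(p: PitchingLine) -> float:
    """Walks plus hits per inning pitched."""
    return (p.walks + p.hits) / p.innings


def k_per_9(p: PitchingLine) -> float:
    """Strikeouts per nine innings."""
    return 9 * p.strikeouts / p.innings


def obp(hits: int, walks: int, hbp: int, at_bats: int, sac_flies: int) -> float:
    """On-base percentage: times on base divided by AB + BB + HBP + SF."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


# Made-up example: 200 innings, 160 hits, 40 walks, 200 strikeouts, 60 earned runs.
pitcher = PitchingLine(outs=600, hits=160, walks=40, strikeouts=200, earned_runs=60)
print(f"ERA {era(pitcher):.2f}, WHIP {whip(pitcher):.2f}, K/9 {k_per_9(pitcher):.1f}")
print(f"OBP {obp(hits=150, walks=60, hbp=5, at_bats=500, sac_flies=5):.3f}")
```

Tracking outs rather than innings sidesteps the box-score convention where "200.1" means 200 and one-third innings, not 200.1.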
You mentioned Wins Above Replacement. I love the stat, but I think a lot of people don’t understand it, because it’s not intuitive. You can’t calculate it in your head. And I think some people are wary because it doesn’t have a consistent formula, and it can change over time.
About a year ago, I wrote a fairly long post on our blog that compared WAR to gross domestic product. Economists look at GDP to determine whether we are in a recession, but you can’t possibly compute it yourself, and most people don’t know exactly what formulas go into it. And like WAR, GDP gets adjusted retroactively on a fairly regular basis.
I understand the frustration with feeling that WAR is not very transparent. We try to show all of the components, and we have a very detailed tutorial on how it is calculated. But obviously, it isn’t something that you’ll be able to calculate on your own.
I do think it is the best all-inclusive number. You can make the case that Mike Trout is better than Miguel Cabrera without resorting to WAR, but it can show you exactly how much better he might be.
In college, my friends and I would sit around and try to name the most random baseball players we could think of from the late 90s and early 2000s. Some examples: Rick Helling, Desi Relaford, Jay Witasick, David Segui. Any other suggestions?
Let’s see … Arquimedez Pozo. The problem is that if you think of them they’re not necessarily random. Wayne Gomes would be a good one. Brook Fordyce.
Occasionally I’ll come across guys and I’ll have, like, no recollection of the player. He’ll have played for four years from 2001 to 2005. You have about 100 star players, and then you have the vast majority of everybody else.
You’ve been on MLB Network, you’ve interacted with some high-profile people in the game, you have a few thousand followers on Twitter. It’s not exactly Bill James level, but you do have, I guess, some semblance of fame. People in the game, and certainly many fans, know who you are. Do you like that?
You know, last night I got to go to a game with the Phillies’ head analytics guy, which obviously I wouldn’t have been able to do without the site. It’s fun to get to talk about baseball with those kinds of people.
It’s funny, when I was in graduate school, my wife was a better mathematician than I was. She had written more papers, she finished her dissertation much more easily than I did. And one time we went to a conference, and people came up and started talking to me, and she’d be like, why are they talking to him? And it’s invariably because of some baseball stuff that they had heard about. So we have a running joke that I have name recognition within a very small, dedicated group.
It’s fun – I sponsored my son’s Little League baseball team this year, so they had baseball-reference.com on the back of their jerseys. And one of the parents came up to me later in the season and told me that their son had gone to a Baltimore Orioles game wearing that shirt. And he got several comments from people in the stadium.
It is very gratifying to know that people enjoy the work that you do, that they rely upon the work that you do. It’s certainly satisfying to know that you’ve done a good job and people want to use it.
What’s been the coolest moment for you since you started the site? And have you ever had an ‘oh crap’ type of moment?
Oh, I’ve had many oh crap moments. I have a bad habit of trying to edit things live on the website and occasionally breaking it. It’s a bad feeling when you know that people are coming to the site looking for, let’s say, Derek Jeter’s splits and can’t get them because you broke something.
Probably the coolest moment I’ve had was a few years ago when Zack Greinke hit Carlos Quentin. Quentin charged the mound and started a brawl. And they were discussing it afterwards, and Vin Scully of all people said: well, you better get on Baseball-Reference and see if Greinke has ever hit Quentin before. To know that Vin Scully uses your site is probably the apex of what I’ve done. We might as well just shut it down because we’re never going to top that.
Five rapid-fire questions:
Favorite type of music: My favorite singer is Greg Brown, who was living in Iowa City when I was there.
Last book you read: R Graphics Cookbook
3 favorite baseball players of all time: Rickey Henderson, Pedro Martinez, Wade Boggs
Where do you get your news: I still subscribe to a newspaper. I obviously look at websites as well, but I still read the paper.
Last time you kept score at a baseball game: Probably four or five years ago
And, finally, what are your long-term goals for the site? What would you like to add over the next five, ten, twenty years?
One additional thing that is coming along is the tracking data that MLB collects. We’ll see whether that is available to the public or not. Beyond that, getting a grasp of people’s mobile use of the site, and making it as convenient as possible for people who are looking at it on a tiny screen.
I don’t think there are any big statistical additions or analytical additions. We’ve already hit a lot of the high points. More generally for the company, we’re looking at expanding into soccer and other new sites as well.