Data Science: A Day in the Life of a Veyo Data Scientist
May 19, 2016
“Remind me what it is you do again?”
That seems to be everyone’s favorite question. It’s usually followed up by, “Something to do with data, right?” In the last few years, data science has become a very hot topic. HBR called us sexy back in 2012, and McKinsey did it even earlier in 2009. But what do we really do? The easiest way to explain what we do is to show people what we do. So let’s go through a day in the life of a data scientist.
But first off, let’s talk about the data. Veyo has a lot of data. We’re tracking cars, traffic, weather, trip requests, driver activity, and anything else that has to do with moving cars around to pick people up. But all that data isn’t useful unless you can get it to tell a story. A big part of our job is organizing the data to drive insights. Different time zones? Let’s convert them. Missing data? Let’s figure out what’s missing (and better yet, why it’s missing). Data wrangling (or data munging) is a daily task. Good insights come from good data.
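To make those two wrangling chores concrete, here is a minimal Python sketch that converts local pickup times to UTC and counts missing fields. The record layout, field names, and UTC offsets are made up for illustration; they are not our actual schema, and a real pipeline would likely use named time zones rather than fixed offsets.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical trip records: local pickup time, UTC offset in hours, driver.
RAW_TRIPS = [
    {"trip_id": 1, "pickup_local": "2016-05-19 09:00", "utc_offset": -7, "driver_id": "d1"},
    {"trip_id": 2, "pickup_local": "2016-05-19 12:00", "utc_offset": -4, "driver_id": None},
    {"trip_id": 3, "pickup_local": None, "utc_offset": -7, "driver_id": "d2"},
]


def to_utc(local_text, utc_offset_hours):
    """Parse a local timestamp and convert it to UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


def missing_counts(records):
    """Count how many records are missing each field."""
    counts = {}
    for record in records:
        for field, value in record.items():
            if value is None:
                counts[field] = counts.get(field, 0) + 1
    return counts


# Convert only the records that actually have a pickup time.
pickups_utc = {
    r["trip_id"]: to_utc(r["pickup_local"], r["utc_offset"])
    for r in RAW_TRIPS
    if r["pickup_local"] is not None
}
missing = missing_counts(RAW_TRIPS)
```

Counting what is missing is only the first half of the job; the "why" usually takes a conversation with whoever produces the data.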
So what are we trying to do with all this data? Currently, our main focus is on predicting what our supply and demand will look like in the future. A massive part of really reliable, high-quality transportation is making sure we have the right cars in the right places at the right times. Will we have enough cars to pick everyone up on Friday? How can we predict when someone will be finished with their appointment and ready for pickup? What if it rains? What has worked best when we needed to get more cars out on the road? How do provider and passenger cancellations affect our models? Creating statistical models, visualizing data, finding patterns, testing theories… it’s all part of data science. So what does a typical day look like?
9am(ish) – After grabbing a cup of coffee, it’s time to check up on the predictive models we ran last night. We’re currently working to predict supply 48 hours in advance. It looks like it might rain this week, so we’re using recorded data to look into what happens if the average speed of drivers slows down by 5 mph. Once we’ve analyzed the results, we’ll send a report out to the operations team so they can plan for the expected change. Looks like they’ll need a few extra drivers with the rain coming.
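The arithmetic behind a slowdown scenario like this is simple enough to sketch. The Python below is a back-of-the-envelope version, not the model we actually run: the 30 mph baseline is a made-up number, and it assumes every trip slows down by the same amount.

```python
def trip_minutes(distance_miles, speed_mph):
    """Driving time in minutes for a trip at a constant average speed."""
    return 60.0 * distance_miles / speed_mph


def slowdown_factor(baseline_mph, slowdown_mph=5.0):
    """How much longer each trip takes when average speed drops."""
    slowed_mph = baseline_mph - slowdown_mph
    if slowed_mph <= 0:
        raise ValueError("slowdown must be smaller than the baseline speed")
    return baseline_mph / slowed_mph


# Hypothetical baseline: drivers average 30 mph on a dry day.
factor = slowdown_factor(30.0)         # trips take about 1.2x as long
dry_minutes = trip_minutes(10.0, 30.0)  # 10-mile trip on a dry day
wet_minutes = trip_minutes(10.0, 25.0)  # same trip, 5 mph slower
```

If every trip takes roughly 20% longer, the same number of drivers covers fewer trips per hour, which is why a rainy forecast tends to translate into a request for extra drivers.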
10am – Daily stand-up time. (Meetings are a lot shorter when you have to stand, so our daily meetings are run standing up.) They usually last 10-15 minutes and give us a chance to share what we did yesterday, what we’re doing today, and any issues that we’ve run into. On my agenda for the rest of today is a review of my latest project: running simulations to better detect fraud by looking at driver velocity vs. time traveled. Did they really go ten miles in one minute? Or complete twenty trips in one hour? To help us find driver patterns and root out fraudulent activity, we’ve been working on a way to cluster drivers based on their behavior. Most of my work today will be figuring out how to set up those clusters.
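The two sanity checks in those questions can be sketched in a few lines of Python. This covers only the simple plausibility checks, not the clustering work, and the thresholds, field names, and sample trips are all hypothetical.

```python
from collections import Counter
from datetime import datetime

MAX_PLAUSIBLE_MPH = 90.0  # hypothetical cutoff for a believable average speed
MAX_TRIPS_PER_HOUR = 6    # hypothetical cutoff for trips started in one hour


def implied_speed_mph(miles, minutes):
    """Average speed implied by a trip's reported distance and duration."""
    if minutes <= 0:
        return float("inf")
    return miles * 60.0 / minutes


def flag_fast_trips(trips):
    """Trip IDs whose implied speed exceeds the plausibility cutoff."""
    return [
        t["trip_id"]
        for t in trips
        if implied_speed_mph(t["miles"], t["minutes"]) > MAX_PLAUSIBLE_MPH
    ]


def flag_busy_drivers(trips):
    """Drivers who started more trips in one clock hour than the cutoff."""
    per_hour = Counter(
        (t["driver_id"], t["start"].replace(minute=0, second=0, microsecond=0))
        for t in trips
    )
    return sorted({driver for (driver, _hour), n in per_hour.items() if n > MAX_TRIPS_PER_HOUR})


# Made-up data: one impossible trip, one normal trip, one driver with 20 trips in an hour.
SAMPLE_TRIPS = [
    {"trip_id": "t1", "driver_id": "d1", "start": datetime(2016, 5, 19, 9, 0), "miles": 10.0, "minutes": 1.0},
    {"trip_id": "t2", "driver_id": "d2", "start": datetime(2016, 5, 19, 9, 30), "miles": 10.0, "minutes": 25.0},
]
SAMPLE_TRIPS += [
    {"trip_id": f"b{i}", "driver_id": "d9", "start": datetime(2016, 5, 19, 14, 3 * i), "miles": 1.0, "minutes": 3.0}
    for i in range(20)
]

fast_trips = flag_fast_trips(SAMPLE_TRIPS)
busy_drivers = flag_busy_drivers(SAMPLE_TRIPS)
```

Ten miles in one minute works out to 600 mph, so a check like this catches the obvious cases; the subtler patterns are what the clustering is for.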
10:15am – Back to my desk and another project. One of our current challenges is predicting if a trip will get cancelled in the future. Our supply team can make better decisions on how and when to dispatch a trip based on its chances of being cancelled. To predict trip cancellations, we look at the history and the contributing factors to a trip being cancelled. We can then calculate the likelihood of a trip being cancelled and use those predictions on future trips.
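One simple way to turn trip history into a cancellation likelihood is a smoothed historical rate per factor value. The Python below is a sketch of that idea only, not necessarily the model we use; the `booking` factor, the prior values, and the sample history are assumptions for illustration.

```python
from collections import defaultdict


def cancellation_rates(history, factor, prior_rate=0.1, prior_weight=5.0):
    """Smoothed cancellation rate for each observed value of `factor`.

    The prior pulls rates based on only a few trips toward `prior_rate`,
    so a single cancelled trip does not produce a 100% estimate.
    """
    totals = defaultdict(int)
    cancels = defaultdict(int)
    for trip in history:
        key = trip[factor]
        totals[key] += 1
        cancels[key] += int(trip["cancelled"])
    return {
        key: (cancels[key] + prior_rate * prior_weight) / (totals[key] + prior_weight)
        for key in totals
    }


def predict(rates, trip, factor, prior_rate=0.1):
    """Estimated cancellation likelihood for a future trip."""
    return rates.get(trip[factor], prior_rate)


# Hypothetical history: same-day bookings cancel more often than advance ones.
HISTORY = (
    [{"booking": "same_day", "cancelled": True}] * 3
    + [{"booking": "same_day", "cancelled": False}] * 2
    + [{"booking": "advance", "cancelled": True}] * 1
    + [{"booking": "advance", "cancelled": False}] * 14
)

rates = cancellation_rates(HISTORY, "booking")
same_day_risk = predict(rates, {"booking": "same_day"}, "booking")
unseen_risk = predict(rates, {"booking": "standing_order"}, "booking")
```

A real model would combine many contributing factors at once, but the shape is the same: learn from past trips, then score future ones.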
11:30am – Time to meet with the development team about the hotspot maps. (Hotspot maps allow our drivers to see where trips are occurring.) We’re implementing live updates so drivers always know the best place to start their shifts, but we first need to work with the development team to implement the changes. We give them the raw data, but they still need to work with our design team to figure out the best way to visualize that data in a useful way. We want our drivers to head to where the trips are happening, but we need to make sure they don’t all go to the same spot at once!
1pm – Lunch! Time to get out of the office for a bit.
2pm – Our weekly meeting with the operations team. We work closely with our ops team to make sure they get the data they need. Current project: Testing reward programs. They’re trying to figure out the best way to increase supply when we need more cars on the road. We’ve been analyzing the test results and working with them on reward optimization.
3:30pm – Back to the data. We have a lot of IDP drivers, and we want to make sure we’re fully utilizing their time. We’re currently looking at driver utilization – how much of their Veyo time do they spend with an empty car? How can we keep drivers busy when they’re in Available mode? By analyzing trip requests, driver schedules, and driver locations, we can work to better utilize our fleet.
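The empty-car question boils down to a ratio per driver. Here is a minimal Python sketch, assuming driver time has already been split into segments with a passenger flag; the segment layout and the numbers are hypothetical.

```python
def empty_share(segments):
    """Fraction of each driver's logged minutes spent without a passenger."""
    total = {}
    empty = {}
    for seg in segments:
        driver = seg["driver_id"]
        total[driver] = total.get(driver, 0.0) + seg["minutes"]
        if not seg["has_passenger"]:
            empty[driver] = empty.get(driver, 0.0) + seg["minutes"]
    # Skip drivers with no logged time to avoid dividing by zero.
    return {d: empty.get(d, 0.0) / total[d] for d in total if total[d] > 0}


# Made-up shift segments for two drivers.
SEGMENTS = [
    {"driver_id": "d1", "minutes": 120, "has_passenger": False},
    {"driver_id": "d1", "minutes": 360, "has_passenger": True},
    {"driver_id": "d2", "minutes": 300, "has_passenger": False},
    {"driver_id": "d2", "minutes": 100, "has_passenger": True},
]

shares = empty_share(SEGMENTS)
```

In this made-up data, d1 drives empty a quarter of the time and d2 three quarters of the time, which is the kind of gap that prompts a closer look at schedules and locations.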
5:00pm – We have a few more models we’re looking to run tonight. It’s time to run through the data we’re looking to analyze, make sure it’s complete, and set up some experiments.
6:00pm – Time to head home. We start again tomorrow!
Want to learn more about data science? Here are a few of our favorite blogs*:
- http://www.kdnuggets.com/
- https://www.oreilly.com/
- http://www.r-bloggers.com/
- http://horicky.blogspot.com/
- http://flowingdata.com/
- https://mathbabe.org/
- http://datausa.io/
(*Obviously, the views expressed in these blogs don’t reflect our own or those of Veyo.)