We all suffer through waiting in lines at coffee shops, banks, DMVs, and many other such places; I could easily enumerate hundreds of them. Even with the advent of technology in so many areas of life, we still have to wait in line at a coffee shop to get a simple order filled. This can take a few minutes, or, if the place is popular, more than 20-30 minutes. Think about how many times you joined a line and considered leaving rather than waiting that long, or thought about switching lines to reach the cashier faster when the place has more than one. The thing is, you never know until you have visited the place. Below is the dilemma we face every day.
OK, maybe this particular case is more a matter of luck than estimation, and it might be very difficult to estimate anyway. But most lines are not that unpredictable; most exhibit patterns that repeat from time to time, day to day, and season to season. The point is: wouldn't it be great to know how many minutes you will wait in line at your favorite coffee shop at a particular time? Better yet, what if you could simply look that information up on your mobile phone?
We recently published a paper, "LineKing: Crowdsourced Line Wait-Time Estimation using Smartphones," at MobiCASE 2012. We developed a smartphone application that detects the time spent at the Tim Hortons in the Student Union of the University at Buffalo. We use platform-specific (i.e., both Android and iOS) localization techniques, combining location sensing and WiFi sensing, to measure the time spent at this particular place. Once we detect a wait-time, we immediately send it to our server deployed on AWS EC2.
We faced many challenges during the initial phase. At first we collected very few readings, since we did not yet have much of a user base. But once the application's popularity grew, helped by a viral advertising campaign, we quickly reached thousands of users providing hundreds of readings daily. At first this seemed amazing: we thought we were very close to a solution that gives users a real-time estimate of the current state of the lines.
We quickly noticed that having lots of data is good only if you can interpret it. So we started analyzing the data to understand how line wait-time evolves, and later developed an approach to estimate future wait-times. First, we built a heuristic nearest neighbor estimation (NNE) algorithm. The idea is: for a given query, say January 1st, Monday, 11am, find the 5 points in the history that are most similar to the query. To implement this, we represent every data point with 3 dimensions, namely "week of the year", "day of the week", and "interval of the day": [w,d,i]. The problem is then to define a similarity metric between such vectors. To do this, we assume the wait-time v depends linearly on each dimension of the data vector [w,d,i], as v = a*w + b*d + c*i. Using the history of wait-times and linear regression, we estimate the optimal a, b, and c parameters and use them to compute the dissimilarity between two vectors: d(v1,v2) = a*(w1-w2) + b*(d1-d2) + c*(i1-i2). The problem then reduces to finding the 5 most similar points in the history of wait-times for the given query; we take their average as the forecasted value. This technique (constrained nearest neighbor search in a multi-dimensional space) works well and yields a mean absolute error (MAE) of ~3 minutes. Since the wait-time at this particular coffee shop ranges between 2 and 20 minutes, that is not a bad estimate.
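The NNE idea above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names are ours, and we take the absolute value of the weighted difference so the dissimilarity is non-negative.

```python
import numpy as np

def fit_weights(X, v):
    """Fit v ~ a*w + b*d + c*i by least squares.
    X: (n, 3) array of [week, day, interval] vectors; v: (n,) wait-times."""
    coeffs, *_ = np.linalg.lstsq(X, v, rcond=None)
    return coeffs  # [a, b, c]

def nne_estimate(X, v, query, k=5):
    """Average the wait-times of the k history points closest to `query`
    under the weighted distance |a*(w1-w2) + b*(d1-d2) + c*(i1-i2)|."""
    coeffs = fit_weights(X, v)
    dist = np.abs((X - query) @ coeffs)   # scalar dissimilarity per point
    nearest = np.argsort(dist)[:k]        # indices of the k most similar
    return v[nearest].mean()              # forecast = average of neighbors
```

Given a history matrix `X` and wait-times `v`, a query like `nne_estimate(X, v, np.array([1., 0., 66.]))` (week 1, Monday, the 11:00am interval) returns the forecasted wait in the same units as `v`.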
Second, we thought of building a model, specifically a time series model, to improve the estimation. However, we quickly realized that our data was neither complete nor uniform: although we had hundreds of readings daily, they were not enough to fill all of the 10-minute intervals we wanted to estimate. General time series analysis assumes data sampled uniformly over time, so before building any model we had to construct a uniform, equally spaced series. This is basically a missing-data problem: intervals without any readings have to be filled in somehow. One way to deal with missing data is regression, and our NNE technique is a perfect fit for this, so we used it to fill in the missing data. With a complete and uniform series in hand, we could build a model. We considered ad hoc exponential smoothing and the more complex Holt-Winters as models. Our results indicate that we improved on our previous results by around 1 minute, reducing the MAE to ~2 minutes. They also indicate that exponential smoothing and Holt-Winters exhibit similar errors, with Holt-Winters beating exponential smoothing by small margins. Since our experiments covered only 8 weeks of data, we believe that margin will grow as we collect more data exhibiting seasonality and trends.
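The two-step pipeline above (fill the gaps, then smooth) can be sketched as follows. This is a simplified stand-in, not our actual code: the gap-filling here uses linear interpolation rather than the NNE regressor, and only simple exponential smoothing is shown, not Holt-Winters.

```python
import numpy as np

def fill_missing(series):
    """Fill NaN intervals so the series is complete and equally spaced.
    (In our pipeline this is done with the NNE regressor; linear
    interpolation is used here as a simple stand-in.)"""
    s = np.asarray(series, dtype=float)
    idx = np.arange(len(s))
    gaps = np.isnan(s)
    s[gaps] = np.interp(idx[gaps], idx[~gaps], s[~gaps])
    return s

def exp_smooth_forecast(series, alpha=0.3):
    """Simple exponential smoothing: level = alpha*x + (1-alpha)*level.
    Returns the one-step-ahead forecast (the final smoothed level)."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level
```

For example, `exp_smooth_forecast(fill_missing([5.0, float("nan"), 9.0, 10.0]))` first fills the gap (here to 7.0) and then forecasts the next interval's wait-time from the completed series. The smoothing constant `alpha` trades responsiveness against noise suppression.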
That is basically the story of our paper, dubbed LineKing (LK). The applications can be found at the following link. Our future plan is to make the localization more robust while scaling LK to other coffee shops and franchises.