On-demand transport services such as taxis, Uber-like ride services and car sharing are widespread today. Large urban areas require a flexible last-mile transportation offer that can effectively complement the mass transit networks in place (e.g. subways). The recent massive growth of the urban population has posed unprecedented challenges to the sustainability of major cities worldwide, ranging from security to environmental issues, among others. In recent decades, there has been a strong focus on increasing the transportation offer (namely, in terms of service frequency and/or taxi fleet size). However, such a massive increase in offer cannot be sustained in the near future. Consequently, there is a concrete and urgent need for efficient, green, convenient and direct on-demand transportation services as an answer to the present demographic trend.
The mobility intelligence of on-demand transport services is closely related to the dispatching policies in place (e.g. how should I drive my vacant taxi? How should I relocate my car-sharing fleet throughout the day?). The dispatching policy depends on the seasonality of passenger demand as well as on the traffic conditions of each particular area. Demand can be divided into quantity and type/fare, which implies predicting a priori the service's origin and destination, respectively.
In this challenge, we invite you to build a predictive framework able to infer the service fare type. This model should generalize demand behavior over both the spatial and temporal dimensions to determine a categorical target variable for each service fare.
- 04/10/2016 – Submissions are now open;
- 30/09/2016 – Training data available;
- 25/11/2016 – Test Data available;
- 11/06/2017 – Workshop paper submissions open;
- 01/06/2017 – Challenge Post-verification ended. Final results available! Congratulations to the winners!!!
- 10/07/2017 – The deadline for workshop submissions was extended to 31/07/2017.
Training Data set:
Initially, we provide a training data set describing trips/samples from three months (from 01/01/2015 to 31/03/2015) of the services performed by roughly 1k taxis running in the city of Thessaloniki, Greece (i.e. 80 MB of data stored in a single CSV file named data_train_competition.csv). These taxis operate through a taxi dispatch central, using mobile data terminals installed in the vehicles. Each data sample corresponds to one completed trip. It contains a total of 5 features, described as follows:
1) ID: (integer) A unique identifier for each trip;
2) TAXI_ID: (integer) A unique identifier for each vehicle;
3) TIMESTAMP: (float) Julian timestamp (in seconds) identifying the start of the trip;
4) STARTING_LATITUDE: (float) Latitude coordinate of the trip starting point (WGS84 format);
5) STARTING_LONGITUDE: (float) Longitude coordinate of the trip starting point (WGS84 format).
The target variable REVENUE_CLASS is categorical. It reflects the fare of a given trip and takes one of five possible values: 1/2/3/4/5, standing for Low/Normal/Medium/High/Very High.
Additionally, you can also use one external data source (http://www.timeanddate.com/) to get information about holidays and observances in Greece during 2015.
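As a minimal sketch of how the schema above could be read, the snippet below casts each CSV row to the documented types and derives one coarse spatial feature. It assumes the file has a header row with the column names listed above, including REVENUE_CLASS; the helper names (`parse_trip`, `grid_cell`, `load_trips`), the cell size and the two sample rows are hypothetical and only serve as illustration.

```python
import csv
import io
import math

# Hypothetical two-row sample in the documented column layout; the values are
# made up for illustration and do not come from the real data set.
SAMPLE_CSV = """ID,TAXI_ID,TIMESTAMP,STARTING_LATITUDE,STARTING_LONGITUDE,REVENUE_CLASS
1,101,1420070400.0,40.6401,22.9444,2
2,102,1420074000.0,40.6264,22.9484,4
"""


def parse_trip(row):
    """Cast one CSV row (a dict of strings) to the documented types."""
    return {
        "ID": int(row["ID"]),
        "TAXI_ID": int(row["TAXI_ID"]),
        "TIMESTAMP": float(row["TIMESTAMP"]),
        "STARTING_LATITUDE": float(row["STARTING_LATITUDE"]),
        "STARTING_LONGITUDE": float(row["STARTING_LONGITUDE"]),
        "REVENUE_CLASS": int(row["REVENUE_CLASS"]),
    }


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a WGS84 coordinate to a coarse grid cell (hypothetical feature)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def load_trips(handle):
    """Read all trips from an open CSV file handle."""
    return [parse_trip(row) for row in csv.DictReader(handle)]


trips = load_trips(io.StringIO(SAMPLE_CSV))
for trip in trips:
    cell = grid_cell(trip["STARTING_LATITUDE"], trip["STARTING_LONGITUDE"])
    print(trip["ID"], cell, trip["REVENUE_CLASS"])
```

To read the real file, `load_trips(open(path, newline=""))` would replace the in-memory sample, provided the header assumption holds.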
Testing Data set:
A data set similar to the training one, data_test_n_competition.csv, is shared for testing purposes. However, this data set contains no information about the values of the target variable. Your job is to determine those values.
- The file sample_script-r contains an R script that generates a sample submission using, as base learner, the C4.5 implementation of the R package C5.0 with default hyperparameters.
- The function “evaluation_metric” uses the same evaluation criterion as the Kaggle competition webpage. It expects two arrays of categories, the predicted and the actual ones, and outputs their quadratic weighted kappa.
- The function “baseline_submission” reads the training data set from a file and prepares a bar plot of the target distribution. Then, it randomly selects a percentage of the training data set (perc_training, a function parameter) and trains a C4.5 model on 70% of this sample. The remaining 30% is used as a test set to benchmark the model. The function outputs the benchmark, reads the test set and uses the model to prepare a submission, which is written to a file named sample_submission in the working directory.
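The sampling and 70/30 hold-out steps described for “baseline_submission” can be sketched in Python as follows. This is not the R script itself: the C4.5 learner is replaced by a trivial majority-class stand-in so the example stays dependency-free, and every name except `perc_training` (`holdout_split`, `fit_majority_class`, `accuracy`, `train_share`, `seed`) as well as the toy samples are assumptions made for illustration.

```python
import random
from collections import Counter


def holdout_split(samples, perc_training=1.0, train_share=0.7, seed=0):
    """Randomly keep `perc_training` of the samples, then split them 70/30."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    kept = shuffled[: round(len(shuffled) * perc_training)]
    cut = round(len(kept) * train_share)
    return kept[:cut], kept[cut:]


def fit_majority_class(train):
    """Stand-in learner: always predict the most frequent training class."""
    counts = Counter(label for _, label in train)
    return counts.most_common(1)[0][0]


def accuracy(predicted_class, test):
    """Share of hold-out samples whose label equals the constant prediction."""
    return sum(label == predicted_class for _, label in test) / len(test)


# Hypothetical (feature, REVENUE_CLASS) pairs, made up for illustration.
samples = [(i, 5 if i % 4 == 0 else 2) for i in range(100)]

train, test = holdout_split(samples)
majority = fit_majority_class(train)
print(len(train), len(test), majority, accuracy(majority, test))
```

Plain accuracy is used here only to keep the sketch short; the challenge itself scores submissions with the quadratic weighted kappa.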
Submissions are scored based on the quadratic weighted kappa, which measures the agreement between two ratings. This metric typically varies from 0 (random agreement between raters) to 1 (complete agreement between raters). If there is less agreement between the raters than expected by chance, the metric may go below 0. Here, the quadratic weighted kappa is calculated between the ground-truth revenue classes and the predicted ones.
Results have 5 possible revenue classes: 1, 2, 3, 4, 5. Each trip record is characterized by a tuple (e_a, e_b), which corresponds to its scores by Rater A (ground truth) and Rater B (predicted). The quadratic weighted kappa is calculated as follows. First, an N-by-N histogram matrix O is constructed, such that O_{i,j} corresponds to the number of records that received a rating i by A and a rating j by B. An N-by-N matrix of weights, w, is calculated based on the difference between the raters' scores:
$$w_{i,j} = \frac{(i - j)^2}{(N - 1)^2}$$
An N-by-N histogram matrix of expected ratings, E, is calculated, assuming that there is no correlation between rating scores. This is calculated as the outer product between each rater's histogram vector of ratings, normalized such that E and O have the same sum. From these three matrices, the quadratic weighted kappa is calculated as:
$$\kappa = 1 - \frac{\sum_{i,j} w_{i,j} O_{i,j}}{\sum_{i,j} w_{i,j} E_{i,j}}$$
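The computation described above can be sketched in plain Python as follows. The function name `quadratic_weighted_kappa` and its `n_classes` parameter are hypothetical (this is not the “evaluation_metric” function of the R script), and returning 1.0 when both raters use one identical single class is a convention chosen here, not something stated by the challenge.

```python
def quadratic_weighted_kappa(actual, predicted, n_classes=5):
    """Quadratic weighted kappa for integer ratings in 1..n_classes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    n = n_classes

    # O[i][j]: number of records rated i+1 by A (actual) and j+1 by B (predicted).
    observed = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        observed[a - 1][p - 1] += 1

    # Histogram vector of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    total = len(actual)

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Outer product of the histograms, scaled so E and O share a sum.
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters used the same single class: treated as full agreement.
        return 1.0
    return 1.0 - numerator / denominator


print(quadratic_weighted_kappa([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
print(quadratic_weighted_kappa([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```

The first call compares identical ratings and the second compares fully reversed ones, which illustrates the upper end of the range and a value below 0.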
Submissions are now open.
Submissions for this competition are performed through the Kaggle platform. Link given below.
30th September 2016 – Training data available
25th November 2016 – Test data available
1st June 2017 – Deadline to submit predictions
2nd June 2017 – Post-Challenge verification start
2nd June 2017 – Paper submissions open
8th July 2017 – Paper submission deadline
31st July 2017 – Paper submission deadline [FINAL EXTENSION]
31st July 2017 – Paper notifications
15th August 2017 – Paper notification [FINAL EXTENSION]
8th August 2017 – Camera-ready
25th August 2017 – Camera-ready [FINAL EXTENSION]
1st July 2017 – Post-Challenge Verification End. The final results are publicly available
5th-8th September 2017 – Competition Presentation Session @ EPIA 2017
- You are not involved in any part of the administration and execution of this contest and/or EPIA 2017;
- You are not an immediate family member (parent, sibling, spouse, or child) or household member of a person involved in any part of the administration and execution of this contest and/or EPIA 2017.
- Note: If you choose to submit an entry but are not qualified to enter the contest, this entry is voluntary, and any entry you submit is governed by the remainder of these contest rules; EPIA reserves the right to evaluate it for scientific purposes. If you are not qualified to submit a contest entry and still choose to submit one, under no circumstances will such entries qualify for sponsored prizes.
- Entry Contents: The participants will submit prediction results on test data during the development phase. They receive feedback during this stage. Yet, only the best results will count for the evaluation.
- Workshop paper: To be part of the final ranking, the participants will be required to publish a detailed paper in the proceedings of the EPIA Discovery Challenge workshop.
- Code Submission: To be eligible to receive their prizes, the candidates for prizes are required to publicly release their code under a license of their choice, taken among popular OSI-approved licenses (http://opensource.org/licenses) and make their code accessible on-line for a period of not less than one year (e.g. personal website) following the end of the challenge (i.e. Post-Challenge Verifications End).
- Use of the data provided: All data provided by this challenge are freely available to the participants from the website of the challenge under license terms provided with the test data.
- Submission: The entries of the participants (predictions and code) will be submitted online via the Kaggle web platform. During the development period, the participants will receive immediate feedback on their submissions on a “Public Leaderboard”. Yet, this ranking will be subject to an evaluation after the end of the development stage. The “Public Leaderboard” will display partial results to help the teams assess and validate their progress on this challenge. Such partial results will be computed using just a sub-sample of the total test data (i.e. 50%).
- Original work, permissions: In addition, by submitting your entry into this contest you confirm that, to the best of your knowledge: (1) Your entry is your own original work; and (2) Your entry only includes material that you own, or that you have permission from the copyright / trademark owner to use.
Potential Use of Entry
Other than what is set forth below, we are not claiming any ownership rights to your entry. However, by submitting your entry, you:
- Are granting us an irrevocable, worldwide right and license, in exchange for your opportunity to participate in the contest and potential prize awards, for the duration of the protection of the copyrights to:
- Use, review, assess, test and otherwise analyze results submitted, or produced by your code, and other material submitted by you in connection with this contest and any future research, contests or products sponsored by Taxiway and contest in all media (now known or later developed);
- Feature your entry and all its content in connection with the promotion of this contest in all media (now known or later developed);
- Acknowledge that this license does not extend to the methods, algorithms or source code used to generate your entry, unless you are one of the winners and have publicly disclosed them.
- Understand and acknowledge that challenge organizers and other entrants may have developed or commissioned materials similar or identical to your submission and you waive any claims you may have resulting from any similarities to your entry;
- Understand that you will not receive any compensation or credit for use of your entry, other than what is described in these official rules.
Submission of Entries
- Follow the instructions on the competition’s website to submit entries.
- The participants will be registered as mutually exclusive teams. Each team may submit only a single final entry. We are not responsible for entries that we do not receive for any reason, or for entries that we receive but that do not function properly.
- The participants must accept the terms of Kaggle, including the rules summarized in this section:
- One account per participant – therefore you cannot submit from multiple accounts.
- No private sharing outside teams is permitted – It’s OK to share code if made available to all players on the forums.
- Public dissemination of entries – Kaggle and the competition host have the right to publicly disseminate any entries or models.
- Open licensing of winners – Winning solutions need to be made available under a popular OSI-approved license in order to be eligible for recognition and prize money.
- Winning solutions must be posted or linked to in the forums – Prizes will be awarded after the winners have posted their solutions to the competition forum. Winners must post or link to their solutions within seven days of the final competition deadline (i.e. Post-Challenge Verifications End).
- Team mergers – Team mergers are allowed and can be performed by the team leader. In order to merge, the combined team must have a total submission count less than or equal to the maximum allowed as of the merge date.
- Team limits – There is no maximum team size.
- Submission limits – You can submit a maximum of 2 entries per day. You can select up to 1 final submission for judging.
Prizes and Awards
Taxiway is the financial sponsor of this contest. There will be 250,00 euros in prizes awarded as incentive prizes to boost contest participation.
- If there is any change to data, schedule, instructions of participation, or these rules, the registered participants will be notified at the email they provided with the registration.
- If you are a potential winner, we will notify you by sending a message to the e-mail address listed on your final entry within seven days following the determination of winners. If the notification that we send is returned as undelivered, or you are otherwise unreachable for any reason, we may award the prize to an alternate winner.
- Winners who have entered the contest as a team will be responsible for sharing any prize among their members. The prize will be delivered to the registered team leader. If this person becomes unavailable for any reason, the prize will be delivered to the authorized account holder of the e-mail address used to make the winning entry.
- If you are a potential winner, we may require you to sign a declaration of eligibility, use, indemnity and liability/publicity release and applicable tax forms. If you are a potential winner and are a minor in your place of residence, we require that your parent or legal guardian be designated as the winner, and we may require that they sign a declaration of eligibility, use, indemnity and liability/publicity release on your behalf. If you (or your parent/legal guardian, if applicable) do not sign and return these required forms within the time period listed on the winner notification message, we may disqualify you (or the designated parent/legal guardian) and select an alternate winner.
We will post changes in the rules or changes in the data as well as the names of confirmed winners (after contest decisions are made by the judges) online on the competition website. This list will remain posted for one year or will be made available upon request by sending email to us.
By participating in this contest, you agree to abide by these rules. If you do not agree, please do not participate in our contest.
If an unforeseen or unexpected event (including, but not limited to: someone cheating; a virus, bug, or catastrophic event corrupting data or the submission platform; someone discovering a flaw in the data or modalities of the challenge) that cannot be reasonably anticipated or controlled, (also referred to as force majeure) affects the fairness and / or integrity of this contest, we reserve the right to cancel, change or suspend this contest.
TAXIWAY is the sponsor of this contest. Currently, TAXIWAY has 2,500 members and owns 1,200 vehicles operating in the city of Thessaloniki, Greece. It has the biggest taxi fleet in the Balkans region.
Here you have the possibility of submitting a paper about your solution for this challenge to be presented in our workshop (which will be held together with EPIA 2017). We remind you that the eligibility for challenge prizes depends on the submission of a paper to participate in this workshop (as previously described in the challenge guidelines).
Submissions consist of a paper with a maximum of 12 pages in the Springer format. The papers should describe in detail all the preprocessing and learning models adopted/developed to handle one or two of the proposed predictive challenges. The obtained experimental results should be presented, along with a brief discussion. Each submission must be made online via the Easychair submission interface. Submissions can be updated before the submission deadline. The only accepted format for submitted papers is PDF.
Each paper submission will be evaluated based on relevance, the significance of contribution, technical quality, obtained results and quality of presentation. All accepted submissions will be included in the local conference proceedings (EPIA 2017). Electronic versions of accepted submissions will also be made publicly available through the conference website. At least one author of each accepted full paper or extended abstract is required to attend the workshop to present their contribution.
i. Challenge Chair – Luis Moreira Matias <email@example.com>, NEC Laboratories Europe (Heidelberg, Germany)
ii. Steering Committee – Josep Maria Salanova <firstname.lastname@example.org>, CERTH HIT (Thessaloniki, Greece)
Goncalo Homem de Almeida <G.Correia@tudelft.nl>, TU Delft (Delft, Netherlands)
iii. Program Committee – Marco Veloso <email@example.com>, U. Coimbra (Coimbra, Portugal)
Jihed Khiary, NEC Labs Europe (Heidelberg, Germany)
Josif Grabocka, U. Hildesheim (Hildesheim, Germany)
Erik Jenelius <firstname.lastname@example.org>, KTH Royal Institute of Technology (Sweden)
iv. Technical Support – Amal Saadallah <Amal.Saadallah@neclab.eu>, NEC Laboratories Europe (Heidelberg, Germany)
v. Local Organizing Committee – Shazia Tabassum <email@example.com>, LIAAD, INESCTEC, University of Porto (Porto, Portugal)