SPiCe: Sequence PredIction ChallengE

Welcome to the SPiCe webpage!

The Sequence PredIction ChallengE (SPiCe) was an on-line competition, held in 2016, about guessing the next element in a sequence of symbols. Training datasets consist of whole sequences, and the aim is to learn a model that ranks the potential next symbols for a given prefix, that is, the most likely options for a single next symbol.

The evaluation process was interactive: you submitted an answer and then you were fed another prefix. Once you had seen and given a ranking for all prefixes, the score of your submission was computed.

The competition used real-world data from different fields (Natural Language Processing, Biology, Signal Processing, Software Verification, etc.) and synthetic data created especially for SPiCe. Before the end of the competition, we did not reveal which data were artificial and which were the result of preprocessing real data. All data are now available, together with a script and the files needed to compute the score of a given model in a non-iterative way. Details on the data (and the results) are given in this paper (you can also have a look at these slides). Please cite this peer-reviewed paper if you use the SPiCe data.

All participants were encouraged to submit a short paper to the International Conference on Grammatical Inference (ICGI), which was held in Delft, The Netherlands, from 5th to 7th October 2016. The published papers can be found on the results page.

News

October 15th: We had an amazing workshop at ICGI'16 about the competition: thanks a lot to the participants who came to present their work and to discuss! Their papers and their slides are available on the results page.

August 1st: The competition is over! Thanks a lot to all participants and congratulations to team 'shib', who won the SPiCe competition! Details about the final results can be found on the dedicated page.

July 1st: The last 4 problems of the competition are available. The 15 problems that make up the competition are now all downloadable from the data page.

June 15th: Problems 10 and 11 of the competition are on-line.

June 6th: Problem 9 of the competition is available.

June 3rd: Problem 8 of the competition is on-line.

May 23rd: Problems 6 and 7 of the competition are on-line!

April 29th: A new version of the Spectral Baseline is available: the memory management has been optimized.

April 18th: Problems 4 and 5 of the competition are on-line!

April 14th: We have a new sponsor: Xerox Research Center Europe. We are thus happy to say that there will be a 500€ prize for the winner!

April 12th: The first 3 problems of the competition are on-line. We are very happy to declare the competition officially open!

April 2nd: a Spectral Learning baseline is available and it is awesome!

March 28th: a 3-gram baseline is available.

March 23rd: a Python API to submit results is available on the participate page.

March 22nd: the call for participation is ready! Feel free to forward it to anyone that may be interested.

March 17th: the first problem is on-line. Problem 0 is a toy problem that will not be taken into account in the final results but can be used to tune your script now that the website is fully operational.

Why this competition?

Being able to guess the next element of a sequence is an important problem in many fields, from natural language processing (NLP) to biology (DNA modelling for instance), including software verification, sensor information, sport scouting, and many others.

Different approaches can be used to handle the problem, from multi-class learning algorithms to automata (grammar) induction ones, including deep learning methods. We thought it would be great to compare the different methods on the same problems.

How does it work?

Each problem instance comes with one learning sample (containing complete sequences on which one can learn a model) and two test sets (containing prefixes of sequences whose next symbol has to be guessed): a public one and a private one. Only the first prefix of each test set is given beforehand: you have to submit a ranking of the 5 most probable next symbols before being fed the following prefix of the test set.
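To make the idea of "a ranking of the 5 most probable next symbols" concrete, here is a minimal sketch of a count-based 3-gram ranker in Python. It is not the official baseline: the function names, the use of integers as symbols, and the use of -1 as an end-of-sequence marker are all assumptions made for this illustration.

```python
from collections import Counter, defaultdict

END = -1  # assumed end-of-sequence marker (illustrative, not taken from the SPiCe files)


def train_trigram(sequences):
    """Count how often each symbol follows each context of 0, 1 or 2 symbols."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = list(seq) + [END]
        for i, symbol in enumerate(padded):
            for n in range(3):  # context lengths 0, 1 and 2
                if i - n >= 0:
                    counts[tuple(padded[i - n:i])][symbol] += 1
    return counts


def rank_next(counts, prefix, k=5):
    """Rank candidate next symbols, backing off from the longest context to the empty one."""
    ranking = []
    for n in range(min(2, len(prefix)), -1, -1):
        context = tuple(prefix[len(prefix) - n:])
        for symbol, _ in counts.get(context, Counter()).most_common():
            if symbol not in ranking:
                ranking.append(symbol)
        if len(ranking) >= k:
            break
    return ranking[:k]


# Toy learning sample: complete sequences of integer symbols.
sample = [[0, 1, 2], [0, 1, 2], [0, 1, 3], [1, 2]]
model = train_trigram(sample)
top5 = rank_next(model, [0, 1])  # ranking to submit for the prefix "0 1"
```

Backing off to shorter contexts is only one simple way to fill the ranking when the longest context has fewer than 5 observed successors; smoothing schemes would be a natural alternative.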

In the case of a public test set, feedback is given once the ranking for the last prefix of the set has been provided. You can then start again from the first prefix of the test set, for instance if you are testing an algorithm that requires fine tuning of hyperparameters. The number of submissions for a problem on the public test set is not limited. It is the best score of a team on a given public test set that is used for the leaderboard.
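The interactive protocol described above can be sketched as a local simulation. Everything here is hypothetical: the `LocalTestSet` class and its methods are not the competition API, and the score (the fraction of prefixes whose true next symbol appears in the submitted top 5) is a simple stand-in, not the official metric.

```python
class LocalTestSet:
    """Hypothetical local stand-in for a test set that hands out one prefix at a time."""

    def __init__(self, prefixes, next_symbols):
        self._prefixes = prefixes
        self._next_symbols = next_symbols
        self._position = 0
        self._hits = 0

    def first_prefix(self):
        """Start (or restart) a session and return the only prefix known beforehand."""
        self._position = 0
        self._hits = 0
        return self._prefixes[0]

    def submit(self, ranking):
        """Record a ranking; return the next prefix, or None once the set is exhausted."""
        if self._next_symbols[self._position] in ranking[:5]:
            self._hits += 1
        self._position += 1
        if self._position < len(self._prefixes):
            return self._prefixes[self._position]
        return None

    def score(self):
        """Stand-in score: share of prefixes whose true next symbol was in the top 5."""
        return self._hits / len(self._prefixes)


def run_session(test_set, rank_fn):
    """Answer every prefix in turn; feedback is only read after the last ranking."""
    prefix = test_set.first_prefix()
    while prefix is not None:
        prefix = test_set.submit(rank_fn(prefix))
    return test_set.score()


# Toy usage: a constant ranker evaluated on three prefixes.
public_set = LocalTestSet([[0], [0, 1], [2]], [1, 2, 0])
result = run_session(public_set, lambda prefix: [1, 2])
```

Because `first_prefix` resets the session, `run_session` can be called repeatedly, which mirrors the unlimited resubmission allowed on a public test set.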

For a private test set, you only have one shot: once you have submitted all your rankings for the test set, one after the other, you cannot submit anything else for this problem. This aims at reducing possible collusion: each participant can tune and compare their algorithms on the public test set, possibly using the prefixes in the set to help the learning, but this is not possible on the private test set: once you know all these prefixes, the process on this problem is finished. The final results of the competition will only use the scores on the private test sets (no feedback is given about these scores before the end of the competition).

You will find more information on the description and participate pages.

Contact us

If you have any questions or comments regarding the competition, feel free to contact us.