PASCAL - Pattern 
Analysis, Statistical Modelling and Computational Learning

Exploration vs. Exploitation Challenge

19 January - 15 November 2006.

The Challenge

The challenge aims to stimulate research into the optimal way to trade­off exploration vs exploitation where the response rate for each option is varying over time. Instead of a static data set, data for the challenge is provided in the form of a visitor behaviour simulation engine, written in java, and also made available as a Matlab function.

Entrants should write a program which will repeatedly choose from a range of possible display options, and query the visitor simulation engine through an interface e.g.

public Boolean selectOption(int option)

The engine will return true or false, depending on whether the simulated visitor responded to the option that was served to them. The objective is to maximize the number of positive responses returned by the engine. The response rate for each option varies over time. The visitor simulation engine maintains time ­dependent continuous functions for the response rate of each of the options. When the engine is queried with a given option, it calculates the value of the function at that time. The engine returns true with a probability given by this value. The time variable used is the number of requests made to the engine, rather than real time. This is to remove dependencies on the machine hardware, and application performance of the entrants. In this application there is no fixed horizon, that is to say the number of trials is not fixed, and so solutions should not use the number of trials as a parameter.

To prevent the engine from repeating the same behaviour every time it is restarted, the response rate functions are parameterized. On restart, new values for these parameters are randomly generated based on a seed.

The challenge is broken into a number of individual tasks each relating to a particular problem that is frequently encountered in real website data. The engine addresses each of these problems in turn.

Tasks

There are now six tasks, the old tasks are now invalid. In each task there are 5 options. The best achievable response rate is approximately 0.18 for all tasks. All tasks are designed to display the appropriate behaviour over 1M requests. (Although during the judging phase more may be required.)

Task 1. SimulatedVisitorFrequentSwap

To check a candidate algorithm can respond to relatively frequent changes of the best option.

Task 2. SimulatedVisitorLongGaussians

To check a candidate algorithm can respond to changes of the best option over longer time periods.

Task 3. SimulatedVisitorWeeklyVariation

To check a candidate algorithm can handle coherent variation of response rates where there are two sinusoidal components to the variation of differing periods, the longer period being dominant. In addition, there is a gradual change in the relative ranking of the option. This is to ensure there is ongoing exploration throughout the run and that it can handle the consequent comparison of option response rates that have been served at different times. In addition, each option has a randomly chosen start time.

Task 4. SimulatedVisitorDailyVariation

To check a candidate algorithm can handle coherent variation of response rates where there are two sinusoidal components to the variation of differing periods, the shorter period being dominant.

Task 5. SimulatedVisitorWeeklyCloseVariation

To check a candidate algorithm can handle coherent variation where response rates are closely spaced.

Task 6. SimulatedVisitorConstant

We must ensure a candidate algorithm can handle the ideal case. An over exploratory algorithm can degrade to random performance under these conditions.

Submission of Results

Results can be submitted at any time via the submit tab, a set of random seeds is available here. Run your code using these seeds with each run of length 1M requests.

Judging

During the judging phase of the challenge, a further evaluation will take place for each algorithm. This is to ensure candidate algorithms can perform well over a variety of conditions. A second set of random seeds will be published along with a different run length. Each entrant will be re-evaluated against these seeds and run length. The winner of the challenge will be the entrant who obtains the lowest total regret on all six tasks. There is no longer the option of submitting for one task only.

Touch Clarity reserves the right to disqualify entrants if any of the following rules are broken: