Game-Benchmark for Evolutionary Algorithms

On this site, we provide some details on the constructed benchmark. Everything is a work in progress. If you have questions, please don't hesitate to ask.

The benchmark was integrated into the COCO framework to make it convenient to run and analyse. You can find it on GitHub here. For more details on the COCO framework, including how to run your own algorithms and post-process the results, please check out the separate documentation. The complete benchmark (as of July 2019) is also described in our recent paper at GECCO 2019, "Single- and Multi-Objective Game-Benchmark for Evolutionary Algorithms". Link coming soon.

The Top Trumps problem suites are based on Demonstrating the Feasibility of Automatic Game Balancing, where the optimisation task is to fine-tune the values on the cards of a Top Trumps deck. This is done with fitness functions taken from the paper, but the game was re-implemented in C++ to allow for easier integration into COCO.

Below is an illustrative video of the game. For a detailed description, please refer to the paper. You can also find a more detailed and formalised description of the fitness functions there.

1. Shuffle the deck and distribute the cards evenly among the players
2. The starting player chooses a characteristic (category)
3. All players compare the corresponding values on their cards
4. The player with the highest value wins the trick
5. The winning player announces a new characteristic; go to step 3
6. The game ends once all cards have been played exactly once
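
The game loop above can be sketched in a few lines (a hypothetical Python sketch, not the C++ re-implementation used in the benchmark; the random category choice stands in for an actual player policy):

```python
import random

def play_top_trumps(deck, num_players=2, num_categories=4, seed=0):
    """Simulate one pass through a Top Trumps deck; each card is a list of category values."""
    rng = random.Random(seed)
    rng.shuffle(deck)
    hands = [deck[p::num_players] for p in range(num_players)]  # deal evenly
    tricks = [0] * num_players
    while any(hands):
        # the trick winner would announce the next characteristic; here the choice is random
        category = rng.randrange(num_categories)
        played = [(hand.pop(0), p) for p, hand in enumerate(hands) if hand]
        winner = max(played, key=lambda cp: cp[0][category])[1]  # highest value wins the trick
        tricks[winner] += 1
    return tricks  # tricks won per player; each card was played exactly once
```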

We provide two function suites related to Top Trumps. The first, rw-top-trumps, comprises 5 single-objective functions; the second, rw-top-trumps-biobj, contains 3 bi-objective functions.

A solution to the problem is an array that specifies all values of all cards in a deck. Each deck has 4 categories, so the search space dimension is a multiple of 4 and depends only on the number of cards in the deck.

The instances represent different potential themes of a deck. We express that by allowing different value ranges for each category, depending on the instance. The exact boundaries are generated using the instance number as a random seed.
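
The seeding idea can be pictured as follows (a hedged Python sketch; the benchmark computes its actual bounds differently, and the limits `lo`/`hi` here are assumptions):

```python
import random

def category_bounds(instance, num_categories=4, lo=1.0, hi=100.0):
    """Reproducibly derive a value range per category from the instance number."""
    rng = random.Random(instance)  # the instance number acts as the random seed
    bounds = []
    for _ in range(num_categories):
        a, b = sorted(rng.uniform(lo, hi) for _ in range(2))
        bounds.append((a, b))  # (lower, upper) bound for this category
    return bounds
```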

Key properties of the rw-top-trumps suite:

objectives | dimensions | functions | instances
---|---|---|---
1 | (88, 128, 168, 208) = 4 × (22, 32, 42, 52) | 5 | 15

Overview and characterisation of functions in rw-top-trumps. Column [min] indicates how the fitness measure x is transformed into a minimisation problem.

fid | Fitness Measure (x) | min
---|---|---
t_1 | deck hypervolume | -x
t_2 | standard deviation of category averages | -x
t_3 | win rate of the better player | -x
t_4 | switches of trick winner | -x
t_5 | trick difference at the end of the game | x

Key properties of the rw-top-trumps-biobj suite:

objectives | dimensions | functions | instances
---|---|---|---
2 | (88, 128, 168, 208) = 4 × (22, 32, 42, 52) | 3 | 15

Construction of the bi-objective functions from the rw-top-trumps-biobj suite (T_i) as pairs of single-objective functions from the rw-top-trumps suite (t_j).

- T_1 = (t_1, t_2)
- T_2 = (t_3, t_4)
- T_3 = (t_3, t_5)

Below is an illustrative video of the generation method. For a detailed description, please refer to the paper. You can also find a more detailed and formalised description of the fitness functions there.

Key properties of the rw-gan-mario suite:

objectives | dimensions | functions | instances
---|---|---|---
1 | 10, 20, 30, 40 | 84 | 7

Overview of functions in rw-gan-mario. Function ids in the benchmark are indicated in the last four columns, divided by overworld [o], underground [u], overworld concatenated [oc] and underground concatenated [uc]. Column [min] indicates how the fitness measure x is transformed into a minimisation problem.

Fitness Measure (x) | AI | min | o | u | oc | uc
---|---|---|---|---|---|---
enemyDistribution | - | -x | 1 | 2 | |
positionDistribution | - | -x | 3 | 4 | |
decorationFrequency | - | 1-x | 5 | 6 | |
negativeSpace | - | 1-x | 7 | 8 | |
leniency | - | x | 9 | 10 | |
basicFitness | A* | x | 11 | 12 | 13 | 14
basicFitness | Scared | x | 15 | 16 | |
airTime | A* | 1-x | 17 | 18 | 19 | 20
airTime | Scared | 1-x | 21 | 22 | |
timeTaken | A* | 1-x | 23 | 24 | 25 | 26
timeTaken | Scared | 1-x | 27 | 28 | |

Below is a more detailed explanation of each of the fitness measures:

- enemyDistribution: standard dev. (std.) of enemy tiles (x-axis)
- positionDistribution: std. of tiles you can stand on (y-axis)
- decorationFrequency: percentage of pretty tiles (Tube, Enemy, Destructible Block, Question Block, or Bullet Bill)
- negativeSpace: percentage of tiles you can stand on
- leniency: weighted sum of subjective leniency of tiles
- basicFitness: MarioAI championship score for AI
- airTime: ratio between ticks in air vs. total ticks (if level completed, otherwise 1)
- timeTaken: ratio between time taken and total time allowed (if level completed, otherwise 1)
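
Two of the simpler measures can be illustrated in code (an illustrative Python sketch; tile names and grid encoding are assumptions, not the benchmark's actual implementation):

```python
def negative_space(level):
    """Percentage of tiles the player can stand on; level is a 2D grid of tile names."""
    standable = {"ground", "block", "platform"}  # assumed tile names
    tiles = [t for row in level for t in row]
    return sum(t in standable for t in tiles) / len(tiles)

def air_time(ticks_in_air, total_ticks, completed):
    """Ratio of ticks spent in the air to total ticks, or 1 if the level was not completed."""
    return ticks_in_air / total_ticks if completed else 1.0
```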

Key properties of the rw-gan-mario-biobj suite:

objectives | dimensions | functions | instances
---|---|---|---
2 | 10, 20, 30, 40 | 10 | 7

Construction of the bi-objective functions from the rw-gan-mario-biobj suite (M_i) as pairs of single-objective functions from the rw-gan-mario suite (m_j).

- M_1 = (m_4, m_6)
- M_2 = (m_4, m_8)
- M_3 = (m_11, m_17)
- M_4 = (m_11, m_23)
- M_5 = (m_12, m_18)
- M_6 = (m_12, m_24)
- M_7 = (m_13, m_19)
- M_8 = (m_13, m_25)
- M_9 = (m_14, m_20)
- M_10 = (m_14, m_26)

We will be using the COCO framework to produce benchmarking results based on your submitted problem. To do that, we need the following information from you:

- possible search space dimensions
- possible objective space dimensions
- number of instances

There are three ways to submit your problem to us, detailed below in order of preference.

If you have your problem running on a server, we just need an interface to your evaluation function. In order to keep calls consistent between all submissions, please adhere to the following specifications and formats. We want to be able to send our request with the following parameters to the address and port you provide.

`n d i x_1 x_2 ... x_n`

where
- n is the search space dimension
- d is the objective space dimension
- i is the instance identifier
- x_1 to x_n are the input values

You should then return the objective value(s) y_1 to y_d as

`y_1 y_2 ... y_d`.

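
A minimal endpoint matching this format might look as follows (a sketch assuming a plain line-based TCP exchange; the port and the toy sphere objective are placeholders, not part of the benchmark):

```python
import socketserver

def evaluate(n, d, instance, x):
    """Placeholder objective (a shifted sphere); replace with your evaluation function."""
    return [sum((xi - instance) ** 2 for xi in x)] * d

class EvalHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # request line: "n d i x_1 x_2 ... x_n"
        parts = self.rfile.readline().decode().split()
        n, d, i = int(parts[0]), int(parts[1]), int(parts[2])
        x = [float(v) for v in parts[3:3 + n]]
        # response line: "y_1 y_2 ... y_d"
        reply = " ".join(f"{y:.8g}" for y in evaluate(n, d, i, x))
        self.wfile.write((reply + "\n").encode())

# to serve: socketserver.TCPServer(("0.0.0.0", 9999), EvalHandler).serve_forever()
```
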
If you prefer to have us run the code locally, we need an interface to your evaluation function. To make things easier, please adhere to the following specifications and formats. We want to be able to call your code from the command line with the following parameters

`n d i x_1 x_2 ... x_n`

where
- n is the search space dimension
- d is the objective space dimension
- i is the instance identifier
- x_1 to x_n are the input values

You should then return the objective value(s) y_1 to y_d to stdout as

`y_1 y_2 ... y_d`.

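
A command-line wrapper matching this format could look like this (a sketch; the toy sphere objective is a placeholder for your own evaluation function):

```python
import sys

def evaluate(n, d, instance, x):
    """Placeholder objective (a shifted sphere); replace with your evaluation function."""
    return [sum((xi - instance) ** 2 for xi in x)] * d

def main(argv):
    # argv: n d i x_1 x_2 ... x_n
    n, d, i = int(argv[0]), int(argv[1]), int(argv[2])
    x = [float(v) for v in argv[3:3 + n]]
    # print "y_1 y_2 ... y_d" to stdout
    print(" ".join(f"{y:.8g}" for y in evaluate(n, d, i, x)))

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

For example, `python problem.py 2 1 0 0.5 0.5` would print a single objective value.
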
Please include a readme file with

- details on the setup and installation process
- an example of calling your problem from the command line

The code will be running on an Ubuntu machine, so please make it either platform-independent or executable on Ubuntu.

Finally, if none of the above methods seems feasible to you, you can also send us raw output values. For this option, please contact one of the organisers. We will then provide you with a text file containing input vectors for each possible combination of search space dimension n, objective space dimension d and instance as

`x_1 x_2 ... x_n`

separated by linebreaks.
For each input vector in each text file, you should then compute the objective value(s) and send them back to us in the same format, i.e. text files that pair each input vector with its output values as

`x_1 x_2 ... x_n : y_1 y_2 ... y_d`

separated by linebreaks.
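
Producing the response lines can be sketched as follows (a Python sketch; `append_outputs` and the toy objective in the example are hypothetical helpers, not part of the benchmark):

```python
def append_outputs(lines, evaluate, n, d, instance):
    """Turn input-vector lines into 'x_1 ... x_n : y_1 ... y_d' lines."""
    out = []
    for line in lines:
        x = [float(v) for v in line.split()]
        y = evaluate(n, d, instance, x)
        out.append(line.strip() + " : " + " ".join(f"{v:.8g}" for v in y))
    return out
```
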
If you are interested in this option, please note that we need your response with the data by the problem submission deadline on June 30th, 2018. Therefore, please contact us early enough to be able to compute the output values in time. We are looking for at least 50 samples per input dimension if possible.

In the following, we explain some key terms relevant to the GBEA.

As we are working to compile a benchmark, it is important that we are able to investigate key questions such as how an algorithm performs for different search spaces. We thus aim to provide each problem in 4 or more search space dimensions so that the scalability of a given algorithm can be tested.

In order to be able to obtain statistically sound results, we are looking to provide 7 or more instances to each problem. Instances have the same (or very similar) fitness landscape characteristics, but their optima are located in different areas of the search space. Their scaling might also differ. Instances help avoid obtaining outlier results due, for example, to initialising an algorithm near the global optimum. One way to obtain instances is through transformations such as rotation, but several other options are possible, especially in the context of games.

Targets are used in order to measure the progress of an optimisation algorithm towards the optimum. This means that, every time a target precision is reached, the number of function evaluations taken is recorded. Thus, if there are any meaningful *milestones*, the targets should reflect that. In addition, if the problem is very hard, there should be more targets in order to be able to record progress effectively.
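
This recording mechanism can be sketched as a thin wrapper around the evaluation function (an illustrative Python sketch, not COCO's actual logging code; the `optimum` argument is assumed known for the purpose of measuring precision):

```python
class TargetRecorder:
    """Wrap an objective function and record when each target precision is first reached."""
    def __init__(self, func, optimum, targets):
        self.func = func
        self.optimum = optimum
        self.targets = sorted(targets, reverse=True)  # from easiest to hardest
        self.evals = 0
        self.hits = {}  # target precision -> number of evaluations needed

    def __call__(self, x):
        self.evals += 1
        y = self.func(x)
        for t in self.targets:
            if t not in self.hits and y - self.optimum <= t:
                self.hits[t] = self.evals  # record first time this target is reached
        return y
```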

A random walk is a simple tool for obtaining a first impression of the fitness landscape: walk through the search space in a random direction, evaluate the problem at equidistantly spaced points, and plot the results.
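
Such a walk can be sketched as follows (an illustrative Python sketch; the step size, starting region, and direction sampling are assumptions):

```python
import random

def random_walk(func, dim, steps=100, step_size=0.1, seed=0):
    """Evaluate func at equidistantly spaced points along one random direction."""
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(c * c for c in direction) ** 0.5
    direction = [c / norm for c in direction]  # normalise to a unit direction
    start = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    values = []
    for k in range(steps):
        point = [s + k * step_size * c for s, c in zip(start, direction)]
        values.append(func(point))
    return values  # plot values against k for a first impression of the landscape
```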

Exploratory landscape analysis (ELA) features can be used to further characterise the landscape. The flacco R package provides an easy way to compute a large number of features suggested in the literature.

Fitness landscapes can also be described via higher-level characteristics, such as the existence of plateaus, multi-modality, etc. See this paper (specifically Fig. 1) for more details on landscapes and their analysis.