World Puzzle Championship 2013
Saturday 28 September 2013
The Instruction Booklet for "Around the world in 80 Puzzles" is now available.
Monday 26 August 2013
Around the world in 80 puzzles - Scoring
Following up on the introduction of "Around the world in 80 puzzles" and the discussion of its format, this is an update on the scoring system that will apply to the puzzle sets of this part of the Championship. Please read carefully.
Scoring
The reason we need a separate scoring system for “80 puzzles” is that it must compensate for the fact that different competitors are not solving the same set of puzzles. Although we have taken significant measures to ensure that the sets comply with certain standards and are at a similar level of difficulty, it is impossible to guarantee that there will be no imbalance between the sets, so the scoring system needs to account for this. The process is as follows.
Puzzle sets will have raw puzzle scores (“raw scores”) as if they were ordinary WPC rounds; these will be established based on the results of the test solving performed by the core team. All sets will have an identical maximum score. Since every competitor solves exactly three sets, everyone will have three raw scores. The raw scores will be converted to standard Championship scores as follows:
- The 5th placed official competitor’s raw score will be converted to 500 points
- The median of all the official raw scores will be converted to 250 points
- A zero raw score will be converted to 0 points
- Raw scores other than the three marker scores above, both official and unofficial, will be converted to points proportionally, i.e.
- Raw scores below the median will be converted using linear scaling between the zero and median marks
- Raw scores above the median will be converted using linear scaling between the median and 5th-place marks
This is actually much easier than it sounds ☺
Let’s see an example, assuming a 60-minute round with a total of 100 points available and 25 competitors (all of them official), showing ranking, name, raw score and converted points:
Rk Name Raw Conv
=====================
1 Alice 98 663
2 Bob 96 638
3 Cecil 91 575
4 Daniel 87 525
5 Edward 85 500
6 Frank 81 475
…
12 Luke 47 263
13 Median 45 250
14 Norah 43 239
…
24 Yves 2 11
25 Zero 0 0
(Note that it is not necessary to have someone with zero points - it is of course not desirable ☺)
As seen in the table, the 5th placed solver received 500 points, the one with the median score (called Median, incidentally the right place in the alphabet) received 250 points, while unlucky Zero (incidentally, again) got, well, 0 points. All the other solvers’ scores were converted into points proportionally scaled to how well they did compared to those three highlighted solvers (more precisely, proportionally scaled to the two of them that are closest to them).
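To make the conversion concrete, here is a minimal sketch in Python of the piecewise-linear scaling described above. It is illustrative only: the function name is ours, and the treatment of raw scores above the 5th-place mark (a steeper slope) is inferred from the worked example table rather than spelled out in the prose.

```python
def convert_raw_score(raw, fifth_raw, median_raw):
    """Convert a raw round score to championship points (illustrative sketch).

    Markers, taken from the official competitors' raw scores of that round:
        0 raw points        -> 0 converted points
        median raw score    -> 250 converted points
        5th-place raw score -> 500 converted points
    Between the markers the conversion is linear. Above the 5th-place mark
    the example table suggests the slope doubles; that detail is an
    inference from the table, not an official statement.
    """
    slope = 250.0 / (fifth_raw - median_raw)   # points per raw point, median..5th
    if raw <= median_raw:
        return 250.0 * raw / median_raw
    if raw <= fifth_raw:
        return 250.0 + slope * (raw - median_raw)
    return 500.0 + 2.0 * slope * (raw - fifth_raw)

# Reproducing a few rows of the table above (5th place raw = 85, median raw = 45):
for name, raw in [("Alice", 98), ("Edward", 85), ("Luke", 47), ("Yves", 2)]:
    print(f"{name}: {convert_raw_score(raw, 85, 45):.1f}")
# Alice: 662.5, Edward: 500.0, Luke: 262.5, Yves: 11.1 -- i.e. 663, 500, 263, 11 after rounding
```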
Rationale
Although we have taken as much care as possible to ensure that the difficulty of the sets is at least similar, data from only three test solvers is no guarantee that they actually are. Since different competitors will attempt different sets, any difference in actual difficulty needs to be compensated for, and that is exactly what the outlined calculation does.
No matter how hard each of the puzzle sets turns out to be, the above compensation scheme will ensure that any solver's performance is rewarded in light of how well they did compared to the rest of the solvers who solved the same round (in theory, the fact that not all rounds are solved by the same set of solvers can make some difference here, but with a high number of solvers, the variance in the eventual 5th score and median is negligibly small). If a puzzle set turns out to be harder than the others, the raw scores of the field will probably be lower overall, meaning that the final scores will be adjusted a little more upwards than for another puzzle set. Conversely, if a puzzle set turns out to be easier than the others, the raw scores are likely to end up higher and will be adjusted a little less upwards than for another puzzle set.
The choice of the markers (i.e. 5th place and median) may seem rather arbitrary. Indeed, the number of markers to use and where they sit in the distribution are open choices. We have examined a few options here, including already existing score conversion algorithms, and found that using a single marker (i.e. making all scores proportional to just one median or just one average) may introduce some unwanted distortions, while using two markers works reasonably well across the distribution.
Indeed, looking around popular online puzzling sites, the most notable examples are LMI and Croco-Puzzle, both of which use ratings to rank puzzlers (although LMI is based on complete puzzle sets, while Croco goes by one-puzzle times), and both of which use two markers to scale the scores of the field – in their case the top competitor and the one with the median raw score. Our method is indeed fairly similar to theirs; the only difference is that we believe the top performance of any field can vary greatly in how extraordinary it is, while a score slightly below the top is more stable in general, hence our choice to use the 5th raw score instead of the 1st. The reasoning behind this choice is illustrated further below.
(The fact that we looked at LMI and Croco ratings, among others, makes sense especially because these puzzle sites introduced ratings precisely to be able to measure and rate solvers’ performance across multiple tests, even when different solvers choose to solve different sets. While the choice of puzzle sets is of course a little more controlled in the case of “80 puzzles” than in the free world of online competitions, using the same principle to bring solvers’ results into a comparable shape is certainly justified.)
The choice of 250 and 500 may seem arbitrary, but there is a thought process behind them, both in terms of the magnitude and the ratio of these numbers. All the individual rounds of the Championship outside the scope of “Around the world in 80 puzzles” will be scored on the philosophy that round lengths and total scores roughly come down to a “winner achieves 10 points per minute” scheme, although some rounds may be designed to be more realistic for top puzzlers to finish and obtain bonus points than others. Therefore, if we want to keep the “80 puzzles” rounds’ scoring in line with that overall philosophy, we need round winners to score “around 600”. Easier said than done – granted, we could just award them 600 points with a different score conversion, but later in this write-up we make clear why we chose to fix the 5th placed solvers’ scores instead of the winners’. Given some of the test data detailed below, the winners are expected to score around 600 under this conversion scheme, maybe slightly above if they beat the field comprehensively, so awarding 500 points to the 5th placed solver seems a sensible approach (it also sounds simple, and 500 is a nicely rounded guideline number).
On the other hand, a large set of test data (including but not limited to the WPC rounds analysed below) has indicated that the median score of a solver field typically falls within 45-50% of the score of the 5th placed solver; this ratio is more typically around 45% for some of the online tests we have looked at, while for WPC rounds it tends to be closer to 50%, as is almost exactly the case in some of the test data below. A possible explanation is that solvers who attend a WPC represent a slightly more highly skilled field than those who attend online tests (even though the latter often feature many of the top guns, too!). Therefore, the 50% value seems justified as the target for median raw scores, hence the 250. Finally, while the mapping of the lowest scores could also be handled in various ways, we have gone for simplicity here and simply say that a raw score of zero maps onto zero points.
Finally, the reason for only using official competitors’ data for the purpose of determining the markers for score conversion is obviously that none of the unofficial scores should have any impact on any of the official ones.
In trying to provide some further analysis and some visualised results to understand the impact of the score conversions, the following paragraphs turn a little technical – feel free to skip them if you are not interested in the dirty details!
I have looked into some past data to see if applying the method above would uncover anything odd. While I have certainly not collected and processed enough data to claim a large amount of testing, this set of data was intentionally chosen to at least resemble the “80 puzzles” framework, to give us some confidence. In addition, one of the reviewers ran tests on other sets of data that do not fit within the boundaries of this page but are available for further discussion.
Description of test data used:
- Taken from the results of WPC 2011
- Includes the full results of four long individual rounds of practically assorted puzzles:
- Round 2 – “Assorted”
- Round 5 – “Evergreens”
- Round 12 – “Hungaricum”
- Round 13 – “Innovatives”
- These rounds featured a range of design philosophies (classic-ish, innovative, and of course the infamous Evergreens with all those think-outside-the-box puzzles)
- These rounds lasted between 50 and 70 minutes (averaging to 60)
- Since WPC 2011 took place in Hungary, these rounds were scored and timed by the exact same core team in 2011 that coordinated the puzzle creation process with the authors of “80 puzzles” this year
For all these reasons, this particular set of data seems a justified choice, as “80 puzzles” will be very similar in all these aspects.
The figure below provides a nice visualisation of all this data in one glance. One curve corresponds to one of the rounds as shown. The horizontal axis captures the ranking of competitors for each of the rounds, while the vertical axis shows the scores they achieved.
Score/rank distribution of four selected long rounds of WPC 2011. Note a fairly similar shape across all of them.
It’s hard to say what is a “good” distribution for any of the rounds, but looking at this figure, it is probably fair to say that the four rounds’ distributions are reasonably similar. (Of course, this doesn’t mean that everyone actually achieved similar scores in those rounds, quite the contrary – but when you look at all the competitors as a group, their overall performance distribution seems to be reasonably consistent.)
The good thing that comes out of this figure is that there seems to be a decent level of consistency in “what percentage of the maximum score you need to achieve if you want to finish in position N”. Of course, winning scores are always difficult to predict, but as you go down the list it becomes much more controlled; the lines are fairly close to each other.
It is important to point out that this type of consistency is not just a property of the puzzler population; it is also a feature of the scoring/timing system, in terms of the number of puzzles packed into a round, the composition of difficulty levels, and the scoring and timing of the round. Let me show how it can look much worse – this time we use all but a few individual rounds from WPC 2011, and for visualisation purposes the data has been mapped onto a scale of 1000, since the round sizes were largely different (you’ll notice someone must have got a small bonus in one of the rounds, and Sprint was excluded precisely because there were many bonuses there, which is not relevant for us); otherwise the chart works just like the previous one:
Score/rank distribution of (almost) all individual rounds of WPC 2011. Some of them are notably different this time.
Here, Screentest, whose percentage scores were significantly higher than those of all the other rounds, is probably not much of a concern, given that it was indeed a very different round in nature from all the others. However, the other culprit, Borderless, with its low scores, is more of an issue: apparently, half the field solved only one puzzle or none at all, there were groups of tens of people finishing on identical scores (meaning that this round contributed nothing to separating those people based on their performance or to establishing their rankings), and the overall percentage is far lower than in all the other rounds. It’s probably fair to say that with hindsight, looking at this data and the context it provides, the Borderless round was not appropriately composed, scored and timed for a Championship (other sources of feedback indicate content issues as well).
(It would be interesting to look into similar data from other WPCs. I’m pretty sure we would get significantly different results for some of the years.)
Let us get back to the scores of the four rounds of WPC 2011 and pretend that they are actually raw scores from an imaginary “80 puzzles” contest somewhere in the past. We notice that although the curves are, as discussed, fairly parallel and not far off, they are not actually very close to each other. Given that in an “80 puzzles” framework not everybody would have solved every round, someone doing Evergreens (the highest scoring of the rounds) but skipping Innovative (the lowest scoring one) would probably have ended up with higher raw scores than someone of similar skill who skipped the sets the other way round – which is why score conversion is required in the first place.
Therefore, let us apply the scoring method of this year’s “80 puzzles” to those four rounds to compensate for this apparent difference between the difficulty of the rounds. Figures are rounded to integers (for scores) and three decimals (for multipliers) for display purposes here (but not for the actual calculations).
WPC 2011 rounds normalised using the method defined for "80 puzzles". They line up so well except for the head of the crazy green curve...
Some backing data, all given in the order these rounds took place in the actual event, i.e. Assorted, Evergreens, Hungaricum, Innovative.
Definition Ast Evg Hun Inn
================================================
Round winner (raw) scores: 630 645 870 520
5th place (raw) scores: 545 550 590 475
Median scores: 270 240 295 220
(Note that in this data set, there was no distinction between official and unofficial scores. These numbers might differ slightly, but not radically, if unofficial scores were excluded from calculating the markers.)
It looks pretty clear that the four rounds’ scores have now been normalised into a fairly similar distribution, regardless of their comparative difficulty, which has been our main objective.
An interesting thing to note is the very high score obtained in the Hungaricum round – indeed, the converted top score is 975! This is a result of the fact that the difference between the top ranked solver and the 5th ranked solver was much larger in this round than in any of the others. You could argue that this 1st ranked solver is then rewarded better for his performance than the other round winners, but then again, he did achieve an outstanding result in actual solving, as evidenced even by the raw scores, and therefore needs to be properly rewarded by whatever score conversion is used.
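As a quick sanity check, plugging the Hungaricum markers from the backing data above into the conversion sketch shown earlier reproduces this figure (again, the behaviour above the 5th-place mark is our inference from the published numbers):

```python
# Hungaricum markers from the backing data: 5th place raw = 590, median raw = 295
print(f"{convert_raw_score(870, fifth_raw=590, median_raw=295):.1f}")  # 974.6, i.e. the 975 quoted
```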
In fact, this set of data offers an excellent visual justification for us not choosing to mark the curves using the 1st ranked solver in any of the rounds: had we chosen to do so, the normalised figures would look like this (this is a different method from the one we will be using for 80P):
The same normalisation concept when marked to the top competitor instead of the 5th placed one. Note how one rocket score in the green round impacts its 5-30th places.
What this demonstrates is that the top ranked solver of the Hungaricum round now does not get rewarded better than the winners of the other rounds. Instead, however, all the non-top solvers within the top 30 are now significantly underrated compared to similarly ranked solvers in all the other rounds.
In the context of “80 puzzles”, it is clearly more desirable to keep the field together even if the occasional stellar performance of a round winner may seem very highly rewarded, rather than keep the winners together and create a situation where tens of people may feel not sufficiently rewarded.
(While this choice makes sense for our purposes to ensure the balance of the “80 puzzles” sets, it is important to note that the rating systems of LMI and Croco are well established and robust systems that address slightly different situations; therefore our analysis should not in any way be seen as an attempt to assess or question their processes – in fact we are grateful to them for having pioneered such rating systems, and of course for all the work they put into running their sites for all of us in the first place!)
Conclusion
It is probably apparent from the description above (even from the amount of scrolling required) that finding the right system for ranking puzzlers over a diverse set of puzzles is a complex problem with many possible considerations regarding assumptions, methodology, data and parameters, and many subtle choices that influence how well any ranking system will do under competition circumstances. Analysing such systems is very easy a posteriori but nearly impossible a priori, so it is worth seeing how well other similar examples have done and making the most of those working examples. There are no claims that this scoring system is perfect (it is impossible to make such a claim), but hopefully it is by now clear to the reader that this issue was not taken lightly: we have put a huge amount of due diligence into investigating solutions and alternatives, and into communicating why certain choices were made the way they were.
Therefore we are confident that with this scoring system and the format rules communicated earlier, "80 puzzles" will be seen as a balanced, fair and integral part of the Championship and will be successful as such!
Notes
This document was sent for review to a couple of people a few days before its public appearance. Feedback, suggestions and remarks were received and, where applicable, incorporated.
Reviewers are, in no particular order:
- Members of the core team (useful comments from Pal Madarassy)
- The lead authors (tips from Thomas Snyder)
- The WPF Board
- Special thanks to Tom Collyer for an in-depth review on methodology and the insightful and inspiring bits of feedback
The next communication about "Around the world in 80 puzzles" will be the release of the instruction booklets of the puzzle sets therein, prior to the release of the booklet for the whole event. This is designed to allow teams to familiarise themselves with the sets and prepare their line-ups for the "who skips what" decisions.
Friday 23 August 2013
Around the world in 80 puzzles - Format update
As already stated in the introduction of "Around the World in 80 puzzles", every competitor will attempt to solve exactly three of the puzzle sets and will skip one. Here is a clarification of the related rules.
Eligibility
This section clarifies who is eligible to attempt each of the puzzle sets.
- Any competitor, official or unofficial, who served as a puzzle author, co-author, reviewer or tester (hereinafter commonly: “Author”) for a puzzle set is ineligible to attempt the puzzle set they have contributed to. The names of all these people were listed in the original announcement.
- Any competitor, official or unofficial, who has an “Author” in their team, is ineligible to attempt the set of that “Author”.
- Any competitor, official or unofficial, who has an “Author” as their team captain or other personnel is ineligible to attempt the set of that “Author”.
- Any competitor, official or unofficial, who has an “Author” as their compatriot (assuming standard nationality rules) is ineligible to attempt the set of that “Author”.
- In case any competitor is rendered ineligible for at least two of the sets, a case-by-case review by the Organising Committee is required to determine if and how that competitor can still compete in three sets.
Rules
Competitors who, by the rules above, are eligible to attempt only three of the puzzle sets will complete those three sets and skip the one they are ineligible to attempt.
Teams whose members are eligible to attempt all four puzzle sets must arrange their line-up so that each member of the team skips a different puzzle set (a small illustrative sketch of valid line-ups follows this list).
- Teams of four competitors will therefore line up so that each of the puzzle sets is attempted by exactly three of them, and each of the puzzle sets is skipped by exactly one of them.
- Teams of fewer than four competitors are still required to ensure that they all skip different puzzle sets.
- Teams are required to communicate their decisions to the Organising Committee before Round 1 of the Championship starts. The Organising Committee will not publish any team's decision before that deadline, to avoid influencing other teams in their own decisions.
- The decision is made by the members of the team. Given the timing, teams are welcome to study the Instruction Booklet that will, as usual, be published ahead of the event, and arrive at their decision knowing the exact details of the puzzle types in each of the puzzle sets.
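A minimal sketch of what a valid line-up looks like under these rules, assuming hypothetical member names and using the four region labels from the original announcement as set names (purely illustrative, not part of the official process):

```python
from itertools import permutations

SETS = ["Americas", "Asia", "Western Europe", "Eastern Europe"]  # the four 80P sets

def valid_lineups(members, ineligible):
    """Enumerate skip assignments consistent with the rules above.

    members:    list of competitor names (at most four per team)
    ineligible: dict mapping a member to the set they may not attempt
                (because of an "Author" affiliation), if any
    Each member skips exactly one set, all skipped sets must differ,
    and an ineligible member must skip the set they cannot attempt.
    Illustrative helper only; special cases such as double ineligibility
    are handled case by case by the Organising Committee, not here.
    """
    for skips in permutations(SETS, len(members)):
        assignment = dict(zip(members, skips))
        if all(assignment[m] == s for m, s in ineligible.items() if m in assignment):
            yield assignment

# Example: a team of four with no author affiliations has 4! = 24 possible line-ups.
print(len(list(valid_lineups(["A", "B", "C", "D"], {}))))  # 24
```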
Rationale
It is worth noting about the system described above that teams not affiliated with any of the “Authors” may actually gain a slight tactical advantage, because they get to decide which of them attempts which of the sets, whereas the author teams have no such choice – they obviously need to work with the “other” three sets.
This will hopefully eliminate any concerns that authors or their teams gain any (real or perceived) advantage from the introduction of this “80 puzzles” system, because they clearly do not. Authors and (where already known) their teams have been directly contacted and informed about the possibility that they may be at a slight disadvantage as a result.
It is also a nice property of the proposed system that, authors aside, all four sets will be attempted by an equal, or almost equal, number of competitors. One result of this is that the official team results will very likely include an equal number of attempts across all four puzzle sets (assuming that only four official teams are “Author”-affiliated).
There were a few other ideas on the table which have been ultimately dismissed.
One idea was to have all competitors who were eligible to do so attempt all four sets, and then transform those four scores into just three. This could have been done in a balanced way (e.g. multiplying all of them by 75%) or in a way that would explicitly favour these non-author teams (e.g. picking the best 3 results out of the 4, thereby effectively providing them with a wildcard option: “you can screw up one round for free”). One of the reasons this idea was dismissed, however, is that “Authors” and their teams/compatriots would then have an hour of resting time while everybody else is competing, and it seems extremely hard, if not impossible, to quantify whether any adjustment of the scores would fairly balance the advantage they might gain simply by having a more relaxed schedule. Another argument against this scheme is that, according to our data evaluation, such a wildcard option provides more benefit to less consistent solvers than to more consistent ones, and that is hardly a good incentive to introduce.
Another set of ideas was discussed around how to determine which competitor skips which of the sets (wherever there is a choice). It could have been decided in a completely centralised way, but any such decision would probably have been seen as rather arbitrary and therefore questionable, so we did not go for it. Alternatively, we could have let every competitor freely pick the set they skip – while that may have been a popular option at first glance, it has the potential to result in one of the sets being skipped by a significant majority of the competitors for whatever (probably mostly perception-driven) reasons; that not only reflects badly on the author team of that set, it also makes any scoring system more unstable.
Notes
This document was sent for review to a couple of people a few days before its public appearance. Feedback, suggestions and remarks were received and, where applicable, incorporated.
Reviewers are, in no particular order:
- Members of the core team (useful comments from Pal Madarassy)
- The lead authors (insightful feedback from all of them)
- The WPF Board (helpful comments from Will Shortz)
- Few other (non-author) competitors
A similar communication about scoring details can be expected within the next day or two.
Wednesday 21 August 2013
Around the world in 80 puzzles - Puzzle creation process
This is a follow-up post from "Around the world in 80 puzzles". There has been an enormous amount of feedback, both online and in more private channels, some excellent points were brought up and discussed, for which we are extremely thankful. Honest and transparent feedback addressed to the right channels is key to turning ideas into success.
Some of the key points that came out of the discussions relate to questions about organisational details (e.g. how it will be decided which competitor skips which puzzle set) and how exactly the scoring will work (e.g. to compensate for any possible imbalance between different sets). These questions are being followed up; right now the proposed solutions are being reviewed by quite a few people before publication. So bear with us – this proposal should be communicated publicly within a day or two.
In today's post I wanted to clarify the process of creating the puzzles for this "80 puzzles" concept, since the feedback on the initial post included a few comments/questions/concerns that the puzzle sets provided by the author teams will be very different in style, or very different in difficulty, or that they will all contain the same types and be boring, or that they will all contain only abstract innovations that will not be very entertaining. These comments seemed to assume that the four author teams are working in a completely independent and uncontrolled way with no coordination to ensure their product meets Championship standards (or that, if the authors are speaking to each other, they know each other’s plans in advance, which would be unfair). These assumptions are largely untrue, and hopefully the description of our internal process helps everybody understand why.
Selection of authors
I (me = Zoltan N here) had a long list of potential contributors, based on their available history as competitors and puzzle authors. Since I also coordinated last year’s 24HPC, where the majority of the puzzle authors were international, it was easy to ensure that all authors of 80P have already proven themselves able to contribute a single set of high-quality puzzles while working remotely with a core team against well-defined requirements. While the 24HPC is very different from, and much smaller in scope than, a WPC, a track record of smooth cooperation and positive feedback on the actual puzzles from competitors is certainly a good indicator of further success in working with these authors again.
Guidelines
Once the authors confirmed that they were happy to participate in this programme, I provided them with thorough guideline documentation about the expectations for the puzzle set they had volunteered to provide. The guidelines were similar in nature to the 24H guidelines, although probably a little more restrictive. Requirements include:
- Specifying the framework (60 minutes, 20 puzzles).
- Puzzle sets should offer a balanced set of puzzles in terms of difficulty, ranging from easy puzzles that should be accessible for any WPC competitor or even for the public, to difficult puzzles that challenge even serious title contenders.
- Puzzle sets should offer a balanced set of puzzles on the scale of novelty, ranging from very traditional and well-known types to new variations or innovations.
- Puzzle sets should include puzzles from all available genres, e.g. magic squares, pathfinders, arithmetic, crosswords, painting puzzles and logic, just to name a few. That is, a range of solving skills should be required to do well in these sets.
- Since a separate World Sudoku Championship is held just a few days before the WPC, Sudoku and its closest variations should be avoided.
Coordinating puzzle types
The first deliverable from the authors was a list of their planned puzzle types and/or the structure of their set, without creating any actual puzzles at this stage. Once all four plans were received, we collated them into a single sheet. We were trying to ensure that
- The genre variety and novelty range is right
- There is no puzzle type overlap between any two sets within 80P
- There is no puzzle type overlap between 80P sets and the types of puzzles that would play a crucial role in the rounds designed by us, if any
Any duplicates were communicated to the authors as such, and the affected authors were asked to find a replacement, without revealing who else had priority on that particular puzzle type – e.g. we only sent “don’t include Battleships” types of messages. In some cases this took quite a few iterations. This process ensures that authors do not learn any substantial information about the puzzle sets of the other authors (“someone else seems to be considering Battleships” is probably not an overwhelming amount of information).
Although we ended up being lenient and allowed some exceptions to the overlap constraints above, for reasons that include puzzle set theme, sheer puzzle beauty, etc., there is no significant overlap between the four sets as a result.
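A minimal sketch of the kind of collation check described above, with hypothetical team names and made-up plans (the function is ours, purely for illustration of the deduplication step):

```python
from collections import defaultdict

def find_type_overlaps(plans):
    """Collect puzzle types proposed by more than one author team.

    plans: dict mapping team name -> list of planned puzzle types.
    Returns {puzzle_type: [teams proposing it]} for every duplicate,
    so each affected team can simply be told "don't include X".
    """
    proposers = defaultdict(list)
    for team, types in plans.items():
        for t in set(types):
            proposers[t].append(team)
    return {t: teams for t, teams in proposers.items() if len(teams) > 1}

# Example with made-up plans: only the duplicated type is flagged.
print(find_type_overlaps({
    "Team A": ["Battleships", "Slitherlink"],
    "Team B": ["Battleships", "Nurikabe"],
}))  # {'Battleships': ['Team A', 'Team B']}
```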
Testing the sets
All four author teams had apparently chosen to test their own puzzle sets thoroughly before submitting them to us. However, we wanted to obtain consolidated test results to estimate the comparative difficulties of the sets and bring them as close together as possible.
Our testing had at least three people from the core team solve each of the sets under competition circumstances. Test solvers were Pal and the three Zoltans, i.e. team HUN-A that finished 4th at last year's WPC. We recorded all our solving times for each of the puzzles and provided these results to the authors in a collated feedback (each author only received feedback about their own set, obviously).
There were instances where some or all of us found certain puzzles to be too difficult, to the point of not being much fun to solve. There were other instances where we felt a puzzle was not suitable for the circumstances. In these cases, we asked authors to adjust and/or replace those puzzles. (I need to point out that all the authors did a very mature and quality job, so it was only a minority of the puzzles in any of the sets that we had to ask to be tweaked or replaced.)
As a result of these iterations, we feel that the puzzle sets are now as balanced on difficulty and all other measures as it is reasonable to expect. In particular, the solving times for each of the test solvers fall within a 10% range for each of the sets – this is surely not a lot of data points to go by, but then this is also how we timed WPC rounds in 2005 and 2011 and 24HPC rounds in many other years, and those events were (mostly) reasonably predictable in how rounds were scored and timed.
In case the difficulty of the sets still turns out to show some differences, that's where the scoring recommendation, details of which are to be posted in a day or two as discussed above, will come into play.
Presentation
To ensure a consistent graphic appearance of puzzles during the entire Championship, the core team has agreed to re-draw all the puzzles that the 80 puzzles’ authors create. We also review the instructions of the puzzles, for a similar reason.
Sunday 11 August 2013
Around the world in 80 puzzles - Introduction
Introduction
This article discusses the concept "Around the world in 80 puzzles", which is the code name for a novel competition structure that will cover a significant part of World Puzzle Championship 2013.
Background
WPC authors
Traditionally, all the puzzles of any past WPC have been provided by authors mostly from the hosting country (there are some known exceptions with the odd international contributor, e.g. Eger 2005 or Minsk 2008 and perhaps more, but this does not change the main thrust). This means that a small local team usually ended up designing 200+ puzzles for the two-day event. While this paradigm has been the generally accepted course of event organisation, a number of thoughts have also been raised, including:
- The diversity of puzzles, styles and ideas may be limited by the fact that only a handful of people create the entire agenda.
- Countries that end up hosting multiple WPCs are likely to become associated with a particular style that may not be well received at subsequent events they organise. This is particularly the case in the current setup, with Hungary having hosted two WPCs in the past 8 years and stepping up to do 2013 as well. There are significant concerns that the WPC in China might be "just another Hungarian WPC".
- Most of the countries, however, may not get to host a WPC over an extended period, which means that individual puzzle designers in those countries may never get a chance to provide any WPC puzzle in their lifetimes.
To address these thoughts, there seems to be a strong case for making the puzzle author crew international to a greater extent.
24-hours Puzzle Championship
The 24-hours Puzzle Championship, which we discuss here as a side note to provide context, is a yearly puzzle event in Hungary that started in 2000 and has not missed a year since. While the initial few years had only 14 local entrants, the event has seen tremendous growth, becoming a widely recognised international tournament from 2003, achieving an attendance record of 77 puzzlers in 2005 (when held alongside the WPC) and sustaining a track record of 30+ puzzlers ever since.
The tournament consists of thirteen (13) 100-minute rounds. The puzzle authoring process is largely decentralised, meaning that each of these 100-minute puzzle sets is created by different people / teams (with some coordination in place, of course, to ensure a reasonable level of consistency on scoring/difficulty principles). To ensure that authors can also compete (which was of course always a requirement), there are fourteen (14) puzzle sets, so that each competitor skips one of them. Authors naturally skip their own set. In terms of scoring, there is a normalisation step across rounds in place that, without going too technical, is designed to ensure that any difference in puzzle set difficulties is accounted for.
Connecting the ideas
In an email discussion that started from the context of appointing a group to provide puzzles for WPC 2013, Thomas Snyder brought up the idea of leveraging the experience of the 24-hours puzzle championships for WPCs, involving a multitude of authors and thereby addressing the issues above. While changing the format of the entire WPC at once would sound a little harsh, the justification behind the idea clearly holds water. We have chosen to take it forward and implement it at WPC 2013, scaled down to a prototype so that it only constitutes a relatively small part of the Championship, with the majority of the event held on a more conventional basis. This allows participants to provide more balanced feedback that may shape future events.
These thoughts lead us to introduce:
Around the world in 80 puzzles
Setup
"Around the world in 80 puzzles" is the code name for a set of four consecutive rounds during 2013 World Puzzle Championship. Some basic facts:
- Each of these rounds will last for sixty (60) minutes, with a 15-minute break between them.
- Each of these rounds will consist of twenty (20) puzzles.
- Every competitor will solve exactly three of these four rounds, and will skip exactly one.
- Puzzles for these rounds will be provided by puzzle authors who are not based in Hungary and therefore are not members of the core puzzle author team. Moreover, the author(s) for these four rounds will be from four different countries.
- The authors, skipping their own puzzle sets, are allowed to participate in the Championship as competitors. This is a key point and is discussed below in detail.
Authors
The selection of puzzle authors was made so that we have puzzle authors who are well known to the puzzle community, have a proven track record of creating great puzzles, and are established solvers as well, understanding what is required for a WPC. In addition, given the format of this concept, we need to ensure that the authors are seen as honest and trustworthy competitors with the highest degree of integrity, so that this concept can be implemented without raising questions over the integrity of the WPC.
Also, we wanted to ensure that the term "Around the world" in the title actually holds water. While it is impossible to represent every continent, every country or every potential author in just four rounds, our selection ensures that, at least on a continent level, the major traditional hubs of puzzle life are represented. Specifically, we have invited one author from each of "Americas", "Asia", "Western Europe" and "Eastern Europe". The authors had the option of inviting additional established puzzle authors from their own countries to co-author the set and/or test it; as the list below suggests, all of them have chosen to do so. The list of authors, in no particular order, is:
Team USA - representing "Americas":
Thomas Snyder (lead), Wei-Hwa Huang, Palmer Mebane
Team India - representing "Asia":
Prasanna Seshadri (lead), Amit Sowani
Team Netherlands - representing "Western Europe":
Bram de Laat (lead), Hans Eendebak, Tim Peeters, Richard Stolk
Team Serbia - representing "Eastern Europe":
Nikola Zivanovic (lead), Branko Ceranic, Cedomir Milanovic, Zoran Tanasic
Participation of authors as competitors
The puzzle authors above are eligible to enter the Championship as competitors. As stated above, none of them will get to solve their own puzzles within the competition (obviously). In addition, none of their compatriots will get to solve their puzzles either. However, anybody in the list of names above who chooses to enter the Championship as a competitor will solve the puzzles of the other three teams under competition circumstances. Since this point may trigger some questions, let us explain the concept behind it.
In previous Championships, nobody from the puzzle designer teams has entered the competition, but their compatriots were still eligible to compete. The key point of the underlying expectation is that anybody who designs a puzzle for the Championship is expected to keep the puzzles they designed secret. This is the fundamental assumption behind the trust model: authors must not reveal any information about their own puzzles to anybody who may get to solve those puzzles under competition circumstances.
"Around the world in 80 puzzles" round uses the very same model of trust and makes the very same assumption. The people above have been asked (and have all committed to agree) that they strictly do not give any information about the puzzles they create to people outside their immediate team (names listed above) and the core organising team (none of whom competes, obviously).
Members of Team USA, for example, are aware of the puzzles they create themselves but they do not know anything about the puzzles e.g. Team India creates. The instructions for the puzzles by the Indian team will be known to Team USA at the time of the publication of the Instructions Booklet, i.e. at the time when everybody else will learn about the instructions. Therefore, the three rounds that potential Team USA members will get to solve in the Championship will be just as new to them as it will be to anybody else getting to solve those rounds.
Naturally, the trust model applied here relies on the individuals acting as puzzle authors. This is one reason why we have carefully chosen authors who are all well known and established members of the world puzzle community, and it is also the reason why all their names have been published above, well in advance of the competition. It should be noted, however, that the expectation that puzzle authors do not reveal information about their puzzles is identical to the expectation placed on all the puzzle authors of past Championships.
Having said all this, we are confident that the integrity of the Championship continues to be at the standard set by its previous editions.
Scoring
Although all four rounds will be of equal length, have an equal number of puzzles and – hopefully – roughly the same difficulty, since different competitors solve a different set of rounds, we cannot simply add up the scores without risking that differences in the difficulty of the rounds distort the results. Instead, the scoring will be scaled to the actual performance of the solvers, allowing for a round turning out to be more difficult than another.
In the case of the 24-hours championship, for example, a round winner gets 100 points and everyone else gets the percentage of their score relative to the winner's. For example, if Alice wins a 1000-point puzzle round with a score of 900, Bob scores 810 points and Carol finishes with 540, then Alice gets 100 points (for 100%), Bob gets 90 points (810/900 = 0.9 = 90%) and Carol gets 60 points (540/900 = 0.6 = 60%).
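A one-line sketch of that 24HPC-style percentage scoring (the function name is ours, purely for illustration):

```python
def hpc24_round_points(raw, winner_raw):
    # Round winner gets 100; everyone else gets their percentage of the winner's raw score.
    return 100.0 * raw / winner_raw

print(hpc24_round_points(810, 900), hpc24_round_points(540, 900))  # 90.0 60.0
```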
At this event we will be using something similar to the 24-hours system, although the scores will not be scaled to the round-winning scores but to something like "10th place" or "the average score from 3rd to 7th place" instead. This is designed to ensure that if one competitor runs away from the field by a long way in one of the rounds, the other competitors in that round are not put at a disadvantage compared to those who skipped it. Of course, the competitor winning by a long way will still be rewarded with a very high score (percentage-wise it will be higher than 100).
The exact details will be specified later, most likely at the time of publishing the instructions booklet.
Conclusion
We (organisers and authors) all look forward to trialling this concept at a World Puzzle Championship. We feel it opens the possibility for countries without a sufficient number of puzzle designers to host the event in a few years, and for puzzle designers who do not want to host the entire event to contribute puzzles to a future Championship.
For any questions / remarks, please contact Zoltan Nemeth or Gyorgy Istvan.