Building a Better Beer Competition

The Oregon Beer Awards has always been unusual. Instead of following the BJCP or GABF style guidelines, organizers group the categories by type rather than style. This year’s slate, the largest yet, nudged up to 31 categories (including three fresh-hop categories), for a total of 93 medals awarded. That’s still less than a third of the 97 styles evaluated and 270 medals awarded at last year’s GABF. At the OBA, instead of separate categories for helles and festbier, to take one example, brewers submit those styles to the Light German and European Lagers category. I happened to judge that one, and in the finals we had five styles and six beers. In Oregon, the OBAs are a really big deal to breweries, and one reason is that the competition is selective, which makes a medal hard to win.

This year, the competition debuted a new feature to try to temper some of the quirks human judges and random sampling bring. Instead of a single-elimination first round, every beer went to two tables. If both tables agreed on a beer, it went forward. If, however, the two tables sent different beers forward, the disputed choices went to a third table of judges, who could either eliminate or advance them. I’ve always imagined that beer competitions were necessarily imprecise; they’re great at sorting beers into tiers, but in terms of identifying “best,” probably more random than organizers would like to admit. After this experiment, I may have to revise my impressions. Not only are beer competitions less random than I thought, but this feature improves agreement even further. It’s a clever technique, and it yielded some big dividends.
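
To make the mechanics concrete, here’s a minimal sketch of the two-table first round in code. Everything here is an assumption layered on the description above (the flight size, the random stand-in “scores,” the one-in-three arbitration pass rate, which reflects the “about a third” figure in the logistics note below); the OBA runs this process with human judges, not software.

```python
import random

FLIGHT_SIZE = 12   # roughly a dozen beers per first-round flight
ADVANCE_COUNT = 3  # each table sends its three favorites forward

def table_picks(beers, rng):
    """One table's top picks. A random preference score stands in
    for a real panel's deliberation."""
    ranked = sorted(beers, key=lambda _: rng.random())
    return set(ranked[:ADVANCE_COUNT])

def first_round(flight, rng):
    """Run one flight through the two-table first round."""
    table_a = table_picks(flight, rng)
    table_b = table_picks(flight, rng)
    unanimous = table_a & table_b   # both tables agree: advance automatically
    disputed = table_a ^ table_b    # picked by only one table: off to arbitration
    # The third "Last Chance Kitchen" table advances or eliminates each
    # disputed beer; per the post, about a third made it through.
    second_chance = {beer for beer in disputed if rng.random() < 1/3}
    return unanimous, second_chance

rng = random.Random(2022)
flight = [f"beer_{i:02d}" for i in range(FLIGHT_SIZE)]
unanimous, second_chance = first_round(flight, rng)
print("unanimous picks:     ", sorted(unanimous))
print("saved in arbitration:", sorted(second_chance))
```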

Handling Variation

If judging beer were a perfectly precise process, the same ales and lagers submitted to entirely different judges would result in the same winners. Two factors undermine that precision, and the OBA’s new approach targets both. First, obviously, are the differences among judges. Though they try to be as objective as possible, humans have different hardware and different approaches to evaluation. That’s a necessary complication in a process in which the human capacity to discern the subjective qualities of harmony, dynamism, composition, and excellence is ultimately the goal. Otherwise, everyone could just submit lab results and be done with it. The second challenge is the randomness of sampling. In a category with sixty beers, for example, what if the ten strongest all happen to end up in the same preliminary flight? Since samples are divided randomly, some flights may be loaded, while others are relatively weak.
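
To get a feel for the sampling problem, here’s a quick simulation of my own (not anything the organizers run): sixty beers, the ten strongest labeled, dealt at random into five flights of twelve. It tallies how many of those ten land together in the single most-loaded flight.

```python
import random
from collections import Counter

def loaded_flight_counts(n_beers=60, n_strong=10, flight_size=12,
                         trials=10_000, seed=1):
    """Deal beers into flights at random and record how many of the
    'strongest' beers land together in the single most-loaded flight."""
    rng = random.Random(seed)
    beers = list(range(n_beers))
    strong = set(range(n_strong))  # label beers 0-9 as the ten strongest
    tally = Counter()
    for _ in range(trials):
        rng.shuffle(beers)
        flights = [beers[i:i + flight_size] for i in range(0, n_beers, flight_size)]
        most_loaded = max(sum(b in strong for b in flight) for flight in flights)
        tally[most_loaded] += 1
    return tally

# How often does the most-loaded flight hold 3, 4, 5+ of the best beers?
print(loaded_flight_counts())
```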

Breakside Brewery’s Ben Edmunds is the longtime lead for the competition, and he told me why organizers tried the experiment this year. “I’ve personally wondered how often there are early errors that may have an outsize impact on competition.” Organizers thought the new method would give slightly offbeat beers an extra look, and that it would help mitigate the sampling problem. And it did: ultimately, some of the beers that went into arbitration did win medals.

(A quick note on the logistics. In larger categories, the samples went through a preliminary, second, and medal round. In categories with fewer entries, they went straight from a prelim round to a medal round. Organizers only used the two-table system in the first round. Beers making it to a semifinals or finals round were judged by a single table of judges. In the end, about a third of the beers advanced out of the arbitration round, which organizers took to calling “Last Chance Kitchen” after the Top Chef show. Those second-chance beers then joined the unanimous choices in later rounds.)

Results of the Experiment

In the first round, judges evaluated around a dozen beers and selected the three best to send to a semifinals or finals round. That’s two tables of three judges evaluating twelve beers. How often did all six judges select the same beers? I would have guessed around 50% of the time, which is still a lot more than random chance. In fact, they did a lot better than that. In judging ~1,100 beers over two weekends, judges were unanimous 72% of the time. And they agreed at consistent levels: 73% on weekend one and 71% on weekend two, with only minor deviations in nearly every category.
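
For a sense of what “random chance” would look like here, a quick back-of-the-envelope calculation (my arithmetic, not the OBA’s): if both tables simply drew three beers from a twelve-beer flight at random, full agreement would be vanishingly rare.

```python
from math import comb

FLIGHT_SIZE = 12
PICKS = 3

# Probability that a second table, choosing 3 of 12 at random, lands on
# exactly the same three beers as the first table.
p_identical = 1 / comb(FLIGHT_SIZE, PICKS)

# Expected number of shared picks between two random tables
# (hypergeometric mean: PICKS * PICKS / FLIGHT_SIZE).
expected_overlap = PICKS * PICKS / FLIGHT_SIZE

print(f"identical picks by luck alone:  {p_identical:.2%}")            # ~0.45%
print(f"expected overlap by luck alone: {expected_overlap:.2f} beers")  # 0.75
```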

But even that somewhat understates the judges’ success. As Ben pointed out, “It’s important to recognize that judges are agreeing not just on which beers to advance, but which beers to eliminate.” Indeed, if you count only the beers that weren’t unanimous to start with but still ended up in a medal round, the total was just 7% of the whole sample. In other words, judges effectively agreed, whether to advance or to eliminate, on 93% of the beers. Agreement improved further in categories with enough entries to require three rounds of judging. Seeing these results, Ben argued that “this [approach] isn’t an indictment of single-elimination competitions. Quite the opposite.”

Yet from a selection standpoint, those small, early disagreements resulted in substantial differences in who won medals. Second-chance beers were about 40% less likely to win a medal than unanimously selected beers across the competition. Nevertheless, a number of them did beat the odds and end up with a medal. Ben was looking at just the 84 winners from the recent portion of the competition, excluding the nine that won last fall in the fresh-hop categories. Of those 84, nineteen, or 22%, were second-chance beers. If this seems confusing in light of the overall accuracy I just mentioned, it’s because we’ve switched the denominator from all beers to just those that made it to the second and third rounds. (Ben called this an “optics gap.”) In other words, small errors at the fat end of the funnel can result in larger errors down the line. For that reason, even when judges agree 93% of the time, the double-table system can significantly improve the results.
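
The denominator switch is easier to see with the two ratios side by side. A quick sketch using the figures from this post (the ~1,100 entry count is approximate):

```python
# Figures from this post; the entry total is approximate.
total_entries = 1100        # beers judged over the two weekends (roughly)
medal_winners = 84          # winners Ben examined (fresh-hop medals excluded)
second_chance_winners = 19  # medalists that had gone through arbitration

share_of_winners = second_chance_winners / medal_winners
share_of_entries = second_chance_winners / total_entries

print(f"second-chance beers as a share of winners: {share_of_winners:.1%}")  # 22.6%
print(f"the same beers as a share of all entries:  {share_of_entries:.1%}")  # ~1.7%
```

The same nineteen beers look substantial against the pool of winners and tiny against the full field, which is the gap Ben was describing.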

There’s no single “best” way to run a competition. In the Australian International Beer Awards, to cite a contrasting approach, judges assign each beer a point total, and gold, silver, and bronze medals are handed out to all the beers scoring above certain benchmarks. Outside the US, competitions organize categories very differently than the BJCP and GABF do. These differences are healthy and useful for the people they’re mainly serving: the brewers themselves. Breweries may tout wins in marketing materials, but in most cases competitions don’t drive sales. The biggest benefit is internal, a confirmation by a neutral panel of judges that you’re doing a great job.
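
As an aside, that kind of threshold system is easy to sketch; the cutoff scores below are placeholders, not the AIBA’s real benchmarks. The contrast with a rank-based competition like the OBA is that any number of beers in a category can medal, or none at all.

```python
# Placeholder benchmark scores for illustration, not the AIBA's actual cutoffs.
CUTOFFS = [("gold", 17.0), ("silver", 15.5), ("bronze", 14.0)]

def medal_for(score):
    """Award a medal to any beer above a benchmark score, so a category
    can hand out several golds, or none at all."""
    for medal, cutoff in CUTOFFS:
        if score >= cutoff:
            return medal
    return None

scores = {"beer_a": 17.4, "beer_b": 16.1, "beer_c": 13.2}
print({beer: medal_for(points) for beer, points in scores.items()})
# {'beer_a': 'gold', 'beer_b': 'silver', 'beer_c': None}
```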

In a world of different kinds of beer competitions, this double-table approach has some big upsides with few downsides. It didn’t make the competition more difficult organizationally, but it doubled the amount of feedback a brewery received. At the OBA, each session is recorded, so breweries can now listen to two sets of judges deliberating over each of their entries. Having set a first-year benchmark of 72% judge agreement, the OBAs now have a baseline for gauging how well judges do at future competitions. Finally, it helps organizers see how well the categories are working. At this year’s OBA, one category was a low outlier in terms of judge agreement. (Maybe flavored beers? It doesn’t really matter.) To Ben, that suggested the guidelines, or perhaps even the category itself, needed to be refined. The two-table approach thus becomes another tool for evaluating the competition.
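
Since judge agreement is now a number the OBAs can track, a per-category breakdown becomes a natural diagnostic. Here’s a sketch of what that might look like; the category rates below are invented, and only the 72% benchmark comes from this year’s results.

```python
# Hypothetical per-category unanimity rates; only the 72% overall benchmark
# comes from this year's competition. Category names are examples.
agreement = {
    "Light German & European Lagers": 0.78,
    "IPA": 0.74,
    "Hazy IPA": 0.71,
    "Flavored Beers": 0.52,  # the kind of low outlier Ben described
}

BENCHMARK = 0.72
FLAG_BELOW = BENCHMARK - 0.15  # arbitrary cushion for flagging outliers

for category, rate in agreement.items():
    if rate < FLAG_BELOW:
        print(f"{category}: {rate:.0%} agreement; review guidelines or category")
```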

Based on this year’s success, Ben plans to bring the two-table method back next year. Stay tuned for results on that.

Oregon Beer Awards by the Numbers

At the end of the day, competitions are all about rewarding excellence. So who stood out in 2022? I was surprised to see how top-heavy things were when I crunched the numbers last year. The top five breweries won 40% of the total medal count, and the top ten accounted for 60%. This year, across all 31 categories, a sizable 48 breweries took home one of the 92 available medals (one medal was rescinded because a brewery entered a beer brewed out of state). The top five won just above a third this year, and the top ten around half. Results were similar, but diversity was up—a good sign of health among Oregon’s breweries.
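
Those concentration figures are just a “top-N share” calculation. A quick sketch: the leading counts come from the medal table in the next paragraph, while the long tail of one- and two-medal breweries is invented purely to fill out the 48 breweries and 92 medals.

```python
def top_n_share(medal_counts, n):
    """Fraction of all medals won by the n most-decorated breweries."""
    counts = sorted(medal_counts.values(), reverse=True)
    return sum(counts[:n]) / sum(counts)

# Top counts from the medal table below; the tail is invented for illustration.
medals = {"Breakside": 12, "10 Barrel": 7, "pFriem": 5, "Wayfinder": 5, "Alesong": 4}
tail = [3, 3, 3, 2, 2] + [2] * 8 + [1] * 30
medals.update({f"brewery_{i:02d}": count for i, count in enumerate(tail)})

print(f"breweries: {len(medals)}, medals: {sum(medals.values())}")  # 48, 92
print(f"top 5 share:  {top_n_share(medals, 5):.0%}")   # just above a third
print(f"top 10 share: {top_n_share(medals, 10):.0%}")  # around half
```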

In terms of total medal counts, the big winners were Breakside (12), 10 Barrel (7), and pFriem and Wayfinder (5 each), though a few other breweries had good years. Newcomer ForeLand won two golds, and Grains of Wrath dominated both a lager category (gold-silver) and a dark ale category (silver-bronze). Great Notion, famous for hazies and pastry beers, won two golds and a silver in those categories. Alesong continues to shine in barrel-aged categories, winning four medals (or 3.5 if you give Deschutes half-credit for their winning collab).

In terms of the medals people take special interest in, here are your winners. (For the full list, see The New School.)

  • Pilsner: pFriem, winning gold for the third year in a row. (Whoa!)

  • IPA: Breakside Wanderjack

  • Hazy IPA: Great Notion Love and Ritual

  • DIPA: Ruse Interpreter

  • Mixed culture: Nebuleus Yessitka spruce (Who?? See below)

Each year it’s great to see which dark-horse breweries won medals. Sometimes it signals the arrival of an excellent new brewery, sometimes just a good year for a lesser-known one. This year, no horse was darker than Nebuleus, a brewery so small it barely exists. Nevertheless, this maker of mixed-culture beers snagged two medals. Perhaps it’s time for an investor to help them scale up! A new gluten-free brewery, Mutantis, took a bronze, while Grants Pass-based Weekend Beer earned silver in sour ales. Finally, one of the big surprises was Mt. View Brewing, located just a couple miles down the road from Solera in Parkdale. The two-year-old brewery managed to pick up a silver in the hotly contested hazy IPA category.

As always, congrats to the winners and the organizers. It’s an excellent competition.

Jeff Alworth