Your Friend the Hypergeometric Distribution
There’s a tool that all game designers should be aware of, which helps you determine the probability of certain things happening.
Let’s say I’ve got a deck that has two key cards that are needed for a combo, and there’s one copy of each in my deck of 60 cards. How often will players draw both of those cards into their starting hand of seven cards? How many cards will they need to draw before they have a 50% chance of getting their combo?
These are questions that it can be important to understand during the design process. You don’t need to use math to do this of course – you can simply play the game lots of times, and get a feel for when things are happening.
But using math can help you dramatically shorten your design cycle. Your initial prototypes will be a lot closer to your vision right out of the gate.
You can do this calculation by hand, treeing out all of the possibilities. But it is really tedious. You can also use a bunch of approximations, like that you’re drawing about 10% of the cards in your deck, so each card has a 10% chance of being in your starting hand. But if you’re really interested in finding out the exact answer, or if you get into more complex situations, there’s a secret weapon that should be in the arsenal of every game designer: Hypergeometric Distribution.
I love the Hypergeometric Distribution. First, it’s got a really cool name. It literally sounds like a super villain’s doomsday weapon. Second, it gives you a formula that’s a lot easier to calculate than treeing out the possible draws. And most importantly, it’s built into Excel. It is super easy to use Excel to determine these probabilities.
The Hypergeometric Distribution can really shortcut the game design process. Let’s say your players are drawing five cards out of a deck of 60 cards, and there are 10 wildcards in the deck. How often will they draw 2 wildcards? Or 4? Or all 5? Or no more than 3? All of these types of questions can be answered with a quick and easy trip to your local spreadsheet.
The Hypergeometric Distribution covers situations where the thing you draw doesn’t get replaced in between pulls. This is almost always the case if you are drawing from a deck of cards, or pulling cubes out of a bag. Each draw changes the probabilities of the next draw – which is one of the reason that calculating the probability step-by-step is such a chore. If I draw an Ace from a deck of cards my chances of drawing another Ace have gone down, and go down even more if I immediately draw a second Ace. This is called a Dependent Probability – the chances of something happening depend on what happened before.
So let’s see how we would solve our wildcard problem above in Excel. Let’s say our deck has 60 cards, and 10 are wildcards. In a starting hand of five cards, what are the chances of drawing exactly two wildcards? Here’s the Hypergeometric formula in Excel:
HYPGEOM.DIST(sample_s, number_sample, population_s, number_pop, cumulative)
Sample_s: The number of successes in the sample – in this case that’s 2. We need to draw two wildcards to be successful.
Number_sample: The size of the sample. 5 in this case – the number of cards you are drawing.
Population_s: The number of successes in the population. – in this case that’s 10. There are ten wildcards in the deck, and drawing one is considered a success.
Number_pop: The population size. For this example, this is 60 – the total number of cards in the deck.
Cumulative: This is a true/false value. If it’s FALSE, this function calculates the probability that EXACTLY that number of successes happens (two, in this example). If it’s TRUE, then it is the probability of the number of successes or FEWER happening (so 0, 1, or 2 successes in this case).
So our function would be:
HYPGEOM.DIST(2, 5, 10, 60, false)
And Excel tells you the answer is 16%. So one out of six times you will draw two wildcards in your hand of five cards.
If we change the Cumulative parameter to true:
HYPGEOM.DIST(2, 5, 10, 60, true)
the answer is now 97% - so it is very likely you will have 2 or fewer wildcards at the start.
You can see where that’s coming from by looking at the chance of having zero wildcards:
HYPGEOM.DIST(0, 5, 10, 60, false) = 39%
And exactly one wildcard:
HYPGEOM.DIST(1, 5, 10, 60, false) = 42%
If you add those to our 16% chance of getting exactly two, you’ll get the 97% value.
If you need to know a number of successes or HIGHER, you can just calculate the “under” percentage and subtract it from 100%. So to determine the chances of getting two or more successes, just calculate the chance of getting 0 or 1 and subtract that from 100%:
HYPGEOM.DIST(1, 5, 10, 60, true) = 81%
So 2 or more successes = 19%.
Remember that the Hypergeometric distribution is based on DEPENDENT probabilities – things that are not replaced. When you draw a card, there are fewer of that card in the deck.
But what if your design involves independent probability? If the events that came before do not impact what happens in the future? This could be card draws where the cards get shuffled back in, or rolling dice. For example, let’s say I’m rolling ten dice, and want to know the chances of getting exactly 3 fives or sixes. Or rolling the same die 10 times. Either way, the rolls are Independent. Unlike drawing an Ace from a deck of cards, if I roll a 5 it doesn’t make it more or less likely I’ll roll a 5 on my next toss.
If you want to calculate that, you need a close relative of the Hypergeometric Distribution called the Binomial Distribution. The Binomial Distribution lets you answer similar questions as the Hypergeometric Distribution with that important distinction – the Binomial Distribution is for Independent events, and Hypergeometric for Dependent events.
Here’s the Excel formula:
BINOM.DIST(number_s, trials, probability_s, cumulative)
This looks really similar to the Hypergeometric formula, just a bit simpler. Let’s look at our above example: Rolling 10 6-sided dice, and we want to know the chances of exactly 3 fives or sixes.
Number_s: The number of successes in trials – We are looking for 3 successes
Trials: The number of independent trials – 10 in this example
Probability_s: The probability of success on each trial – So 1/3 in our case (0.33)
Cumulative: Same as the Hypergeometric. If we want exactly three successes, this should be FALSE. If we want three or fewer, it should be true.
So our formula is:
BINOM.DIST(3, 10, 0.33, FALSE) = 26%
If you’re doing game design – or just want to become a more analytical player – you should really learn how to use the binomial and hypergeometric distributions. They can be a huge timesaver. Play testing will get you where you want to go eventually. But introducing a little math can get you to where you want to go a lot faster.
Note: Google Sheets has the HYPGEOM.DIST and BINOM.DIST functions but does not have the Cumulative parameter. Their formula always gives the probability of getting exactly that number of successes.