Home » Python » A weighted version of random.choice

A weighted version of random.choice

Questions:

I needed to write a weighted version of random.choice (each element in the list has a different probability for being selected). This is what I came up with:

def weightedChoice(choices):
"""Like random.choice, but each element can have a different chance of
being selected.

choices can be any iterable containing iterables with two items each.
Technically, they can have more than two items, the rest will just be
ignored.  The first item is the thing being chosen, the second item is
its weight.  The weights can be any numeric values, what matters is the
relative differences between them.
"""
space = {}
current = 0
for choice, weight in choices:
if weight > 0:
space[current] = choice
current += weight
rand = random.uniform(0, current)
for key in sorted(space.keys() + [current]):
if rand < key:
return choice
choice = space[key]
return None

This function seems overly complex to me, and ugly. I’m hoping everyone here can offer some suggestions on improving it or alternate ways of doing this. Efficiency isn’t as important to me as code cleanliness and readability.

def weighted_choice(choices):
total = sum(w for c, w in choices)
r = random.uniform(0, total)
upto = 0
for c, w in choices:
if upto + w >= r:
return c
upto += w
assert False, "Shouldn't get here"

Questions:

Since version 1.7.0, NumPy has a choice function that supports probability distributions.

from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick, p=probability_distribution)

Note that probability_distribution is a sequence in the same order of list_of_candidates. You can also use the keyword replace=False to change the behavior so that drawn items are not replaced.

Questions:
1. Arrange the weights into a
cumulative distribution.
2. Use random.random() to pick a random
float 0.0 <= x < total.
3. Search the
distribution using bisect.bisect as
shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples.
from random import random
from bisect import bisect

def weighted_choice(choices):
values, weights = zip(*choices)
total = 0
cum_weights = []
for w in weights:
total += w
cum_weights.append(total)
x = random() * total
i = bisect(cum_weights, x)
return values[i]

>>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
'WHITE'

If you need to make more than one choice, split this into two functions, one to build the cumulative weights and another to bisect to a random point.

Questions:

Since Python3.6 there is a method choices from random module.

Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.

In : import random

In : population = [['a','b'], ['b','a'], ['c','b']]

In : list_of_prob = [0.2, 0.2, 0.6]

In : population = random.choices(population, weights=list_of_prob, k=10)

In : population
Out:
[['c', 'b'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['c', 'b']]

And people also mentioned that there is numpy.random.choice which support weights, BUT it don’t support 2d arrays, and so on.

So, basically you can get whatever you like with builtin random.choices if you have 3.6.x Python.

Questions:

If you don’t mind using numpy, you can use numpy.random.choice.

For example:

import numpy

items  = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]
elems = [i for i in items]
probs = [i for i in items]

trials = 1000
results =  * len(items)
for i in range(trials):
res = numpy.random.choice(items, p=probs)  #This is where the item is selected!
results[items.index(res)] += 1
results = [r / float(trials) for r in results]
print "item\texpected\tactual"
for i in range(len(probs)):
print "%s\t%0.4f\t%0.4f" % (items[i], probs[i], results[i])

If you know how many selections you need to make in advance, you can do it without a loop like this:

numpy.random.choice(items, trials, p=probs)

Questions:

Crude, but may be sufficient:

import random
weighted_choice = lambda s : random.choice(sum(([v]*wt for v,wt in s),[]))

Does it work?

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

# initialize tally dict
tally = dict.fromkeys(choices, 0)

# tally up 1000 weighted choices
for i in xrange(1000):
tally[weighted_choice(choices)] += 1

print tally.items()

Prints:

[('WHITE', 904), ('GREEN', 22), ('RED', 74)]

Assumes that all weights are integers. They don’t have to add up to 100, I just did that to make the test results easier to interpret. (If weights are floating point numbers, multiply them all by 10 repeatedly until all weights >= 1.)

weights = [.6, .2, .001, .199]
while any(w < 1.0 for w in weights):
weights = [w*10 for w in weights]
weights = map(int, weights)

Questions:

If you have a weighted dictionary instead of a list you can write this

items = { "a": 10, "b": 5, "c": 1 }
random.choice([k for k in items for dummy in range(items[k])])

Note that [k for k in items for dummy in range(items[k])] produces this list ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'b', 'b', 'b', 'b', 'b']

Questions:

Here’s is the version that is being included in the standard library for Python 3.6:

import itertools as _itertools
import bisect as _bisect

class Random36(random.Random):
"Show the code included in the Python 3.6 version of the Random class"

def choices(self, population, weights=None, *, cum_weights=None, k=1):
"""Return a k sized list of population elements chosen with replacement.

If the relative weights or cumulative weights are not specified,
the selections are made with equal probability.

"""
random = self.random
if cum_weights is None:
if weights is None:
_int = int
total = len(population)
return [population[_int(random() * total)] for i in range(k)]
cum_weights = list(_itertools.accumulate(weights))
elif weights is not None:
raise TypeError('Cannot specify both weights and cumulative weights')
if len(cum_weights) != len(population):
raise ValueError('The number of weights does not match the population')
bisect = _bisect.bisect
total = cum_weights[-1]
return [population[bisect(cum_weights, random() * total)] for i in range(k)]

Questions:

As of Python v3.6, random.choices could be used to return a list of elements of specified size from the given population with optional weights.

random.choices(population, weights=None, *, cum_weights=None, k=1)

• population : list containing unique observations. (If empty, raises IndexError)

• weights : More precisely relative weights required to make selections.

• cum_weights : cumulative weights required to make selections.

• k : size(len) of the list to be outputted. (Default len()=1)

Few Caveats:

1) It makes use of weighted sampling with replacement so the drawn items would be later replaced. The values in the weights sequence in itself do not matter, but their relative ratio does.

Unlike np.random.choice which can only take on probabilities as weights and also which must ensure summation of individual probabilities upto 1 criteria, there are no such regulations here. As long as they belong to numeric types (int/float/fraction except Decimal type) , these would still perform.

>>> import random
# weights being integers
>>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
# weights being floats
>>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
# weights being fractions
>>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']

2) If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.

>>> random.choices(["white", "green", "red"], k=10)
['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']

3) cum_weights are typically a result of itertools.accumulate function which are really handy in such situations.

Internally, the relative weights are converted to cumulative weights
before making selections, so supplying the cumulative weights saves
work.

So, either supplying weights=[12, 12, 4] or cum_weights=[12, 24, 28] for our contrived case produces the same outcome and the latter seems to be more faster / efficient.

Questions:

I’d require the sum of choices is 1, but this works anyway

def weightedChoice(choices):
# Safety check, you can remove it
for c,w in choices:
assert w >= 0

tmp = random.uniform(0, sum(c for c,w in choices))
for choice,weight in choices:
if tmp < weight:
return choice
else:
tmp -= weight
raise ValueError('Negative values in input')

Questions:

A general solution:

import random
def weighted_choice(choices, weights):
total = sum(weights)
treshold = random.uniform(0, total)
for k, weight in enumerate(weights):
total -= weight
if total < treshold:
return choices[k]

Questions:
import numpy as np
w=np.array([ 0.4,  0.8,  1.6,  0.8,  0.4])
np.random.choice(w, p=w/sum(w))

Questions:

I’m probably too late to contribute anything useful, but here’s a simple, short, and very efficient snippet:

def choose_index(probabilies):
cmf = probabilies
choice = random.random()
for k in xrange(len(probabilies)):
if choice <= cmf:
return k
else:
cmf += probabilies[k+1]

No need to sort your probabilities or create a vector with your cmf, and it terminates once it finds its choice. Memory: O(1), time: O(N), with average running time ~ N/2.

If you have weights, simply add one line:

def choose_index(weights):
probabilities = weights / sum(weights)
cmf = probabilies
choice = random.random()
for k in xrange(len(probabilies)):
if choice <= cmf:
return k
else:
cmf += probabilies[k+1]

Questions:

If your list of weighted choices is relatively static, and you want frequent sampling, you can do one O(N) preprocessing step, and then do the selection in O(1), using the functions in this related answer.

# run only when `choices` changes.
preprocessed_data = prep(weight for _,weight in choices)

# O(1) selection
value = choices[sample(preprocessed_data)]

Questions:

I looked the pointed other thread and came up with this variation in my coding style, this returns the index of choice for purpose of tallying, but it is simple to return the string ( commented return alternative):

import random
import bisect

try:
range = xrange
except:
pass

def weighted_choice(choices):
total, cumulative = 0, []
for c,w in choices:
total += w
cumulative.append((total, c))
r = random.uniform(0, total)
# return index
return bisect.bisect(cumulative, (r,))
# return item string
#return choices[bisect.bisect(cumulative, (r,))]

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

tally = [0 for item in choices]

n = 100000
# tally up n weighted choices
for i in range(n):
tally[weighted_choice(choices)] += 1

print([t/sum(tally)*100 for t in tally])

Questions:

Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it will return an array of 0’s containing a 1 indicating which bin was chosen. The code defaults to just making a single draw but you can pass in the number of draws to be made and the counts per bin drawn will be returned.

If the weights vector does not sum to 1, it will be normalized so that it does.

import numpy as np

def weighted_choice(weights, n=1):
if np.sum(weights)!=1:
weights = weights/np.sum(weights)

draws = np.random.random_sample(size=n)

weights = np.cumsum(weights)
weights = np.insert(weights,0,0.0)

counts = np.histogram(draws, bins=weights)
return(counts)