Home » Python » Filtering a list based on a list of booleans

Filtering a list based on a list of booleans

Posted by: admin November 29, 2017 Leave a comment

Questions:

I have a list of values which I need to filter given the values in a list of booleans:

list_a = [1, 2, 4, 6]
filter = [True, False, True, False]

I generate a new filtered list with the following line:

filtered_list = [i for indx,i in enumerate(list_a) if filter[indx] == True]

which results in:

print filtered_list
[1,4]

The line works but looks (to me) a bit overkill and I was wondering if there was a simpler way to achieve the same.


Advices

Summary of two good advices given in the answers below:

1- Don’t name a list filter like I did because it is a built-in function.

2- Don’t compare things to True like I did with if filter[idx]==True.. since it’s unnecessary. Just using if filter[idx] is enough.

Answers:

You’re looking for itertools.compress:

>>> from itertools import compress
>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> list(compress(list_a, fil))
[1, 4]

Timing comparisons(py3.x):

>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> %timeit list(compress(list_a, fil))
100000 loops, best of 3: 2.58 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]  #winner
100000 loops, best of 3: 1.98 us per loop

>>> list_a = [1, 2, 4, 6]*100
>>> fil = [True, False, True, False]*100
>>> %timeit list(compress(list_a, fil))              #winner
10000 loops, best of 3: 24.3 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]
10000 loops, best of 3: 82 us per loop

>>> list_a = [1, 2, 4, 6]*10000
>>> fil = [True, False, True, False]*10000
>>> %timeit list(compress(list_a, fil))              #winner
1000 loops, best of 3: 1.66 ms per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v] 
100 loops, best of 3: 7.65 ms per loop

Don’t use filter as a variable name, it is a built-in function.

Questions:
Answers:

With numpy:

In [128]: list_a = np.array([1, 2, 4, 6])
In [129]: filter = np.array([True, False, True, False])
In [130]: list_a[filter]

Out[130]: array([1, 4])

or see Alex Szatmary’s answer if list_a can be a numpy array but not filter

Numpy usually gives you a big speed boost as well

In [133]: list_a = [1, 2, 4, 6]*10000
In [134]: fil = [True, False, True, False]*10000
In [135]: list_a_np = np.array(list_a)
In [136]: fil_np = np.array(fil)

In [139]: %timeit list(itertools.compress(list_a, fil))
1000 loops, best of 3: 625 us per loop

In [140]: %timeit list_a_np[fil_np]
10000 loops, best of 3: 173 us per loop

Questions:
Answers:

Like so:

filtered_list = [i for (i, v) in zip(list_a, filter) if v]

Using zip is the ‘pythonic’ way to iterate over multiple sequences in parallel, without needing any indexing. Using itertools for such a simple case is a bit overkill …

One thing you do in your example you should really stop doing is comparing things to True, this is usually not necessary. Instead of if filter[idx]==True: ..., you can simply write if filter[idx]: ....

Questions:
Answers:

To do this using numpy, ie, if you have an array, a, instead of list_a:

a = np.array([1, 2, 4, 6])
my_filter = np.array([True, False, True, False], dtype=bool)
a[my_filter]
> array([1, 4])