itertools
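This module provides many functions for working with sequence-like (iterable) objects. Iteration has never been easier.
Its functions return iterators, which are lazy: each value is generated only when it is requested, so the whole sequence never has to be held in memory at once. This is more memory efficient than building a full list.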
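Here is a minimal sketch of that laziness. It assumes a hypothetical generator `numbers()` that logs every value it creates. It uses only single-argument `print()` calls and names that exist in both Python 2.7 and Python 3, so it should run on either. `islice()` pulls just the three values it needs from an infinite generator:

```python
import itertools

produced = []  # records every value the generator actually creates

def numbers():
    """Hypothetical generator: yields 0, 1, 2, ... forever, logging each value."""
    n = 0
    while True:
        produced.append(n)
        yield n
        n += 1

gen = numbers()  # creating the generator produces nothing yet
first_three = list(itertools.islice(gen, 3))

print(first_three)  # [0, 1, 2]
print(produced)     # [0, 1, 2]: no value beyond the third was ever generated
```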
import itertools as itls
Combine: chain() and izip()
chain() takes n iterable objects and chains them into one iterator: it yields every element of the first, then every element of the second, and so on. Here is an example.
for i in itls.chain([1,2,3],['a','b','c']):
    print i,
1 2 3 a b c
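When the iterables to join are themselves stored in a list, `chain.from_iterable()` (available since Python 2.6) does the same job without unpacking the outer list into separate arguments. A short sketch with made-up data:

```python
import itertools

nested = [[1, 2, 3], ['a', 'b'], []]

# chain(*nested) needs every sub-list up front as a separate argument;
# chain.from_iterable(nested) takes the outer iterable itself and
# pulls each inner iterable lazily, one at a time.
flat_args = list(itertools.chain(*nested))
flat_lazy = list(itertools.chain.from_iterable(nested))

print(flat_args)  # [1, 2, 3, 'a', 'b']
print(flat_lazy)  # [1, 2, 3, 'a', 'b']
```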
izip() takes n iterable objects and combines their elements into tuples. It works just like zip(), but returns an iterator instead of a list.
for i, j in itls.izip([1,2,3],['a','b','c']):
    print i, j
1 a
2 b
3 c
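Like `zip()`, `izip()` stops as soon as the shortest input runs out. The sketch below contrasts that with `izip_longest()`, which pads the missing values instead. The import fallback is an assumption to cover Python 3, where the function is named `zip_longest()` and the built-in `zip()` is already lazy:

```python
try:
    from itertools import izip_longest as zip_longest  # Python 2.6+
except ImportError:
    from itertools import zip_longest  # Python 3 name

nums = [1, 2, 3]
letters = ['a', 'b']

# zip() (like izip()) stops when the shorter input runs out
shortest = list(zip(nums, letters))
# zip_longest() keeps going and fills the gaps with fillvalue
padded = list(zip_longest(nums, letters, fillvalue='-'))

print(shortest)  # [(1, 'a'), (2, 'b')]
print(padded)    # [(1, 'a'), (2, 'b'), (3, '-')]
```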
Slice: islice()
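islice() takes an iterator and returns a slice of it, just like the slice operation on a list. It accepts start, stop and step parameters; if only one number is given after the iterable, it is the stop value. Unlike list slicing, negative values are not supported.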
print "Stop at 5:"
for i in itls.islice(itls.count(),5):
    print i,
Stop at 5:
0 1 2 3 4
print "Start at 5, Stop at 10:"
for i in itls.islice(itls.count(),5,10):
    print i,
Start at 5, Stop at 10:
5 6 7 8 9
print "By tens to 100:"
for i in itls.islice(itls.count(),0,100,10):
    print i,
By tens to 100:
0 10 20 30 40 50 60 70 80 90
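Because `islice()` works on any iterator, calling it repeatedly on the same iterator is a common way to split a stream into fixed-size chunks. `chunked()` below is a hypothetical helper written for this sketch, not part of itertools:

```python
import itertools

def chunked(iterable, size):
    """Hypothetical helper: yield successive lists of at most `size` items."""
    it = iter(iterable)  # one shared iterator, so each islice() resumes where the last stopped
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:  # the source is exhausted
            return
        yield chunk

print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]

# Works on an infinite iterator too, as long as the caller bounds it
print(list(itertools.islice(chunked(itertools.count(), 2), 3)))  # [[0, 1], [2, 3], [4, 5]]
```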
Duplicate: tee()
Just like tee in Unix, tee() takes an iterator and returns n independent iterators over the same values (n defaults to 2).
r = itls.islice(itls.count(),4)
i1, i2, i3 = itls.tee(r,3) # i1, i2 and i3 each yield every value of r
for i, j, k in itls.izip(i1,i2,i3):
    print i, j, k
0 0 0
1 1 1
2 2 2
3 3 3
Be careful when using the original iterator after calling tee(). Check this example to see why.
r = itls.islice(itls.count(),4)
i1, i2 = itls.tee(r)
for i in r:
    print 'r:', i
    if i > 0:
        break
for i in i1:
    print 'i1:', i
for i in i2:
    print 'i2:', i
r: 0
r: 1
i1: 2
i1: 3
i2: 2
i2: 3
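Iterating over r directly consumed 0 and 1, so those values never appear in i1 or i2. Once tee() has split an iterator, use only the new iterators and leave the original alone.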
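The `pairwise()` recipe from the itertools documentation follows that rule: the input is handed straight to `tee()`, and only the two copies are used afterwards. A sketch of it (Python 3.10 later added `itertools.pairwise()` itself):

```python
import itertools

def pairwise(iterable):
    """Yield overlapping pairs: s0, s1, s2, ... -> (s0, s1), (s1, s2), ..."""
    a, b = itertools.tee(iterable)  # from here on, only a and b are used
    next(b, None)                   # advance the second copy by one element
    return zip(a, b)                # a list in Python 2, an iterator in Python 3

print(list(pairwise([1, 4, 9, 16])))  # [(1, 4), (4, 9), (9, 16)]
```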
Map
imap() applies a function to each element of an iterator, just like the built-in map(), but returns an iterator instead of a list. Let's multiply each element of xrange(5) by 2.
print "Doubles:"
for i in itls.imap(lambda x: 2*x, xrange(5)):
    print i,
Doubles:
0 2 4 6 8
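imap() can also take more than one iterator; the function is then called with one element from each, and mapping stops when the shortest iterator is exhausted.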
print "Multiples:"
for i in itls.imap(lambda x,y:(x, y, x*y), xrange(5),xrange(5,10)):
    print '%d * %d = %d' % i
Multiples:
0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36
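One detail worth knowing: `imap()` stops at the shortest input, whereas Python 2's built-in `map()` pads shorter inputs with `None`. Python 3 removed `imap()` because its built-in `map()` became lazy and also stops at the shortest input. The import shim below is an assumption for running the sketch on either version:

```python
try:
    from itertools import imap  # Python 2: the lazy counterpart of map()
except ImportError:
    imap = map                  # Python 3: itertools.imap was removed; map() is already lazy

# Stops after three products because the second input has only three items
products = list(imap(lambda x, y: x * y, [0, 1, 2, 3, 4], [5, 6, 7]))
print(products)  # [0, 6, 14]
```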
starmap() is similar to imap(), with one difference: imap() takes each argument from a separate iterator, while starmap() takes a single iterator of tuples and unpacks each tuple into the function's arguments.
values = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
for i in itls.starmap(lambda x,y:(x,y,x*y), values):
    print '%d * %d = %d' % i
0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36
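In other words, `starmap(f, pairs)` calls `f(*pair)` for each tuple. A quick sketch using the built-in `pow()` with made-up pairs checks this against the equivalent list comprehension:

```python
import itertools

pairs = [(2, 3), (3, 2), (10, 0)]

# starmap() unpacks each tuple into pow(base, exponent) ...
powers = list(itertools.starmap(pow, pairs))
# ... which is the same as calling pow(*p) yourself
same = [pow(*p) for p in pairs]

print(powers)  # [8, 9, 1]
```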
Create new iterators
count(), cycle() and repeat() generate new iterators.
count()
Generates consecutive integers, starting at 0 by default (or at the given start value) with no upper bound. If you need an upper bound, use xrange() instead.
for i in itls.izip(itls.count(1),['a','b','c']):
    print i
(1, 'a')
(2, 'b')
(3, 'c')
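`count()` also accepts a step (added in Python 2.7). Since it never ends, it is usually bounded by `islice()` or by zipping it with a finite sequence, as in the example above. A small sketch:

```python
import itertools

# count(start, step): 10, 12, 14, ... bounded to four values by islice()
evens = list(itertools.islice(itertools.count(10, 2), 4))
print(evens)  # [10, 12, 14, 16]

# Numbering items from 1; zip() stops when the finite list runs out
items = ['a', 'b', 'c']
numbered = list(zip(itertools.count(1), items))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```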
cycle()
Cycles through an iterable indefinitely, starting over each time it reaches the end.
i = 0
for item in itls.cycle(['a','b','c']):
    i += 1
    if i == 7:
        break
    print (i, item)
(1, 'a')
(2, 'b')
(3, 'c')
(4, 'a')
(5, 'b')
(6, 'c')
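The manual counter in the loop above can be replaced by `islice()`, and pairing `cycle()` with a finite sequence gives a simple round-robin assignment. The worker and task names in this sketch are hypothetical:

```python
import itertools

# The same six values as the counter loop, taken with islice() instead
first_six = list(itertools.islice(itertools.cycle(['a', 'b', 'c']), 6))
print(first_six)  # ['a', 'b', 'c', 'a', 'b', 'c']

# Round-robin: zip() stops when the finite task list is exhausted
workers = ['w1', 'w2']
tasks = ['t1', 't2', 't3', 't4', 't5']
assignment = list(zip(tasks, itertools.cycle(workers)))
print(assignment)
# [('t1', 'w1'), ('t2', 'w2'), ('t3', 'w1'), ('t4', 'w2'), ('t5', 'w1')]
```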
repeat()
Repeats an object n times, or indefinitely if n is omitted.
for i in itls.repeat('over-and-over',3):
    print i
over-and-over
over-and-over
over-and-over
repeat() is handy when a constant must be combined with every element of a sequence, for example together with izip() or imap().
for i,s in itls.izip(itls.count(), itls.repeat('over-and-over',3)):
    print i, s
0 over-and-over
1 over-and-over
2 over-and-over
for i in itls.imap(lambda x,y:(x,y,x*y),itls.repeat(2),xrange(5)):
    print '%d * %d = %d' % i
2 * 0 = 0
2 * 1 = 2
2 * 2 = 4
2 * 3 = 6
2 * 4 = 8
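`repeat()` also pairs well with `starmap()`: zipping a finite sequence with an endless `repeat()` supplies the same extra argument to every call, and because `zip()` stops at the shortest input the result stays finite. A sketch using `pow()` to square each number:

```python
import itertools

# (0, 2), (1, 2), ... (4, 2): zip() stops when range(5) is exhausted,
# so the endless repeat(2) is safe here
pairs = zip(range(5), itertools.repeat(2))
squares = list(itertools.starmap(pow, pairs))
print(squares)  # [0, 1, 4, 9, 16]

# With a count, repeat() is finite on its own
triple = list(itertools.repeat('x', 3))
print(triple)  # ['x', 'x', 'x']
```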
Filter
These functions select elements from an iterator, much like the built-in filter() function.
dropwhile()
Tests each item: while the test returns True, the item is dropped; at the first False, dropping stops, and that item and all the remaining items are yielded without further testing.
def should_drop(x):
    print 'Testing:', x
    return x < 1
for i in itls.dropwhile(should_drop,[ -1, 0, 1, 2, 3, 1, -2 ]):
    print 'Yielding:', i
Testing: -1
Testing: 0
Testing: 1
Yielding: 1
Yielding: 2
Yielding: 3
Yielding: 1
Yielding: -2
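A typical use is skipping a leading block of lines. In this sketch the lines are made-up file contents; note that a comment appearing after the first non-comment line is kept, because `dropwhile()` stops testing at the first False:

```python
import itertools

lines = [
    '# generated file',
    '# do not edit',
    'alpha = 1',
    '# a later comment is kept',
    'beta = 2',
]

# Drop only the leading comment block
body = list(itertools.dropwhile(lambda s: s.startswith('#'), lines))
print(body)  # ['alpha = 1', '# a later comment is kept', 'beta = 2']
```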
takewhile()
takewhile() is the opposite of dropwhile(): while the test returns True, the item is yielded; at the first False, it stops, and neither that item nor any later one is yielded.
def should_take(x):
    print 'Testing:', x
    return x < 2
for i in itls.takewhile(should_take,[ -1, 0, 1, 2, 3, 4, 1, -2 ]):
    print 'Yielding:', i
Testing: -1
Yielding: -1
Testing: 0
Yielding: 0
Testing: 1
Yielding: 1
Testing: 2
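Note that `takewhile()` has to read the first failing element (2 above) to know when to stop, and that element is consumed. The sketch below uses made-up email-style lines to show the consequence when the rest of the same iterator is used afterwards:

```python
import itertools

lines = ['Subject: hi', 'From: bob', '', 'body text', 'more body']
it = iter(lines)

# Header lines run up to, but not including, the first blank line
headers = list(itertools.takewhile(lambda s: s != '', it))
print(headers)  # ['Subject: hi', 'From: bob']

# The blank line was read and rejected, so it is gone;
# the shared iterator resumes directly at the body
remaining = list(it)
print(remaining)  # ['body text', 'more body']
```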
ifilter()
dropwhile() and takewhile() only test items until the result first changes, but ifilter() tests every element and yields those for which the test returns True. ifilterfalse() is the same, but yields only the items for which the test returns False.
def check_item(x):
    print 'Testing:', x
    return x < 1
for i in itls.ifilter(check_item, [ -1, 0, 1, 2, 3, -2 ]):
    print 'Yielding:', i
Testing: -1
Yielding: -1
Testing: 0
Yielding: 0
Testing: 1
Testing: 2
Testing: 3
Testing: -2
Yielding: -2
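`ifilter()` and `ifilterfalse()` can be combined with `tee()` to split one iterator into matching and non-matching parts. This is a sketch of the `partition()` recipe from the itertools documentation; the import fallback assumes Python 3, where the functions are the built-in `filter()` and `itertools.filterfalse()`:

```python
import itertools

try:
    from itertools import ifilter, ifilterfalse  # Python 2
except ImportError:
    ifilter = filter                                   # Python 3: filter() is lazy
    from itertools import filterfalse as ifilterfalse  # Python 3 name

def partition(pred, iterable):
    """Return (items where pred is False, items where pred is True)."""
    t1, t2 = itertools.tee(iterable)  # two independent passes over the data
    return ifilterfalse(pred, t1), ifilter(pred, t2)

rejected_it, accepted_it = partition(lambda x: x < 1, [-1, 0, 1, 2, 3, -2])
rejected = list(rejected_it)
accepted = list(accepted_it)
print(rejected)  # [1, 2, 3]
print(accepted)  # [-1, 0, -2]
```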
Group iterators
groupby(iterable[, keyfunc]) creates an iterator that returns (key, sub-iterator) pairs, where each sub-iterator yields a run of consecutive elements that share the same value of keyfunc(element).
things = [("animal", "bear"), ("animal", "duck"),
          ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
groupby() takes two parameters: the data to group and the key function used to group it. If the key function is omitted, elements are grouped by their own value.
for key, group in itls.groupby(things, lambda x: x[0]):
    print key, group
animal <itertools._grouper object at 0x10bce2150>
plant <itertools._grouper object at 0x10bce2190>
vehicle <itertools._grouper object at 0x10bce2150>
As the output shows, three sub-iterators are returned, and we can loop over each one with a nested loop.
for key, group in itls.groupby(things, lambda x:x[0]):
    for thing in group:
        print "A %s is a %s." % (thing[1], key)
    print ""
A bear is a animal.
A duck is a animal.
A cactus is a plant.
A speed boat is a vehicle.
A school bus is a vehicle.
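Without a key function, `groupby()` groups runs of equal elements, which makes run-length encoding a short exercise. `run_length_encode()` is a hypothetical helper for this sketch. Each group is consumed immediately because, once `groupby()` advances, the previous group is no longer available:

```python
import itertools

def run_length_encode(text):
    """Hypothetical helper: 'aaabcc' -> [('a', 3), ('b', 1), ('c', 2)]."""
    # sum(1 for _ in group) counts the run before groupby() moves on
    return [(key, sum(1 for _ in group)) for key, group in itertools.groupby(text)]

print(run_length_encode('aaabccaa'))
# [('a', 3), ('b', 1), ('c', 2), ('a', 2)]
```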
Note that the iterable should be sorted by the same key before grouping, because groupby() starts a new group every time the key changes.
things = [("animal", "bear"), ("plant", "cactus"), ("animal", "duck")]
for key, group in itls.groupby(things, lambda x: x[0]):
    print key, group
animal <itertools._grouper object at 0x10bce2410>
plant <itertools._grouper object at 0x10bce2490>
animal <itertools._grouper object at 0x10bce2410>
We expected two groups but got three, because the data is not sorted by key.
new_things = sorted(things,key=lambda x: x[0])
for key, group in itls.groupby(new_things, lambda x:x[0]):
    print key, group
animal <itertools._grouper object at 0x10bce2350>
plant <itertools._grouper object at 0x10bce2410>
Now the result is correct: one group per key.
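A common pattern is to sort and group in one pass and collect each group into a dictionary. This sketch reuses the three-item `things` list from above and uses `operator.itemgetter(0)` as the key, which is equivalent to `lambda x: x[0]`:

```python
import itertools
from operator import itemgetter

things = [("animal", "bear"), ("plant", "cactus"), ("animal", "duck")]
get_kind = itemgetter(0)  # same key as lambda x: x[0]

by_kind = {}
for kind, group in itertools.groupby(sorted(things, key=get_kind), get_kind):
    by_kind[kind] = [name for _, name in group]  # materialize each group right away

# sorted() gives a stable order to print on both Python 2 and 3
print(sorted(by_kind.items()))  # [('animal', ['bear', 'duck']), ('plant', ['cactus'])]
```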
