2016-08-20

## itertools

This module provides many functions for working with sequence-like (iterable) objects, and it makes iteration much easier to express.

Iterators are lazy: an item is not generated until it is requested. This makes them more memory efficient than building a whole list up front.

    import itertools as itls
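To make the laziness visible, here is a minimal sketch. It is written for Python 3 (an assumption; the rest of this post uses Python 2), and the `numbers()` generator and the `produced` log are hypothetical names made up for the illustration.

```python
import itertools

produced = []  # records which values have actually been generated


def numbers():
    """Hypothetical endless generator that logs each value it produces."""
    n = 0
    while True:
        produced.append(n)
        yield n
        n += 1


# Building the pipeline generates nothing yet.
lazy = itertools.islice(numbers(), 3)
before = list(produced)

# Values are generated only when the consumer asks for them.
first_three = list(lazy)

print(before)       # []
print(first_three)  # [0, 1, 2]
print(produced)     # [0, 1, 2]
```

Only the three requested values were ever generated, even though `numbers()` itself never ends.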


### Combine: chain() and izip()

chain() takes any number of iterables and yields their items one after another, as if they were one sequence. Here is an example.

    for i in itls.chain([1,2,3],['a','b','c']):
        print i,

    1 2 3 a b c


izip() takes any number of iterables and combines their items into tuples. It works just like zip(), but returns an iterator instead of a list.

    for i, j in itls.izip([1,2,3],['a','b','c']):
        print i, j

    1 a
    2 b
    3 c
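For readers on Python 3, the same two operations might look like the sketch below. It assumes Python 3, where `izip()` no longer exists because the built-in `zip()` is already lazy. `zip_longest()` is not covered in this post; it is included only to contrast with how `zip()` stops at the shortest input.

```python
import itertools

# chain() behaves the same way in Python 3.
chained = list(itertools.chain([1, 2, 3], ['a', 'b', 'c']))

# Python 3 has no izip(); the built-in zip() returns a lazy iterator.
paired = list(zip([1, 2, 3], ['a', 'b', 'c']))

# zip() stops at the shortest input; zip_longest() pads with fillvalue instead.
padded = list(itertools.zip_longest([1, 2, 3], ['a'], fillvalue=None))

print(chained)  # [1, 2, 3, 'a', 'b', 'c']
print(paired)   # [(1, 'a'), (2, 'b'), (3, 'c')]
print(padded)   # [(1, 'a'), (2, None), (3, None)]
```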


### Slice: islice()

islice() takes an iterator and returns a slice of it, much like slicing a list. It accepts three parameters: start, stop and step.

    print "Stop at 5:"
    for i in itls.islice(itls.count(),5):
        print i,

    Stop at 5:
    0 1 2 3 4

    print "Start at 5, Stop at 10:"
    for i in itls.islice(itls.count(),5,10):
        print i,

    Start at 5, Stop at 10:
    5 6 7 8 9

    print "By tens to 100:"
    for i in itls.islice(itls.count(),0,100,10):
        print i,

    By tens to 100:
    0 10 20 30 40 50 60 70 80 90
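One detail worth seeing in code: islice() consumes the underlying iterator, so a second slice continues where the first one stopped. A small sketch, assuming Python 3; the `take()` helper is a hypothetical name, not part of itertools.

```python
import itertools


def take(n, iterable):
    """Hypothetical helper: return the first n items of iterable as a list."""
    return list(itertools.islice(iterable, n))


squares = (x * x for x in itertools.count())  # endless generator

first = take(5, squares)  # consumes 0..4 from the generator
after = take(3, squares)  # continues from where the previous call stopped

# start, stop and step mirror slice notation (negative values are rejected).
evens = list(itertools.islice(range(20), 0, 10, 2))

print(first)  # [0, 1, 4, 9, 16]
print(after)  # [25, 36, 49]
print(evens)  # [0, 2, 4, 6, 8]
```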


### Duplicate: tee()

Just like tee in Unix, tee() takes one iterator and returns n independent iterators that each yield the same items (n defaults to 2).

    r = itls.islice(itls.count(),4)
    i1, i2, i3 = itls.tee(r,3) # i1, i2 and i3 each behave like a copy of r

    for i, j, k in itls.izip(i1,i2,i3):
        print i, j, k

    0 0 0
    1 1 1
    2 2 2
    3 3 3


One thing to note: be careful about using the original iterator after calling tee(). This example shows why.

    r = itls.islice(itls.count(),4)
    i1, i2 = itls.tee(r)
    for i in r:
        print 'r:', i
        if i > 0:
            break
    for i in i1:
        print 'i1:', i

    for i in i2:
        print 'i2:', i

    r: 0
    r: 1
    i1: 2
    i1: 3
    i2: 2
    i2: 3


The original iterator r consumed 0 and 1 directly, so those values never reach i1 and i2.
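A safe pattern is to hand the original iterator to tee() and then use only the copies. The sketch below assumes Python 3 and defines its own `pairwise()` helper for illustration.

```python
import itertools


def pairwise(iterable):
    """Yield overlapping pairs (a, b), (b, c), ... in one pass over iterable."""
    first, second = itertools.tee(iterable)
    next(second, None)  # advance the second copy by one item
    # From here on only the two copies are used; `iterable` is left alone.
    return zip(first, second)


source = itertools.islice(itertools.count(), 4)
pairs = list(pairwise(source))

print(pairs)  # [(0, 1), (1, 2), (2, 3)]
```

Because `source` is never read directly after tee(), no item is lost between the two copies.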

### Map

imap() applies a function to every item of an iterator. It works just like the built-in map(), but returns an iterator instead of a list. Let's multiply each value in xrange(5) by 2.

    print "Doubles:"
    for i in itls.imap(lambda x: 2*x, xrange(5)):
        print i,

    Doubles:
    0 2 4 6 8


imap() can also take more than one iterable; the function then receives one argument from each of them.

    print "Multiples:"
    for i in itls.imap(lambda x,y:(x, y, x*y), xrange(5),xrange(5,10)):
        print '%d * %d = %d' % i

    Multiples:
    0 * 5 = 0
    1 * 6 = 6
    2 * 7 = 14
    3 * 8 = 24
    4 * 9 = 36


starmap() is similar to imap(), with one small difference: starmap() unpacks the function's arguments from each tuple of a single iterable, while imap() takes them from several parallel iterables.

    values = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
    for i in itls.starmap(lambda x,y:(x,y,x*y), values):
        print '%d * %d = %d' % i

    0 * 5 = 0
    1 * 6 = 6
    2 * 7 = 14
    3 * 8 = 24
    4 * 9 = 36
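The relationship between the two can be checked directly. This sketch assumes Python 3, where `imap()` is gone and the built-in `map()` is lazy; `operator.mul` stands in for the lambdas used above.

```python
import itertools
import operator

xs = range(5)
ys = range(5, 10)

# map() pulls one argument from each iterable...
from_map = list(map(operator.mul, xs, ys))

# ...while starmap() unpacks each tuple of a single iterable.
from_starmap = list(itertools.starmap(operator.mul, zip(xs, ys)))

print(from_map)      # [0, 6, 14, 24, 36]
print(from_starmap)  # [0, 6, 14, 24, 36]
```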


### Create new iterator

count(), cycle() and repeat() generate new iterators.

#### count()

Consecutive integers starting from a given value (0 by default), with no upper bound. Use xrange() when an upper bound is needed.

    for i in itls.izip(itls.count(1),['a','b','c']):
        print i

    (1, 'a')
    (2, 'b')
    (3, 'c')
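count() also accepts an optional step. A short sketch, assuming Python 3; the comparison with the built-in `enumerate()` is an addition for illustration and not something this post relies on.

```python
import itertools

# count(start, step): both arguments are optional and default to 0 and 1.
from_ten = list(itertools.islice(itertools.count(10), 3))
by_half = list(itertools.islice(itertools.count(0, 0.5), 4))

# Numbering items from 1; enumerate() is the built-in shortcut for this case.
numbered = list(zip(itertools.count(1), ['a', 'b', 'c']))
same = list(enumerate(['a', 'b', 'c'], start=1))

print(from_ten)  # [10, 11, 12]
print(by_half)   # [0, 0.5, 1.0, 1.5]
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```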


#### cycle()

Cycles through an iterable endlessly.

    i = 0
    for item in itls.cycle(['a','b','c']):
        i += 1
        if i == 7:
            break
        print (i, item)

    (1, 'a')
    (2, 'b')
    (3, 'c')
    (4, 'a')
    (5, 'b')
    (6, 'c')
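The manual counter and break can be replaced by islice(). This is one possible variant, assuming Python 3, and not the only way to bound a cycle.

```python
import itertools

# islice() caps the endless cycle at six items; enumerate() supplies the counter.
bounded = itertools.islice(itertools.cycle(['a', 'b', 'c']), 6)
result = list(enumerate(bounded, start=1))

print(result)
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'a'), (5, 'b'), (6, 'c')]
```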


#### repeat()

Repeats an object n times, or endlessly if n is omitted.

    for i in itls.repeat('over-and-over',3):
        print i

    over-and-over
    over-and-over
    over-and-over


When we want to pair a constant with every item of a sequence, combining repeat() with izip() or imap() is very useful.

    for i,s in itls.izip(itls.count(), itls.repeat('over-and-over',3)):
        print i, s

    0 over-and-over
    1 over-and-over
    2 over-and-over

    for i in itls.imap(lambda x,y:(x,y,x*y),itls.repeat(2),xrange(5)):
        print '%d * %d = %d' % i

    2 * 0 = 0
    2 * 1 = 2
    2 * 2 = 4
    2 * 3 = 6
    2 * 4 = 8
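The endless form of repeat() is safe in these combinations because the mapping stops as soon as the shortest input runs out. A minimal sketch, assuming Python 3 and using the built-in `pow()` as the mapped function:

```python
import itertools

# repeat(2) never ends, but map() stops when range(5) is exhausted.
squares = list(map(pow, range(5), itertools.repeat(2)))

# With a count, repeat() ends by itself.
three = list(itertools.repeat('over-and-over', 3))

print(squares)  # [0, 1, 4, 9, 16]
print(three)    # ['over-and-over', 'over-and-over', 'over-and-over']
```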


### Filter

These functions are similar in spirit to the built-in filter() function: each takes a test function and an iterable.

#### dropwhile()

Tests each item: while the test returns True, the item is dropped. At the first item for which it returns False, dropping stops, and that item and all the remaining ones are yielded without further testing.

    def should_drop(x):
        print 'Testing:', x
        return x < 1
    for i in itls.dropwhile(should_drop,[ -1, 0, 1, 2, 3, 1, -2 ]):
        print 'Yielding:', i

    Testing: -1
    Testing: 0
    Testing: 1
    Yielding: 1
    Yielding: 2
    Yielding: 3
    Yielding: 1
    Yielding: -2
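The behavior can be spelled out as a small generator. This is only a rough pure-Python equivalent written for illustration (assuming Python 3), not how the library implements it; `dropwhile_sketch` is a made-up name.

```python
import itertools


def dropwhile_sketch(predicate, iterable):
    """Rough pure-Python equivalent of itertools.dropwhile (illustrative only)."""
    iterator = iter(iterable)
    for item in iterator:
        if not predicate(item):
            yield item  # the first item that fails the test is kept
            break
    for item in iterator:  # everything after it passes through untested
        yield item


data = [-1, 0, 1, 2, 3, 1, -2]
ours = list(dropwhile_sketch(lambda x: x < 1, data))
theirs = list(itertools.dropwhile(lambda x: x < 1, data))

print(ours)    # [1, 2, 3, 1, -2]
print(theirs)  # [1, 2, 3, 1, -2]
```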


#### takewhile()

The opposite of dropwhile(). Tests each item: while the test returns True, the item is yielded. At the first item for which it returns False, iteration stops, and that item and everything after it are discarded.

    def should_take(x):
        print 'Testing:', x
        return x < 2
    for i in itls.takewhile(should_take,[ -1, 0, 1, 2, 3, 4, 1, -2 ]):
        print 'Yielding:', i

    Testing: -1
    Yielding: -1
    Testing: 0
    Yielding: 0
    Testing: 1
    Yielding: 1
    Testing: 2
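A typical use is splitting off a leading run of items. The sketch below assumes Python 3; the `lines` data and the `is_comment()` test are invented for the example.

```python
import itertools

lines = ['# header', '# more header', 'data 1', '# not a header', 'data 2']


def is_comment(line):
    """Hypothetical test: does the line start with a comment marker?"""
    return line.startswith('#')


# takewhile() keeps only the leading run and stops at the first failure.
header = list(itertools.takewhile(is_comment, lines))

# dropwhile() skips only the leading run; later matches pass through.
body = list(itertools.dropwhile(is_comment, lines))

print(header)  # ['# header', '# more header']
print(body)    # ['data 1', '# not a header', 'data 2']
```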


#### ifilter()

dropwhile() and takewhile() test only the leading part of the input, whereas ifilter() tests every item and yields those for which the function returns True. ifilterfalse() works the same way but yields only the items for which the function returns False.

    def check_item(x):
        print 'Testing:', x
        return x < 1
    for i in itls.ifilter(check_item, [ -1, 0, 1, 2, 3, -2 ]):
        print 'Yielding:', i

    Testing: -1
    Yielding: -1
    Testing: 0
    Yielding: 0
    Testing: 1
    Testing: 2
    Testing: 3
    Testing: -2
    Yielding: -2
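In Python 3 these two functions have different names: the built-in `filter()` is lazy and replaces `ifilter()`, and `ifilterfalse()` became `itertools.filterfalse()`. A minimal sketch under that assumption, with `below_one()` as a made-up test function:

```python
import itertools

data = [-1, 0, 1, 2, 3, -2]


def below_one(x):
    return x < 1


kept = list(filter(below_one, data))                     # Python 2: itls.ifilter
rejected = list(itertools.filterfalse(below_one, data))  # Python 2: itls.ifilterfalse

print(kept)      # [-1, 0, -2]
print(rejected)  # [1, 2, 3]
```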


### Group iterators

groupby(iterable[, keyfunc]) creates an iterator that returns (key, sub-iterator) pairs, where each sub-iterator yields the consecutive items that share the same value of key(value).

    things = [("animal", "bear"), ("animal", "duck"),
              ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]


groupby() takes two parameters: the data to group and the function that computes the key to group it by.

    for key, group in itls.groupby(things, lambda x: x[0]):
        print key, group

    animal <itertools._grouper object at 0x10bce2150>
    plant <itertools._grouper object at 0x10bce2190>
    vehicle <itertools._grouper object at 0x10bce2150>


As the result shows, three sub-iterators are returned, and we can loop over each of them with a nested loop.

    for key, group in itls.groupby(things, lambda x:x[0]):
        for thing in group:
            print "A %s is a %s." % (thing[1], key)
        print ""

    A bear is a animal.
    A duck is a animal.

    A cactus is a plant.

    A speed boat is a vehicle.
    A school bus is a vehicle.
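Each sub-iterator shares the underlying iterator with groupby(), so if the groups are needed later it is safer to copy them into lists while looping. A sketch assuming Python 3; the `grouped` dictionary is just one convenient way to keep the results.

```python
import itertools

things = [("animal", "bear"), ("animal", "duck"),
          ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]

# Copy each group into a list before groupby() moves on to the next key.
grouped = {key: [name for _, name in group]
           for key, group in itertools.groupby(things, key=lambda x: x[0])}

print(grouped['animal'])   # ['bear', 'duck']
print(grouped['plant'])    # ['cactus']
print(grouped['vehicle'])  # ['speed boat', 'school bus']
```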


One thing to note: before grouping, make sure the iterable is sorted by the key, because groupby() starts a new group every time the key changes.

    things = [("animal", "bear"), ("plant", "cactus"), ("animal", "duck")]

    for key, group in itls.groupby(things, lambda x: x[0]):
        print key, group

    animal <itertools._grouper object at 0x10bce2410>
    plant <itertools._grouper object at 0x10bce2490>
    animal <itertools._grouper object at 0x10bce2410>


We expected two groups but got three, because the input is not sorted by key.

    new_things = sorted(things,key=lambda x: x[0])
    for key, group in itls.groupby(new_things, lambda x:x[0]):
        print key, group

    animal <itertools._grouper object at 0x10bce2350>
    plant <itertools._grouper object at 0x10bce2410>


Now the result is what we expected.
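Sorting and grouping with the same key function can be wrapped in one helper so the two never drift apart. This is a sketch assuming Python 3; `group_by_key()` is a hypothetical name, and `operator.itemgetter(0)` plays the role of `lambda x: x[0]`.

```python
import itertools
from operator import itemgetter


def group_by_key(pairs):
    """Hypothetical helper: sort (key, value) pairs by key, then collect each group."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort keeps value order
    return [(key, [value for _, value in group])
            for key, group in itertools.groupby(ordered, key=itemgetter(0))]


things = [("animal", "bear"), ("plant", "cactus"), ("animal", "duck")]
groups = group_by_key(things)

print(groups)  # [('animal', ['bear', 'duck']), ('plant', ['cactus'])]
```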