collections
collections
module support more data type beyond the built in python data type. LikeOrderedDict
, defaultdict
, namedtuple
, deque
, counter
etc. Simple but powerful.
import collections
OrderedDict
As it’s name says, dictionary is no longer random. Build in python dictionary do not remember the order of element inseration. It’s order is random when traversing, but OrderedDict
is different.
Traversing
print 'Regular dictionary:'
d = {}
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'
for key, value in d.items():
print key, value
Regular dictionary:
a A
c C
b B
print 'OrderedDict:'
d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'
for key, value in d.items():
print key, value
OrderedDict:
a A
b B
c C
Equal test
When comparing two dictionaries, OrderedDict
also take order into consideration, not only elements.
print 'dict :',
d1 = {}
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'
d2 = {}
d2['b'] = 'B'
d2['a'] = 'A'
d2['c'] = 'C'
print d1 == d2
dict : True
print 'OrderedDict:',
d1 = collections.OrderedDict()
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'
d2 = collections.OrderedDict()
d2['b'] = 'B'
d2['a'] = 'A'
d2['c'] = 'C'
print d1 == d2
OrderedDict: False
defaultdict
For ordinary dictionary, when you access a key that do not exist, then python would throw an error. But with defaultdict
, a default value can be defined in this case. It’s very useful especially when operation like agg is needed. defaultdict
receive one param default_factory
, which return a value or list
(return [ ]) set(return set())
or int(return 0)
, examples are easier to understand.
defaultdict
inherits from dict
, and add a __missing__(key)
method to deal with KeyError
exception.
def default_factory():
return 'This is default string value'
d = collections.defaultdict(default_factory)
print d['foo']
This is default string value
There is no key ‘foo’ in the dictionary, but we can access it and get a value.
list
Set default_factory
as list
, and we can easily group series of key-value pairs. By default, an empty list will be returned for non-exist key.
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(list)
for k, v in s:
d[k].append(v)
# simpler and faster than d.setdefault(k, []).append(v)
d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
int
It is useful for counting the number of key occurence.
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(int)
for k, v in s:
d[k] += 1
d.items()
[('blue', 2), ('red', 1), ('yellow', 2)]
s = 'mississippi'
d = collections.defaultdict(int)
for k in s:
d[k] += 1
d.items()
[('i', 4), ('p', 2), ('s', 4), ('m', 1)]
set
Just like list
, but with unique values.
s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
d = collections.defaultdict(set)
for k, v in s:
d[k].add(v)
d.items()
[('blue', {2, 4}), ('red', {1, 3})]
namedtuple
Default tuple
is indexed with number, but with namedtuple
, index with name is also possible. It is useful in cases that tuple is used in a place that is far from where it is created.
bob = ('Bob', 30, 'male')
print 'Representation:', bob
jane = ('Jane', 29, 'female')
print '\nField by index:', jane[0]
print '\nFields by index:'
for p in [ bob, jane ]:
print '%s is a %d year old %s' % p
Representation: ('Bob', 30, 'male')
Field by index: Jane
Fields by index:
Bob is a 30 year old male
Jane is a 29 year old female
# define namedtuple
Person = collections.namedtuple('Person','name age gender')
print 'Type of Person:', type(Person)
bob = Person(name='Bob', age=30, gender='male')
print '\nRepresentation:', bob
bob = Person('Bob',30,'male') # also supported
print 'Representation:', bob
jane = Person(name='Jane', age=29, gender='female')
print '\nField by name:', jane.name
print 'Field by name:', jane[0]
Type of Person: <type 'type'>
Representation: Person(name='Bob', age=30, gender='male')
Representation: Person(name='Bob', age=30, gender='male')
Field by name: Jane
Field by name: Jane
deque
Deque is double-ended queue
, which support add
and remove
operation on both sides. Ordinary stack
and queue
are degenrated form of deque.
And deque
is a sequence
. So operations like those on list are also valid.
d = collections.deque('abcdefg')
print 'Deque:', d
print 'Length:', len(d)
print 'Left end:', d[0]
print 'Right end:', d[-1]
d.remove('c')
print 'remove(c)', d
Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
Length: 7
Left end: a
Right end: g
remove(c) deque(['a', 'b', 'd', 'e', 'f', 'g'])
Populating
Push element into deque.
import collections
# Add to the right
d = collections.deque()
d.extend('abcdefg') # append with elements from the iterable
print 'extend :', d
d.append('h')
print 'append :', d
# Add to the left
d = collections.deque()
d.extendleft('abcdefg')
print 'extendleft:', d
d.appendleft('h')
print 'appendleft:', d
extend : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
append : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
extendleft: deque(['g', 'f', 'e', 'd', 'c', 'b', 'a'])
appendleft: deque(['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a'])
Consuming
Pop element from deque.
print 'From the right:'
d = collections.deque('abcdefg')
while True:
try:
print d.pop(),
except IndexError:
break
From the right:
g f e d c b a
print '\nFrom the left:'
d = collections.deque('abcdefg')
while True:
try:
print d.popleft(),
except IndexError:
break
From the left:
a b c d e f g
Counter
Counter needs no explanation.
print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
print collections.Counter({'a':2, 'b':3, 'c':1})
print collections.Counter(a=2, b=3, c=1)
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
update()
c = collections.Counter()
print 'Initial :', c
c.update('abcdaab')
print 'Sequence:', c
c.update({'a':1,'d':5}) # increse not replace
print 'Dict :', c # add to a and d
Initial : Counter()
Sequence: Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
Dict : Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})
Access
A simple API just like dictionary. For those keys that do not exist, return 0 instead of throwing an error.
c = collections.Counter('abcdaab')
for letter in 'abcde':
print '%s : %d' % (letter, c[letter])
a : 3
b : 2
c : 1
d : 1
e : 0
elements()
Return a iterator containing all the elements.
c = collections.Counter('China')
c['z'] = 0
print c
print list(c.elements())
Counter({'a': 1, 'C': 1, 'i': 1, 'h': 1, 'n': 1, 'z': 0})
['a', 'C', 'i', 'h', 'n']
most_common()
The most common n.
c = collections.Counter('abcdaab')
c.most_common(2)
[('a', 3), ('b', 2)]