More data types from the collections module



The collections module provides data types beyond Python's built-in ones, such as OrderedDict, defaultdict, namedtuple, deque and Counter. They are simple but powerful.

import collections


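As its name says, an OrderedDict is a dictionary that remembers the order in which keys were inserted. The built-in dictionary in Python 2 (and in Python 3 before 3.7) makes no such guarantee, so the order seen when traversing it is arbitrary. An OrderedDict always iterates in insertion order.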


print 'Regular dictionary:'
d = {}
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for key, value in d.items():
    print key, value
Regular dictionary:
a A
c C
b B
print 'OrderedDict:'
d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for key, value in d.items():
    print key, value
OrderedDict:
a A
b B
c C
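
To make "insertion order" a little more concrete, here is a minimal sketch of what happens when a key is overwritten versus deleted and re-inserted. The variable names are only illustrative, and the snippet is written so that it should run on both Python 2.7 and Python 3.

```python
import collections

d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

# Overwriting an existing key keeps its original position.
d['a'] = 'A2'
order_after_update = list(d.keys())

# Deleting a key and inserting it again moves it to the end.
del d['a']
d['a'] = 'A3'
order_after_reinsert = list(d.keys())

print(order_after_update)    # ['a', 'b', 'c']
print(order_after_reinsert)  # ['b', 'c', 'a']
```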

Equal test

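When two OrderedDicts are compared, the order of the elements is taken into account as well as the elements themselves. Two regular dictionaries are equal as long as they hold the same key-value pairs.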

print 'dict       :',
d1 = {}
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = {}
d2['b'] = 'B'
d2['a'] = 'A'
d2['c'] = 'C'

print d1 == d2
dict       : True
print 'OrderedDict:',
d1 = collections.OrderedDict()
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = collections.OrderedDict()
d2['b'] = 'B'
d2['a'] = 'A'
d2['c'] = 'C'

print d1 == d2

OrderedDict: False
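
One related detail: the order-sensitive comparison applies only when both sides are OrderedDicts. Comparing an OrderedDict with a regular dictionary ignores order. A small sketch, with illustrative variable names, that should run on both Python 2.7 and Python 3:

```python
import collections

od1 = collections.OrderedDict([('a', 'A'), ('b', 'B')])
od2 = collections.OrderedDict([('b', 'B'), ('a', 'A')])
plain = {'a': 'A', 'b': 'B'}

ordered_vs_ordered = (od1 == od2)  # both are OrderedDicts, so order matters
ordered_vs_plain = (od2 == plain)  # one side is a plain dict, so order is ignored

print(ordered_vs_ordered)  # False
print(ordered_vs_plain)    # True
```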


With an ordinary dictionary, accessing a key that does not exist raises a KeyError. With defaultdict, a default value is supplied instead, which is especially useful for aggregation. defaultdict takes one parameter, default_factory: a callable with no arguments that returns the default value. It can be a custom function, or a type such as list (returns []), set (returns set()) or int (returns 0). The examples below make this easier to understand.

defaultdict inherits from dict and adds a __missing__(key) method. When a key is not found, this method is called to produce the default value instead of raising a KeyError.

def default_factory():
    return 'This is default string value'
d = collections.defaultdict(default_factory)
print d['foo']
This is default string value

There is no key ‘foo’ in the dictionary, but we can access it and get a value.
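
It may help to see exactly when the default is created. In the sketch below (names are illustrative, and it should run on both Python 2.7 and Python 3), only the square-bracket lookup triggers default_factory. Membership tests and get() do not, and once the default has been produced it is stored in the dictionary.

```python
import collections

d = collections.defaultdict(list)

has_key_before = 'foo' in d   # membership test does not call the factory
got = d.get('foo')            # get() does not call the factory either
value = d['foo']              # d[key] falls back to __missing__ and the factory
has_key_after = 'foo' in d    # the default is now stored under 'foo'

print(has_key_before)  # False
print(got)             # None
print(value)           # []
print(has_key_after)   # True
```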


With list as the default_factory, a series of key-value pairs can easily be grouped by key. A key that does not exist yet starts out with an empty list.

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(list)
for k, v in s:
    # simpler and faster than d.setdefault(k, []).append(v)
    d[k].append(v)
print sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
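
For comparison, the setdefault variant mentioned in the comment can be written out with a plain dict. This is only a sketch to show that both approaches build the same grouping. The names plain and grouped are illustrative, and the code should run on both Python 2.7 and Python 3.

```python
import collections

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

# Plain dict: setdefault inserts an empty list the first time a key is seen.
plain = {}
for k, v in s:
    plain.setdefault(k, []).append(v)

# defaultdict: the empty list is created automatically on first access.
grouped = collections.defaultdict(list)
for k, v in s:
    grouped[k].append(v)

same = (plain == grouped)
print(same)                   # True
print(sorted(plain.items()))  # [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
```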


With int as the default_factory, a defaultdict is useful for counting how many times each key occurs, because a missing key starts at 0.

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(int)
for k, v in s:
    d[k] += 1
print sorted(d.items())
[('blue', 2), ('red', 1), ('yellow', 2)]
s = 'mississippi'
d = collections.defaultdict(int)
for k in s:
    d[k] += 1
print d.items()
[('i', 4), ('p', 2), ('s', 4), ('m', 1)]


With set as the default_factory, grouping works just like the list example, but duplicate values are dropped.

s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
d = collections.defaultdict(set)
for k, v in s:
    d[k].add(v)
print sorted(d.items())
[('blue', {2, 4}), ('red', {1, 3})]


Elements of a regular tuple are accessed by numeric index. With namedtuple, they can also be accessed by name, which is useful when a tuple is used far from the place where it was created and the meaning of each position is no longer obvious.

bob = ('Bob', 30, 'male')
print 'Representation:', bob

jane = ('Jane', 29, 'female')
print '\nField by index:', jane[0]

print '\nFields by index:'
for p in [ bob, jane ]:
    print '%s is a %d year old %s' % p
Representation: ('Bob', 30, 'male')

Field by index: Jane

Fields by index:
Bob is a 30 year old male
Jane is a 29 year old female
# define namedtuple
Person = collections.namedtuple('Person','name age gender')

print 'Type of Person:', type(Person)
bob = Person(name='Bob', age=30, gender='male')
print '\nRepresentation:', bob

bob = Person('Bob',30,'male') # also supported
print 'Representation:', bob

jane = Person(name='Jane', age=29, gender='female')
print '\nField by name:', jane.name
print 'Field by index:', jane[0]
Type of Person: <type 'type'>

Representation: Person(name='Bob', age=30, gender='male')
Representation: Person(name='Bob', age=30, gender='male')

Field by name: Jane
Field by index: Jane
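
A namedtuple is still a tuple, so it stays immutable and can be unpacked as usual. The sketch below shows a few helpers that come with every namedtuple class (_fields, _replace and _asdict). The Person type repeats the one defined above, the other names are illustrative, and the code should run on both Python 2.7 and Python 3.

```python
import collections

Person = collections.namedtuple('Person', 'name age gender')
bob = Person(name='Bob', age=30, gender='male')

fields = Person._fields           # the field names, in order
older_bob = bob._replace(age=31)  # returns a new tuple; bob itself is unchanged
as_dict = dict(bob._asdict())     # field name -> value mapping
name, age, gender = bob           # unpacks like a regular tuple

try:
    bob.age = 31                  # fields cannot be assigned to
    mutated = True
except AttributeError:
    mutated = False

print(fields)     # ('name', 'age', 'gender')
print(older_bob)  # Person(name='Bob', age=31, gender='male')
print(mutated)    # False
```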


A deque is a double-ended queue, which supports adding and removing elements at both ends. The ordinary stack and queue are degenerate forms of a deque, each restricted to a subset of these operations.

A deque is also a sequence, so many list-like operations are valid on it.

d = collections.deque('abcdefg')
print 'Deque:', d
print 'Length:', len(d)
print 'Left end:', d[0]
print 'Right end:', d[-1]

d.remove('c')
print 'remove(c)', d
Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
Length: 7
Left end: a
Right end: g
remove(c) deque(['a', 'b', 'd', 'e', 'f', 'g'])


Adding elements to a deque.

import collections

# Add to the right
d = collections.deque()
d.extend('abcdefg') # append with elements from the iterable
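print 'extend    :', d
d.append('h') # append a single element
print 'append    :', d

# Add to the left
d = collections.deque()
d.extendleft('abcdefg') # elements end up in reverse order
print 'extendleft:', d
d.appendleft('h')
print 'appendleft:', d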
extend    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
append    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
extendleft: deque(['g', 'f', 'e', 'd', 'c', 'b', 'a'])
appendleft: deque(['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a'])


Removing elements from a deque.

print 'From the right:'
d = collections.deque('abcdefg')
while True:
    try:
        print d.pop(),
    except IndexError:
        break
From the right:
g f e d c b a
print '\nFrom the left:'
d = collections.deque('abcdefg')
while True:
    try:
        print d.popleft(),
    except IndexError:
        break
From the left:
a b c d e f g
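
The remark that stacks and queues are restricted forms of a deque can be sketched as follows: a stack pushes and pops at the same end, while a queue pushes at one end and pops at the other. The variable names are illustrative, and the code should run on both Python 2.7 and Python 3.

```python
import collections

# Stack (LIFO): add and remove at the same (right) end.
stack = collections.deque()
for item in 'abc':
    stack.append(item)
stack_order = [stack.pop() for _ in range(3)]

# Queue (FIFO): add at the right end, remove from the left end.
queue = collections.deque()
for item in 'abc':
    queue.append(item)
queue_order = [queue.popleft() for _ in range(3)]

print(stack_order)  # ['c', 'b', 'a']
print(queue_order)  # ['a', 'b', 'c']
```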


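Counter is a dictionary subclass for counting hashable objects. It can be created from a sequence, from a dictionary of counts, or from keyword arguments.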

print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
print collections.Counter({'a':2, 'b':3, 'c':1})
print collections.Counter(a=2, b=3, c=1)
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})


c = collections.Counter()
print 'Initial :', c

c.update('abcdaab')
print 'Sequence:', c

c.update({'a':1,'d':5}) # increases the counts, does not replace them
print 'Dict    :', c # a and d are incremented
Initial : Counter()
Sequence: Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
Dict    : Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})


Counter has a simple API, just like a dictionary. For keys that do not exist, it returns 0 instead of raising a KeyError.

c = collections.Counter('abcdaab')
for letter in 'abcde':
    print '%s : %d' % (letter, c[letter])
a : 3
b : 2
c : 1
d : 1
e : 0
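
Unlike defaultdict(int), looking up a missing key in a Counter does not store that key. A minimal sketch of the difference, with illustrative variable names, that should run on both Python 2.7 and Python 3:

```python
import collections

c = collections.Counter('abcdaab')
count_e = c['e']       # missing key: returns 0
stored_in_c = 'e' in c # the lookup did not add 'e' to the Counter

dd = collections.defaultdict(int)
dd_count_e = dd['e']     # missing key: int() produces 0
stored_in_dd = 'e' in dd # the lookup stored 'e' in the defaultdict

print(count_e)       # 0
print(stored_in_c)   # False
print(dd_count_e)    # 0
print(stored_in_dd)  # True
```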


elements() returns an iterator over all the elements, each repeated as many times as its count. Elements with a count less than one are skipped.

c = collections.Counter('China')
c['z'] = 0
print c
print list(c.elements())
Counter({'a': 1, 'C': 1, 'i': 1, 'h': 1, 'n': 1, 'z': 0})
['a', 'C', 'i', 'h', 'n']


most_common(n) returns the n most common elements together with their counts, from the most common to the least.

c = collections.Counter('abcdaab')
print c.most_common(2)
[('a', 3), ('b', 2)]
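
When most_common() is called without an argument, it returns every element, ordered from the most common to the least. The sketch below uses the same Counter as above. The variable names are illustrative, and it should run on both Python 2.7 and Python 3. The relative order of elements with equal counts is not asserted here, since it can differ between Python versions.

```python
import collections

c = collections.Counter('abcdaab')

top_two = c.most_common(2)    # the two most common elements
everything = c.most_common()  # all elements, most common first

print(top_two)          # [('a', 3), ('b', 2)]
print(len(everything))  # 4
```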

