Public speaking course notes Read "Dynamo, Amazon’s Highly Available Key-value Store" Read "Bigtable, A Distributed Storage System for Structured Data" Read "Streaming Systems" 3, Watermarks Read "Streaming Systems" 1&2, Streaming 101 Read "F1, a distributed SQL database that scales" Read "Zanzibar, Google’s Consistent, Global Authorization System" Read "Spanner, Google's Globally-Distributed Database" Read "Designing Data-intensive applications" 12, The Future of Data Systems IOS development with Swift Read "Designing Data-intensive applications" 10&11, Batch and Stream Processing Read "Designing Data-intensive applications" 9, Consistency and Consensus Read "Designing Data-intensive applications" 8, Distributed System Troubles Read "Designing Data-intensive applications" 7, Transactions Read "Designing Data-intensive applications" 6, Partitioning Read "Designing Data-intensive applications" 5, Replication Read "Designing Data-intensive applications" 3&4, Storage, Retrieval, Encoding Read "Designing Data-intensive applications" 1&2, Foundation of Data Systems Three cases of binary search TAMU Operating System 2 Memory Management TAMU Operating System 1 Introduction Overview in cloud computing 2 TAMU Operating System 7 Virtualization TAMU Operating System 6 File System TAMU Operating System 5 I/O and Disk Management TAMU Operating System 4 Synchronization TAMU Operating System 3 Concurrency and Threading TAMU Computer Networks 5 Data Link Layer TAMU Computer Networks 4 Network Layer TAMU Computer Networks 3 Transport Layer TAMU Computer Networks 2 Application Layer TAMU Computer Networks 1 Introduction Overview in distributed systems and cloud computing 1 A well-optimized Union-Find implementation, in Java A heap implementation supporting deletion TAMU Advanced Algorithms 3, Maximum Bandwidth Path (Dijkstra, MST, Linear) TAMU Advanced Algorithms 2, B+ tree and Segment Intersection TAMU Advanced Algorithms 1, BST, 2-3 Tree and Heap TAMU AI, Searching problems Factorization Machine and Field-aware Factorization Machine for CTR prediction TAMU Neural Network 10 Information-Theoretic Models TAMU Neural Network 9 Principal Component Analysis TAMU Neural Network 8 Neurodynamics TAMU Neural Network 7 Self-Organizing Maps TAMU Neural Network 6 Deep Learning Overview TAMU Neural Network 5 Radial-Basis Function Networks TAMU Neural Network 4 Multi-Layer Perceptrons TAMU Neural Network 3 Single-Layer Perceptrons Princeton Algorithms P1W6 Hash Tables & Symbol Table Applications Stanford ML 11 Application Example Photo OCR Stanford ML 10 Large Scale Machine Learning Stanford ML 9 Anomaly Detection and Recommender Systems Stanford ML 8 Clustering & Principal Component Analysis Princeton Algorithms P1W5 Balanced Search Trees TAMU Neural Network 2 Learning Processes TAMU Neural Network 1 Introduction Stanford ML 7 Support Vector Machine Stanford ML 6 Evaluate Algorithms Princeton Algorithms P1W4 Priority Queues and Symbol Tables Stanford ML 5 Neural Networks Learning Princeton Algorithms P1W3 Mergesort and Quicksort Stanford ML 4 Neural Networks Basics Princeton Algorithms P1W2 Stack and Queue, Basic Sorts Stanford ML 3 Classification Problems Stanford ML 2 Multivariate Regression and Normal Equation Princeton Algorithms P1W1 Union and Find Stanford ML 1 Introduction and Parameter Learning

More data type from collections module

2016-08-19

collections

collections module support more data type beyond the built in python data type. LikeOrderedDict, defaultdict, namedtuple, deque, counter etc. Simple but powerful.

import collections

OrderedDict

As it’s name says, dictionary is no longer random. Build in python dictionary do not remember the order of element inseration. It’s order is random when traversing, but OrderedDict is different.

Traversing

print 'Regular dictionary:'
d = {}
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for key, value in d.items():
    print key, value
Regular dictionary:
a A
c C
b B
print 'OrderedDict:'
d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for key, value in d.items():
    print key, value
OrderedDict:
a A
b B
c C

Equal test

When comparing two dictionaries, OrderedDict also take order into consideration, not only elements.

print 'dict       :',
d1 = {}
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = {}
d2['b'] = 'B'
d2['a'] = 'A'
d2['c'] = 'C'

print d1 == d2
dict       : True
print 'OrderedDict:',
d1 = collections.OrderedDict()
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = collections.OrderedDict()
d2['b'] = 'B'
d2['a'] = 'A'
d2['c'] = 'C'

print d1 == d2

OrderedDict: False

defaultdict

For ordinary dictionary, when you access a key that do not exist, then python would throw an error. But with defaultdict, a default value can be defined in this case. It’s very useful especially when operation like agg is needed. defaultdict receive one param default_factory, which return a value or list(return [ ]) set(return set()) or int(return 0), examples are easier to understand.

defaultdict inherits from dict, and add a __missing__(key) method to deal with KeyError exception.

def default_factory():
    return 'This is default string value'
d = collections.defaultdict(default_factory)
print d['foo']
This is default string value

There is no key ‘foo’ in the dictionary, but we can access it and get a value.

list

Set default_factory as list, and we can easily group series of key-value pairs. By default, an empty list will be returned for non-exist key.

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(list)
for k, v in s:
    d[k].append(v)
    # simpler and faster than d.setdefault(k, []).append(v)
d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

int

It is useful for counting the number of key occurence.

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(int)
for k, v in s:
    d[k] += 1
d.items()
[('blue', 2), ('red', 1), ('yellow', 2)]
s = 'mississippi'
d = collections.defaultdict(int)
for k in s:
    d[k] += 1
d.items()
[('i', 4), ('p', 2), ('s', 4), ('m', 1)]

set

Just like list, but with unique values.

s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
d = collections.defaultdict(set)
for k, v in s:
    d[k].add(v)
d.items()
[('blue', {2, 4}), ('red', {1, 3})]

namedtuple

Default tuple is indexed with number, but with namedtuple, index with name is also possible. It is useful in cases that tuple is used in a place that is far from where it is created.

bob = ('Bob', 30, 'male')
print 'Representation:', bob

jane = ('Jane', 29, 'female')
print '\nField by index:', jane[0]

print '\nFields by index:'
for p in [ bob, jane ]:
    print '%s is a %d year old %s' % p
Representation: ('Bob', 30, 'male')

Field by index: Jane

Fields by index:
Bob is a 30 year old male
Jane is a 29 year old female
# define namedtuple
Person = collections.namedtuple('Person','name age gender')

print 'Type of Person:', type(Person)
bob = Person(name='Bob', age=30, gender='male')
print '\nRepresentation:', bob

bob = Person('Bob',30,'male') # also supported
print 'Representation:', bob

jane = Person(name='Jane', age=29, gender='female')
print '\nField by name:', jane.name
print 'Field by name:', jane[0]
Type of Person: <type 'type'>

Representation: Person(name='Bob', age=30, gender='male')
Representation: Person(name='Bob', age=30, gender='male')

Field by name: Jane
Field by name: Jane

deque

Deque is double-ended queue, which support add and remove operation on both sides. Ordinary stack and queue are degenrated form of deque.

And deque is a sequence. So operations like those on list are also valid.

d = collections.deque('abcdefg')
print 'Deque:', d
print 'Length:', len(d)
print 'Left end:', d[0]
print 'Right end:', d[-1]

d.remove('c')
print 'remove(c)', d
Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
Length: 7
Left end: a
Right end: g
remove(c) deque(['a', 'b', 'd', 'e', 'f', 'g'])

Populating

Push element into deque.

import collections

# Add to the right
d = collections.deque()
d.extend('abcdefg') # append with elements from the iterable
print 'extend    :', d
d.append('h')
print 'append    :', d

# Add to the left
d = collections.deque()
d.extendleft('abcdefg')
print 'extendleft:', d
d.appendleft('h')
print 'appendleft:', d
extend    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
append    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
extendleft: deque(['g', 'f', 'e', 'd', 'c', 'b', 'a'])
appendleft: deque(['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a'])

Consuming

Pop element from deque.

print 'From the right:'
d = collections.deque('abcdefg')
while True:
    try:
        print d.pop(),
    except IndexError:
        break
From the right:
g f e d c b a
print '\nFrom the left:'
d = collections.deque('abcdefg')
while True:
    try:
        print d.popleft(),
    except IndexError:
        break
From the left:
a b c d e f g

Counter

Counter needs no explanation.

print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
print collections.Counter({'a':2, 'b':3, 'c':1})
print collections.Counter(a=2, b=3, c=1)
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})

update()

c = collections.Counter()
print 'Initial :', c

c.update('abcdaab')
print 'Sequence:', c

c.update({'a':1,'d':5}) # increse not replace
print 'Dict    :', c # add to a and d
Initial : Counter()
Sequence: Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
Dict    : Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})

Access

A simple API just like dictionary. For those keys that do not exist, return 0 instead of throwing an error.

c = collections.Counter('abcdaab')
for letter in 'abcde':
    print '%s : %d' % (letter, c[letter])
a : 3
b : 2
c : 1
d : 1
e : 0

elements()

Return a iterator containing all the elements.

c = collections.Counter('China')
c['z'] = 0
print c
print list(c.elements())
Counter({'a': 1, 'C': 1, 'i': 1, 'h': 1, 'n': 1, 'z': 0})
['a', 'C', 'i', 'h', 'n']

most_common()

The most common n.

c = collections.Counter('abcdaab')
c.most_common(2)
[('a', 3), ('b', 2)]

Creative Commons License
Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2024. All rights reserved by melonskin. Powered by Jekyll.