Preface ix
1 Understanding Performant Python 1 (16)
The Fundamental Computer System 1 (8)
Computing Units 2 (3)
Memory Units 5 (2)
Communications Layers 7 (2)
Putting the Fundamental Elements Together 9 (4)
Idealized Computing Versus the Python Virtual Machine 10 (3)
So Why Use Python? 13 (4)
2 Profiling to Find Bottlenecks 17 (44)
Profiling Efficiently 18 (1)
Introducing the Julia Set 19 (4)
Calculating the Full Julia Set 23 (3)
Simple Approaches to Timing---print and a Decorator 26 (3)
Simple Timing Using the Unix time Command 29 (2)
Using the cProfile Module 31 (5)
Using runsnakerun to Visualize cProfile Output 36 (1)
Using line_profiler for Line-by-Line Measurements 37 (5)
Using memory_profiler to Diagnose Memory Usage 42 (6)
Inspecting Objects on the Heap with heapy 48 (2)
Using dowser for Live Graphing of Instantiated Variables 50 (2)
Using the dis Module to Examine CPython Bytecode 52 (2)
Different Approaches, Different Complexity 54 (2)
Unit Testing During Optimization to Maintain Correctness 56 (1)
No-op @profile Decorator 57 (2)
Strategies to Profile Your Code Successfully 59 (1)
Wrap-Up 60 (1)
3 Lists and Tuples 61 (12)
A More Efficient Search 64 (2)
Lists Versus Tuples 66 (6)
Lists as Dynamic Arrays 67 (3)
Tuples as Static Arrays 70 (2)
Wrap-Up 72 (1)
4 Dictionaries and Sets 73 (16)
How Do Dictionaries and Sets Work? 77 (8)
Inserting and Retrieving 77 (3)
Deletion 80 (1)
Resizing 81 (1)
Hash Functions and Entropy 81 (4)
Dictionaries and Namespaces 85 (3)
Wrap-Up 88 (1)
5 Iterators and Generators 89 (10)
Iterators for Infinite Series 92 (2)
Lazy Generator Evaluation 94 (4)
Wrap-Up 98 (1)
6 Matrix and Vector Computation 99 (36)
Introduction to the Problem 100(5)
Aren't Python Lists Good Enough? 105(4)
Problems with Allocating Too Much 106(3)
Memory Fragmentation 109(8)
Understanding perf 111(2)
Making Decisions with perf's Output 113(1)
Enter numpy 114(3)
Applying numpy to the Diffusion Problem 117(10)
Memory Allocations and In-Place Operations 120(4)
Selective Optimizations: Finding What Needs to Be Fixed 124(3)
numexpr: Making In-Place Operations Faster and Easier 127(2)
A Cautionary Tale: Verify "Optimizations" (scipy) 129(1)
Wrap-Up 130(5)
7 Compiling to C 135(46)
What Sort of Speed Gains Are Possible? 136(2)
JIT Versus AOT Compilers 138(1)
Why Does Type Information Help the Code Run Faster? 138(1)
Using a C Compiler 139(1)
Reviewing the Julia Set Example 140(1)
Cython 140(10)
Compiling a Pure-Python Version Using Cython 141(2)
Cython Annotations to Analyze a Block of Code 143(2)
Adding Some Type Annotations 145(5)
Shed Skin 150(4)
Building an Extension Module 151(2)
The Cost of the Memory Copies 153(1)
Cython and numpy 154(3)
Parallelizing the Solution with OpenMP on One Machine 155(2)
Numba 157(2)
Pythran 159(1)
PyPy 160(4)
Garbage Collection Differences 161(1)
Running PyPy and Installing Modules 162(2)
When to Use Each Technology 164(3)
Other Upcoming Projects 165(1)
A Note on Graphics Processing Units (GPUs) 166(1)
A Wish for a Future Compiler Project 166(1)
Foreign Function Interfaces 167(12)
ctypes 167(3)
cffi 170(3)
f2py 173(2)
CPython Module 175(4)
Wrap-Up 179(2)
8 Concurrency 181(22)
Introduction to Asynchronous Programming 182(3)
Serial Crawler 185(2)
gevent 187(5)
tornado 192(4)
AsyncIO 196(2)
Database Example 198(3)
Wrap-Up 201(2)
9 The multiprocessing Module 203(60)
An Overview of the Multiprocessing Module 206(2)
Estimating Pi Using the Monte Carlo Method 208(2)
Estimating Pi Using Processes and Threads 210(11)
Using Python Objects 210(7)
Random Numbers in Parallel Systems 217(1)
Using numpy 218(3)
Finding Prime Numbers 221(11)
Queues of Work 227(5)
Verifying Primes Using Interprocess Communication 232(16)
Serial Solution 236(1)
Naive Pool Solution 236(2)
A Less Naive Pool Solution 238(1)
Using Manager.Value as a Flag 239(2)
Using Redis as a Flag 241(2)
Using RawValue as a Flag 243(1)
Using mmap as a Flag 244(1)
Using mmap as a Flag Redux 245(3)
Sharing numpy Data with multiprocessing 248(6)
Synchronizing File and Variable Access 254(7)
File Locking 255(3)
Locking a Value 258(3)
Wrap-Up 261(2)
10 Clusters and Job Queues 263(24)
Benefits of Clustering 264(1)
Drawbacks of Clustering 265(3)
$462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy 266(1)
Skype's 24-Hour Global Outage 267(1)
Common Cluster Designs 268(1)
How to Start a Clustered Solution 268(1)
Ways to Avoid Pain When Using Clusters 269(1)
Three Clustering Solutions 270(7)
Using the Parallel Python Module for Simple Local Clusters 271(2)
Using IPython Parallel to Support Research 273(4)
NSQ for Robust Production Clustering 277(7)
Queues 277(1)
Pub/sub 278(2)
Distributed Prime Calculation 280(4)
Other Clustering Tools to Look At 284(1)
Wrap-Up 285(2)
11 Using Less RAM 287(38)
Objects for Primitives Are Expensive 288(4)
The Array Module Stores Many Primitive Objects Cheaply 289(3)
Understanding the RAM Used in a Collection 292(2)
Bytes Versus Unicode 294(1)
Efficiently Storing Lots of Text in RAM 295(1)
Trying These Approaches on 8 Million Tokens 296(8)
Tips for Using Less RAM 304(1)
Probabilistic Data Structures 305(7)
Very Approximate Counting with a 1-byte Morris Counter 306(2)
K-Minimum Values 308(4)
Bloom Filters 312(5)
LogLog Counter 317(4)
Real-World Example 321(4)
12 Lessons from the Field 325(20)
Adaptive Lab's Social Media Analytics (SoMA) 325(3)
Python at Adaptive Lab 326(1)
SoMA's Design 326(1)
Our Development Methodology 327(1)
Maintaining SoMA 327(1)
Advice for Fellow Engineers 328(1)
Making Deep Learning Fly with RadimRehurek.com 328(5)
The Sweet Spot 328(2)
Lessons in Optimizing 330(2)
Wrap-Up 332(1)
Large-Scale Productionized Machine Learning at Lyst.com 333(2)
Python's Place at Lyst 333(1)
Cluster Design 333(1)
Code Evolution in a Fast-Moving Start-Up 333(1)
Building the Recommendation Engine 334(1)
Reporting and Monitoring 334(1)
Some Advice 335(1)
Large-Scale Social Media Analysis at Smesh 335(4)
Python's Role at Smesh 335(1)
The Platform 336(1)
High Performance Real-Time String Matching 336(2)
Reporting, Monitoring, Debugging, and Deployment 338(1)
PyPy for Successful Web and Data Processing Systems 339(3)
Prerequisites 339(1)
The Database 340(1)
The Web Application 340(1)
OCR and Translation 341(1)
Task Distribution and Workers 341(1)
Conclusion 341(1)
Task Queues at Lanyrd.com 342(3)
Python's Role at Lanyrd 342(1)
Making the Task Queue Performant 343(1)
Reporting, Monitoring, Debugging, and Deployment 343(1)
Advice to a Fellow Developer 343(2)
Index 345