TCP server internals
Let’s look at how a server application receives a request. When a client connects, the server accepts the connection and starts listening for incoming data on it. Data usually arrives in chunks, and the server finds the end of the request either by looking for a delimiter or by reading a length that is indicated in the first few bytes.
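To make the two framing strategies concrete, here is a minimal sketch of both. The class names, the `\r\n` delimiter and the 4-byte big-endian length header are assumptions chosen for illustration; a real protocol defines its own.

```python
import struct


class DelimitedFramer:
    """Buffers raw chunks and returns complete delimiter-terminated messages."""

    def __init__(self, delimiter=b"\r\n"):  # the delimiter is an assumption
        self.delimiter = delimiter
        self.buffer = b""

    def feed(self, chunk):
        self.buffer += chunk
        messages = []
        while True:
            head, sep, tail = self.buffer.partition(self.delimiter)
            if not sep:
                # No complete message yet; keep the partial data buffered.
                break
            messages.append(head)
            self.buffer = tail
        return messages


class LengthPrefixedFramer:
    """Buffers raw chunks and returns complete length-prefixed messages."""

    HEADER = struct.Struct("!I")  # assumed header: 4-byte big-endian length

    def __init__(self):
        self.buffer = b""

    def feed(self, chunk):
        self.buffer += chunk
        messages = []
        while len(self.buffer) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self.buffer)
            end = self.HEADER.size + length
            if len(self.buffer) < end:
                # The header arrived, but the body is still incomplete.
                break
            messages.append(self.buffer[self.HEADER.size:end])
            self.buffer = self.buffer[end:]
        return messages
```

In both cases the server calls `feed()` with whatever `recv()` returned and gets back zero or more complete requests; anything left over stays in the buffer until the next chunk arrives.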
A typical CPU can process data orders of magnitude faster than a network link can deliver it. Thus, a server that does a lot of I/O will spend much of its time blocked while the network catches up. To avoid blocking the other clients while it waits, the server has to read requests concurrently.
A popular way to do so is to use a thread per client. But there are some problems with threads. To begin with, Python threads are real OS threads, yet in standard CPython they cannot run Python code in parallel. The reason is the GIL (Global Interpreter Lock). The GIL is necessary mainly because CPython’s memory management is not thread-safe. It prevents multiple native threads from executing Python bytecode at once.
On the one hand, this makes threaded programming in Python fairly simple: adding an item to a list or setting a dictionary key requires no lock. On the other hand, it leads to relatively big chunks of code that are executed sequentially, blocking each other for an undetermined amount of time.
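As a small illustration of the "no locks required" point, the sketch below lets several threads append to one plain list without any locking. The function names and the thread and item counts are made up for the example, and the behavior relies on a CPython implementation detail (a single `list.append` is atomic), not on a language guarantee.

```python
import threading

shared = []  # a plain list, deliberately not protected by a lock


def worker(worker_id, count):
    for i in range(count):
        # In CPython a single append is atomic, so no item is lost.
        shared.append((worker_id, i))


def run(num_threads=4, count=10_000):
    shared.clear()
    threads = [
        threading.Thread(target=worker, args=(n, count))
        for n in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(shared)
```

Note that this only covers single operations. A compound update such as `counter += 1` is a read followed by a write, and it still needs a lock.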
An in-depth explanation of this problem is given in this video by David Beazley.
Python aside, there is a much wider problem with threads. I was actually surprised by how many drawbacks they have. The drawbacks range from threads being a bad design (as described here) to more pragmatic ones such as memory consumption, since each thread needs its own stack. The stack size varies between operating systems.
On .NET it’s usually 1 MB on a 32-bit OS and 4 MB on a 64-bit OS. On Linux it might be up to 10 MB per thread. Also, context switches between many threads degrade performance significantly. A common recommendation is to keep the number of threads below 100. Not surprisingly, it is also difficult to write thread-safe code. You have to care about such things as race conditions, deadlocks, livelocks and starvation!
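For reference, the thread-per-client model discussed above can be sketched with the standard library’s `socketserver` module. The handler and helper names are hypothetical, and the line-based echo protocol is only a stand-in for real request handling.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Runs in its own thread for every connection, so a blocking
        # read here stalls only this one client.
        for line in self.rfile:
            self.wfile.write(line)


class ThreadedEchoServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # do not keep the process alive for open clients


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = ThreadedEchoServer((host, port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Every connected client costs one thread here, together with its stack, which is exactly the overhead described above.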
Fortunately, there is a better way to handle concurrency. Python excels in a very specific area: asynchronous (aka non-blocking) network servers.
Back in the days before multi-threading was invented, asynchronous designs were the only available mechanism for managing more than one connection in a single process.
I’d like to illustrate this principle with an example published in the Linux Journal article by Ken Kinder:
Have you ever been standing in the express lane of a grocery store, buying a single bottle of water, only to have the customer in front of you challenge the price of an item, causing you and everyone behind you to wait five minutes for the price to be verified?
Plenty of explanations of asynchronous programming exist, but I think the best way to understand its benefits is to wait in line with an idle cashier. If the cashier were asynchronous, he or she would put the person in front of you on hold and conduct your transaction while waiting for the price check. Unfortunately, cashiers are seldom asynchronous. In the world of software, however, event-driven servers make the best use of available resources, because there are no threads holding up valuable memory waiting for traffic on a socket. Following the grocery store metaphor, a threaded server solves the problem of long lines by adding more cashiers, while an asynchronous model lets each cashier help more than one customer at a time.
The basic APM flow is visualized below:
The module waits for an event. Once one arrives, it reacts (hence the name reactor) by calling the appropriate callback:
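The wait-then-dispatch loop described above can be sketched with the standard library’s `selectors` module. The `Reactor` class and the `make_echo_server` helper are hypothetical names for this example, and the echo behavior is only a placeholder for real request handling.

```python
import selectors
import socket


class Reactor:
    """Waits for readable sockets and calls the callback registered for each."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()

    def register(self, sock, callback):
        self.selector.register(sock, selectors.EVENT_READ, callback)

    def unregister(self, sock):
        self.selector.unregister(sock)

    def run_once(self, timeout=None):
        # Block until at least one socket is ready, then react to each event.
        for key, _events in self.selector.select(timeout):
            callback = key.data
            callback(key.fileobj)


def make_echo_server(reactor, host="127.0.0.1", port=0):
    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    listener.setblocking(False)

    def on_readable(conn):
        data = conn.recv(4096)
        if data:
            # Simplification: a real server would buffer the reply and
            # wait for the socket to become writable.
            conn.sendall(data)
        else:
            # An empty read means the client closed the connection.
            reactor.unregister(conn)
            conn.close()

    def on_accept(sock):
        conn, _addr = sock.accept()
        conn.setblocking(False)
        reactor.register(conn, on_readable)

    reactor.register(listener, on_accept)
    return listener
```

A server would simply call `reactor.run_once()` in a loop. All clients are served by one thread, and no call blocks on a single connection.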
Python has had a high-performance asynchronous server framework called Medusa since 1995. It became the archetype for the now well-known Zope. It was initially built to address the C10K problem, which is simple to state: how to service 10,000 simultaneous network requests. I refer you to the C10K website for enormously detailed technical information on this complex problem.
It is sufficient to say that asynchronous architectures, with their much smaller memory usage and no need for locking, synchronization and context switching, are generally considered to perform far better than threaded architectures.
Here is a very impressive graph that compares Apache, which is threaded, with Nginx, which is asynchronous.
So if you ever need to handle high traffic, or just want to have fun trying a different way of thinking about programs, you should consider writing an asynchronous application.