Reinventing Quick Sort by improving Stupid Sort.

March 14, 2018

Stupid sort is a sorting algorithm that just randomly shuffles the array until it eventually becomes sorted.

Although stupid sort is hardly ever seen in a production environment, it may come up as an opening question at a programming interview of any level. Discussing how to optimize this kind of sort can be very interesting and useful for a better understanding of some sorting techniques.

In a nutshell, stupid sort may be written like this:
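The code figure did not survive the page extraction; here is a minimal Python sketch of what it may have looked like (the function names are mine):

```python
import random

def is_sorted(arr):
    # verify the order of all the elements: O(n)
    return all(arr[k] <= arr[k + 1] for k in range(len(arr) - 1))

def shuffle(arr):
    # one shuffle step: swap two randomly chosen elements
    i, j = random.randrange(len(arr)), random.randrange(len(arr))
    arr[i], arr[j] = arr[j], arr[i]

def stupid_sort(arr):
    # keep shuffling until the array happens to be sorted
    while not is_sorted(arr):
        shuffle(arr)
    return arr
```

Here shuffle swaps a single random pair per step, which is the variant the rest of the post builds on.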

The average running time of this algorithm is terribly long. For each permutation (out of n!) we need to verify the order of all the elements in the array (n of them), which gives us O(n·n!). On my hexa-core Xiaomi Redmi 3 phone this process is very slow even for n=10, ending up with more than 36 million operations. In the worst case, the running time of the algorithm is unbounded, for the same reason that a tossed coin might turn up heads any number of times in a row.

The best case occurs if the list as given is already sorted; in this case the expected running time is O(n), the cost of a single order check.

For any collection of fixed size, the expected running time of the algorithm is finite, for much the same reason that the infinite monkey theorem holds: there is a non-zero probability of hitting the right permutation, so given an unbounded number of tries it will almost surely eventually be chosen.

Let us demonstrate how we can decrease the average running time by adding a couple of simple rules. In our example, each shuffle step swaps 2 random elements. What if we do not swap pairs that are already in the right order (left element smaller than the right one)? This immediately saves us a lot of swaps and also significantly decreases the running time.

Let us write down this “directional shuffle”:
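The original snippet is likewise missing; a sketch of the "directional shuffle", under the same naming assumptions:

```python
import random

def is_sorted(arr):
    return all(arr[k] <= arr[k + 1] for k in range(len(arr) - 1))

def directional_shuffle(arr):
    # pick a random pair, but swap only when it is out of order
    i, j = random.randrange(len(arr)), random.randrange(len(arr))
    if i > j:
        i, j = j, i
    if arr[i] > arr[j]:
        arr[i], arr[j] = arr[j], arr[i]

def stupid_sort(arr):
    while not is_sorted(arr):
        directional_shuffle(arr)
    return arr
```

Since a swap now only ever removes an inversion, the loop is guaranteed to terminate.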

Now, after adding these new rules, my phone was able to sort 100 elements in less than a second.

Although the running time has decreased, it still grows very fast as the number of elements in the array increases. For example, sorting 1000 elements takes about 100 seconds. We increased the array size by 10x and got a 1000x slowdown, which lets us estimate our algorithm as O(n^3). It turns out that if we could only split our 1000-element array into 10 sub-arrays of 100 elements each and sort each of them separately, it would take us less than 10 × (1 sec) = 10 seconds (instead of 100). The only thing we have to figure out is how to split the array in such a way that merging it back is trivial. Apparently, we can change our shuffling function a little bit, so that after shuffling the elements in the left part of the array are smaller than those in the right part. A shuffle written this way may be called many times on the same part of the array until that part eventually becomes sorted (the stupid-sort technique). Then, once all parts of the array are sorted, the entire array is sorted (trivially).

Let us modify our “shuffle” to swap elements with respect to some randomly chosen element of the array (pivot):
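The figure is lost here too; below is one plausible reconstruction — a Hoare-style, in-place partition around a randomly chosen pivot value (with duplicates, elements equal to the pivot may end up on either side):

```python
import random

def pivot_shuffle(arr, l, r):
    # partition arr[l..r] in place around a random pivot value;
    # afterwards arr[l..j] <= pivot <= arr[i..r], with j < i
    pivot = arr[random.randint(l, r)]
    i, j = l, r
    while i <= j:
        while arr[i] < pivot:
            i += 1
        while arr[j] > pivot:
            j -= 1
        if i <= j:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
            j -= 1
    return i, j
```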

The result of one run of this function is that the original array is divided into 2 parts, left and right, separated by the pivot. Each element on the left is no greater than the pivot, and each element on the right is no smaller than it. Notice that the function has O(n) time complexity and works in place.

Now our stupid_sort can be written as follows:
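This figure is also missing; a sketch of the non-recursive version, which keeps splitting every unsorted part until all parts are trivially small (the partition step is repeated here so the snippet is self-contained):

```python
import random

def pivot_shuffle(arr, l, r):
    # partition arr[l..r] in place around a random pivot value;
    # afterwards arr[l..j] <= pivot <= arr[i..r], with j < i
    pivot = arr[random.randint(l, r)]
    i, j = l, r
    while i <= j:
        while arr[i] < pivot:
            i += 1
        while arr[j] > pivot:
            j -= 1
        if i <= j:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
            j -= 1
    return i, j

def stupid_sort(arr):
    # worklist of (l, r) parts still waiting to be shuffled into order
    parts = [(0, len(arr) - 1)]
    while parts:
        l, r = parts.pop()
        if l >= r:  # a part of at most one element is already sorted
            continue
        i, j = pivot_shuffle(arr, l, r)
        parts.append((l, j))
        parts.append((i, r))
    return arr
```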

We can actually achieve the same functionality, using a recursion:
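And the recursive equivalent, again written as a self-contained sketch:

```python
import random

def pivot_shuffle(arr, l, r):
    # partition arr[l..r] in place around a random pivot value
    pivot = arr[random.randint(l, r)]
    i, j = l, r
    while i <= j:
        while arr[i] < pivot:
            i += 1
        while arr[j] > pivot:
            j -= 1
        if i <= j:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
            j -= 1
    return i, j

def stupid_sort(arr, l=0, r=None):
    if r is None:
        r = len(arr) - 1
    if l >= r:  # a part of at most one element is already sorted
        return arr
    i, j = pivot_shuffle(arr, l, r)
    stupid_sort(arr, l, j)  # sort the left part
    stupid_sort(arr, i, r)  # sort the right part
    return arr
```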

Note that the is_sorted(arr) call is replaced with the if l >= r: check, since a partition of at most one element is already sorted. Then the original array, which consists of sorted partitions, becomes sorted as well.

Here is the full code resuming all the above:
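Since the original listing is missing, here is a full program putting the pieces together; at this point the names can be honest about what we have built:

```python
import random

def partition(arr, l, r):
    """Our final 'shuffle': split arr[l..r] around a random pivot, in place."""
    pivot = arr[random.randint(l, r)]
    i, j = l, r
    while i <= j:
        while arr[i] < pivot:
            i += 1
        while arr[j] > pivot:
            j -= 1
        if i <= j:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
            j -= 1
    return i, j

def quick_sort(arr, l=0, r=None):
    if r is None:
        r = len(arr) - 1
    if l >= r:
        return arr
    i, j = partition(arr, l, r)
    quick_sort(arr, l, j)
    quick_sort(arr, i, r)
    return arr

if __name__ == "__main__":
    data = [random.randint(0, 999) for _ in range(1000)]
    print(quick_sort(data) == sorted(data))  # True
```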

Written this way, the shuffling function gets a special name – partitioning – and the algorithm is called Quick Sort. Its expected complexity is bounded by the number of partitioning levels, log(n), multiplied by the O(n) cost of partitioning one level, giving O(n log n).
Thus, we have just demonstrated how, by gradually improving the most basic and simple function, our stupid sort has been transformed into one of the most efficient sorting algorithms known today.


SSDB as a cheaper alternative to Redis

January 16, 2017

Hello again, today I would like to talk about SSDB – a high-performance NoSQL database that can serve as a cheaper drop-in replacement for Redis.

Redis is in most cases the best solution for managing high-throughput, low-latency data access. It’s also open source and free to use. But using Redis might be expensive in terms of hardware resource utilization. For example, I was recently required to store a lot of keywords, whose number per task was virtually unlimited. Redis was the first option that came to mind. I started to look around for existing services that host Redis and found that the prices were far above expectations. At redislabs.com, a database of 15 GB would cost about USD 1,000 per month, and this is the cheapest I could find. Thus, anyone who needs to store large amounts of data would have to pay quite a fortune to do it with Redis. Such prices are explained by the fact that Redis is an in-memory database, and memory is an expensive resource. So I started to think about alternatives and to search for a Redis-like disk-based database that would reduce the costs without compromising the speed.
Ask and it will be given to you.
Eventually, as I was doing my search, I came across SSDB – a high-performance NoSQL database. It supports most of the Redis protocol and can be used as a drop-in replacement for Redis; it can even be accessed with Redis clients. There is a list of supported Redis commands and the way they map to SSDB. Being disk-based, SSDB can store much more data than Redis on the same machine, which makes it much cheaper.

Every programmer should know that a disk seek is about 100,000 times slower than an access to RAM. So how can a disk-based database possibly be as fast as an in-memory one?
The answer is a little bit of magic and voila… the LevelDB design, which implements an SSTable (Sorted Strings Table – hence the name SSDB). Credit goes to Google.
Briefly, the process works as follows: the queries that change the data (writes, updates and deletes) are stored as deltas in a fast in-memory layer. Once this layer reaches a specified size, all the deltas are dumped to disk. Being sorted, the data is written to disk sequentially, which is only about 3 times slower than writing to RAM (not 100,000 times slower, as random seeks are). Such dumps occur once in a while, giving us amortized performance comparable to an in-memory database. Reads search the in-memory layer first and only then, if the key is not found, access the disk (the worst case).
The worst-case scenario, with many random disk seeks, would in theory be significantly slower. However, keep in mind that in such a case – when the data size exceeds the available memory – any in-memory database would already be overflowing and effectively dead. Thus the comparison is still in favor of SSDB.

I made a quick benchmark with the redis-benchmark tool against a Redis server:

and ran the same procedure against an SSDB server on the same machine, with the same redis-benchmark tool:

We can see that SSDB’s set is only about 20% slower, while its get is slightly faster than Redis’s.
To conclude: SSDB is in most cases as fast as Redis, and SSDB’s worst case is not even supported by Redis. There are attempts to work around this limitation with hybrid solutions such as Redis + MongoDB or Redis + PostgreSQL, where Redis provides a cache-like layer in front of a disk-based database. Such an approach turns out to be far from trivial when it comes to scaling and keeping the different databases in sync.
SSDB addresses all of these cases at once; no more hybrid solutions needed.
Many companies have already recognized the advantages of SSDB. The Chinese search giant Baidu uses SSDB for its most important service – Web Search. QIHOO 360, which serves hundreds of millions of users, migrated its Redis instances to SSDB and now uses fewer servers for storage.
When I realized there was no simple way to use SSDB from the Heroku platform, I created ssdbhub.com, which provides SSDB as a Heroku add-on.

WE ARE LOOKING FOR ALPHA TESTERS FOR THIS ADD-ON

Our first users will get a 5 GB Free Plan for an unlimited period of time.
To claim your free plan, drop your email at ssdbhub.com

To get started, refer to the documentation. There is a comprehensive guide on how to create the SSDB add-on on Heroku and how to access it using your favorite programming language.

It’s extremely important for us to get your feedback. If you have any technical issues, feel free to open a public discussion on our issue tracker or send us a direct message at support@ssdbhub.com, whichever works best for you.

A journey of a thousand miles begins with a single step.

Mission Impossible! Or how to beat a chess computer at impossible level.

October 26, 2015

Hello again,

check out the video I made of playing chess on one of today’s most popular chess sites. As you can see, I was able to outplay a chess computer at the “IMPOSSIBLE” level.

On the website it is rated 2300 ELO, which compares to a strong master level. So my achievement is quite impressive, considering that I have never studied chess professionally and my own rating is about average.

I’m not a professional chess player …but I’m a programmer.

And here is the catch. As you have probably guessed, it is my chess bot that is playing in the video. I was surprised by how easy it was to actually build one. The bot opens a browser, logs in to the playing area, reads the moves from the move list, tracks the game clocks and makes its own moves by actually clicking and dragging the pieces over the board. I used Selenium for browser automation and Stockfish – one of the world’s strongest open-source chess engines – for the “thinking”. All in all, it took less than a hundred lines of Python code and a weekend of hacking around the web site’s DOM tree.

I’m not going to publish the source code, so as not to encourage using this kind of technology against real players. I think it’s neither fair nor fun.

That was a challenge for me as a programmer, and it has been successfully accomplished. I was able to create an automated player that is much stronger than myself and stronger than the majority of human players. This brings me to an interesting point…

It does not matter what you are going to learn, learn Programming!

Background music: Izzat Man – Robots are also sleeping

Lua as Python’s secret weapon.

May 28, 2015

There are Lies, Damn Lies, and Benchmarks

Pure Python

Hello again, today I would like to share some remarkable benchmark findings that demonstrate how Python may be boosted by embedding Lua into Python code. It all started with a simple task my friend asked me to fulfil in order to compare Python to other languages. So, below is the benchmark we are going to test:

fig 1.0
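The listing for fig 1.0 is missing from the page; below is a reconstruction consistent with the description and output that follow (the value range of the random integers is my assumption):

```python
import random
import time

N = 5000000  # 5M elements per array

t0 = time.time()
a = [random.randint(0, 99) for _ in range(N)]
b = [random.randint(0, 99) for _ in range(N)]
print("Pure Python init", time.time() - t0)

t0 = time.time()
for i in range(N):
    if a[i] != b[i]:   # corresponding elements differ (they almost always do)
        a[i] += b[i]   # the "sum and update" action
print("Pure Python sum", time.time() - t0)
```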

We create 2 arrays of 5M random integers each and then compare them. If elements with corresponding indexes differ (and they almost always do), we perform a “sum and update” action, which gives us the following result:

Pure Python init 7.01951003075
Pure Python sum 0.525348901749

Thus, Python requires about 7 seconds to initialize the arrays and another 0.52 seconds to perform the “sum and update” action.
Can we get more out of Python? Yes, for sure we can! Let’s see below:

NumPy

Apparently, NumPy is capable of summing two arrays in a very efficient way. It is also superior in array initialization.

Consider the following code:
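The code figure is missing; the NumPy version presumably looked something like this, with a vectorized mask replacing the explicit loop:

```python
import time
import numpy as np  # third-party: pip install numpy

N = 5000000

t0 = time.time()
a = np.random.randint(0, 100, size=N)
b = np.random.randint(0, 100, size=N)
print("Numpy init", time.time() - t0)

t0 = time.time()
mask = a != b        # vectorized comparison of corresponding elements
a[mask] += b[mask]   # "sum and update" only where they differ
print("Numpy sum", time.time() - t0)
```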

The output:

Numpy init 0.211035966873
Numpy sum 0.0101628303528

We can see that NumPy requires just 0.2 seconds for array initialization versus 7 seconds for pure Python, which is 35x faster. The same goes for the “sum and update” action: NumPy takes 0.01 seconds versus 0.5, which is 50x. Pretty impressive numbers!

PyPy

Running the pure Python code (fig 1.0) with PyPy gives us roughly the same results as NumPy:

Regular init 0.203986883163
Regular sum 0.0113749504089

C

Now let’s see what we are able to get out of C. My friend claimed he could write this function in C in no more than 5 minutes. Well, he actually did it within 5 minutes and then spent another hour trying to figure out why the function was not working. Probably every experienced C programmer would immediately spot the problem: in C you cannot just declare an array of 5,000,000 elements as a local variable, because it is too big to fit on the stack. What you should do is place it on the heap with malloc. While my friend was struggling with the C code, I wanted to figure out how to speed up the Python code without writing a custom module in C. And I found this simple solution:

Lua

Lua is an amazing language! It’s open source, and it’s easy to read and understand, which makes it easy to learn. You can find a detailed and authoritative introduction to all aspects of Lua programming, written by Lua’s chief architect Roberto Ierusalimschy, here. Despite being simple, it is nevertheless a very powerful tool and a solid piece of programming art. To say the least, Lua is one of the fastest dynamic languages today, so please do not underestimate it. An important feature for us is that Lua has a very small memory footprint – less than 200K for the whole interpreter – which allows us to seamlessly embed it into almost any environment.

Here is our benchmark written in Lua, yet wrapped in Python:

Output:

LUA init: 0.080163
LUA sum: 0.007911

The result is amazing! We can see that Lua is faster than PyPy and NumPy by 1.5-3x, and faster than pure Python by 50-100x!

  1. Python   1x
  2. NumPy    50x
  3. PyPy     50x
  4. Lua      100x
  5. C        100x

As you can see, it was easy to embed Lua in Python, and the results are outstanding! We can say that Lua is definitely a great help when you need to speed up some critical parts of your Python code. Many commercial products use Lua today; to name a few: Adobe Photoshop Lightroom, World of Warcraft, Far Cry, SimCity 4 and others.

Hope you enjoyed this post and that it will be useful to you.

Yahoo ChessMaster R.I.P

February 15, 2015

Hello again,

this is my next publication, in which I would like to share one of my first experiences as a programmer.

I strongly believe the best way to learn something is simply to start doing it. Yep, it’s all about practice.
So, there was no better way for me to learn a new programming language than to start writing a project in it. One of my first self-educational projects was YahooChessMaster (YCM) – a chess bot that could play chess on the Yahoo games portal.

yahoochessmasterv10pk6

 

I think it was one of the first Yahoo bots at that time. I wrote it using VC++ 6.0 back in 2005.
YCM identified the open browser window with a chessboard inside and scanned the board visually, pixel by pixel. Then it passed the position from the board to a local chess engine using the UCI protocol. The engine made a move and returned its algebraic notation to YCM.
After that, YCM moved the corresponding piece on the board by sending mouse events to the browser window. You may say: “Scanning the board is an overhead. Wouldn’t it be much easier to emulate the network interactions: request/response?”

No, it wouldn’t. It turns out that emulating the networking may not be that simple at all. First, the communication may need to be SSL-encrypted; second, it would use some sort of custom protocol requiring tons of reverse-engineering effort.
With my method, I managed to speed up the board-scanning process by reducing the number of pixels that had to be read. Thanks to the rich color palette of the chess-piece graphics, it was possible to identify a piece by reading one single pixel from each square. Thus, reading the entire game position was a matter of reading only 64 pixels, and YCM was able to do it many times per second.
Eventually – after all the Internet lag, plus the pixel reading, plus the IPC interactions with the engine and the mouse events passed back to the browser – YCM was able to make one move per second, playing 1-minute blitz games. In addition to its incredibly short move time, YCM’s playing strength was amazingly high. Behind the curtains, YCM was using one of the strongest chess engines of that time, Spike – a winner of a first Computer World Championship in 2005. And Spike was free of charge.
Very quickly YCM became rated 2400. It literally never lost a game. I realized that I had created something really good and tried to sell it… During play, YCM posted spam-like messages in the room chat, referring interested players to my website, where a limited demo version was available for free and a fully functional commercial one for only $9.
Only one copy of YCM was sold, and it was later refunded. It turned out YCM could not cope with the many different versions of Windows of that time – to name a few: XP Home, XP Professional, ME, 2000, 98, 95… I simply had no time to support all this “commercial” stuff, and eventually I abandoned the project. The website with the demo version was shut down. All this happened in 2005.
Today there are a couple of chess bots for Yahoo that are strikingly similar to my YCM. One of them is Chess Buddy.

Here is how it looks:

ssChsY2

And here is a screenshot of YCM:

yahoochessmasterv10bmp2ui3

So there is a chance that my ideas were adopted and live on to this day. And I’m good with that.

About websockets of the 90’s and the creativity within.

October 1, 2014

Recently I was involved in a discussion about different ways of establishing a real-time connection between browser and server. One of my colleagues just said: “I would use a GIF for that”, and I was like “whaat?”, and then he showed me this library: gifsockets. It builds on the Graphics Interchange Format, known as GIF. A GIF represents animation in the browser as a set of frames that successively replace each other. When downloading a GIF animation, the browser does so frame by frame and holds the connection open until it receives a special terminator from the server. So, as long as that terminator is never sent, the browser may hold the connection open forever. It’s also possible to convert plain text to a frame (RGB data) and send it to the browser as regular GIF content.

Back in the 90’s, when there were no websockets, people used this trick to establish real-time communication between browser and server. Although this is clearly a big hack – something used in a way it was never meant to be – I think it is an amazing example of creativity in programming.

Extracting textual time-based content from blog pages using the LCA technique on a DOM tree.

January 20, 2014

Today there is increasing interest in scraping the latest data from the Internet, especially textual data. There are a lot of content-providing sites: blogs, news, forums, etc. Their content is time-based (updated periodically over time). Extracting time-based content from millions of sites is not a trivial task. The main difficulty is that we don’t know beforehand the format of the HTML page we are going to scrape.

In this post I will describe a method for extracting time-based textual content from blog pages in an automatic manner.

Let us look at how posts are organized in the vast majority of blogs:

Posts consist of repeating DOM elements such as divs, spans and so on. The common feature is that every post contains a time stamp. The time stamp might be at the beginning of the post or at its end, but it is almost always a separate DOM element.

So it might not be too hard to detect the dates using some sort of date-parsing library, such as dateutil.

The dates detection process would go like this:
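The snippet is missing from the page; here is a self-contained sketch of the idea using dateutil (named in the post) over a toy ElementTree DOM – the helper name and sample markup are mine:

```python
from xml.etree import ElementTree
from dateutil import parser as dateparser  # third-party: pip install python-dateutil

def find_date_elements(root):
    """Return (element, text) pairs whose text parses as a date."""
    found = []
    for el in root.iter():
        text = (el.text or "").strip()
        if not text:
            continue
        try:
            dateparser.parse(text)  # raises ValueError on non-dates
        except (ValueError, OverflowError):
            continue
        found.append((el, text))
    return found

# toy blog markup standing in for a real page
page = ElementTree.fromstring("""<div>
  <div class="post"><span class="date">January 20, 2014</span><p>First post text</p></div>
  <div class="post"><span class="date">January 18, 2014</span><p>Second post text</p></div>
</div>""")

dates = find_date_elements(page)
for el, text in dates:
    print(el.tag, text)
```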

Now that we have the dates, we assume that each of them is part of a post we need to collect, but we don’t know where each post starts and where it ends.

Fortunately, there is an efficient method to identify the post boundaries: finding the Lowest Common Ancestor (LCA) of the date elements in the DOM tree. Having the dates’ LCA, we can be pretty sure that its children provide the required entry points to the posts themselves. Below is a picture that visualizes the idea:

Screenshot from 2014-01-19 18:31:38

Another, more complex case is multiple post threads on the same page:

Screenshot from 2014-01-19 17:57:45

All these cases are supported by the following code:
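The original snippet (which used pattern.graph’s shortest_path) is missing; here is a dependency-free sketch of the same idea, finding the LCA as the deepest node shared by the root paths of all date elements (a child → parent dict stands in for the DOM graph):

```python
def path_to_root(node, parent):
    """Walk parent pointers from a node up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lowest_common_ancestor(nodes, parent):
    """LCA of several nodes in a tree given as a child -> parent dict."""
    paths = [path_to_root(n, parent) for n in nodes]
    # ancestors shared by every node
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    # the shared ancestor closest to the first node is the LCA
    for node in paths[0]:
        if node in common:
            return node

# toy DOM tree: body -> [post1, post2], each post holds its date element
parent = {
    "post1": "body", "post2": "body",
    "date1": "post1", "date2": "post2",
}
print(lowest_common_ancestor(["date1", "date2"], parent))  # body
```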

Here graph is a pattern.graph.Graph instance representing the HTML DOM tree. We use the shortest_path method to find the shortest paths between the date elements; the intersection points of these shortest paths provide the LCA candidates.

Then we may want to iterate through the LCA’s children and extract text from them. This may be accomplished with something similar to the following:
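The snippet is missing; something along these lines, assuming ElementTree and treating each LCA child as one post entry (the sample markup is mine):

```python
from xml.etree import ElementTree

# toy markup: the container div plays the role of the dates' LCA
lca = ElementTree.fromstring("""<div id="posts">
  <div class="post"><span>January 20, 2014</span><p>First post text</p></div>
  <div class="post"><span>January 18, 2014</span><p>Second post text</p></div>
</div>""")

posts = []
for child in lca:  # each child of the LCA is one post entry
    # join all the text fragments inside this post
    posts.append(" ".join(t.strip() for t in child.itertext() if t.strip()))

print(posts)  # ['January 20, 2014 First post text', 'January 18, 2014 Second post text']
```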

And, Yuppee!! We have got the posts!

This was a high-level overview of the method. Obviously, more optimization is needed to make use of it in a production environment. Hopefully this will give some insights to those who are interested in data mining and especially in time-based content extraction.

Thanks to my sister Natalia Vishnevsky for editing this and for the moral support.

Python as an optimal solution for today’s network application programming challenges.

November 3, 2013

Recently I made a brief exploration of the local job market and found a simple fact: Python is greatly misunderstood and, as a result, extremely underestimated.

Having been employed by different software development companies, I have heard many times from technical people that Python is slow, that it doesn’t have a real threading mechanism, that the lack of static types makes it error-prone, that some “basic” OOP stuff is missing and that it is more of a toy language with no real use in a production environment… So I felt a deep need to write an introduction to Python for potential users, employers and anyone who may be interested. It might be of great use for business owners and start-up founders as well as for technical people.

Introduction.

1. “Python is a programming language that allows you to work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs.” – From my personal experience, development in Python is about 3 times faster than in Java.

2. “Python runs on Windows, Linux/Unix, Mac OS X, and has been ported to the Java and .NET virtual machines.” – and I would add also all of them.

3. “Python is free to use, even for commercial products, because of its OSI-approved open source license.”

The world is changing. Reactive Manifesto.

First of all, I would like to introduce to you the Reactive Application Manifesto.

It was created recently to name the changes that are remarkably taking place in today’s requirements for a network application. The Reactive Manifesto lists the rules an application has to comply with, namely scalability, responsiveness and resilience. It clearly says that scalability and responsiveness are all about asynchronous programming models (APM).

Responsiveness.

A year ago I wrote a post about asynchronous programming with Python, where I explained the advantages of APM over the threading model. There I claimed that Python actually excels at APM with respect to the simplicity of writing the code. Having one of the largest official repositories of open source code (36,195 packages as of the time of writing this post), Python offers many different libraries that make it dead easy to write APM-based code, the most advanced of them being Twisted – an event-driven networking engine.

Gregory Begelman describes here how we built an APM crawler system on top of scrapy that is able to scrape 2000 domains simultaneously on one machine with 8 cores and 32G of RAM. In this post I elaborate on it even further.

Scalability and Resilience.

Today the major players on the market such as Google, Facebook, Yahoo, Youtube and many others use Python as part of their core infrastructure.

Such well-known and commonly used computing systems as Amazon Web Services, Google App Engine, Apache Hadoop and Nokia Disco have been integrated with Python as well.

Thereby, Python is a perfect fit for the reactive application model described above. Considering its simplicity of use and its open source nature, with a mature community and a huge repository, I conclude that Python is an optimal and highly efficient choice for today’s network application programming needs and challenges.

Tips on optimizing scrapy for a high performance

November 2, 2013

Running multiple crawlers in a single process.

When crawling many web pages, it’s important for an application to take advantage of APM. Scrapy is an asynchronous Python crawling framework that, with small changes, suits this need perfectly.

Scrapy has a Crawler component that includes the request scheduler and the visited-urls queue, together with all the configuration parameters for how the crawling should be performed. Thus, if you want to crawl multiple domains simultaneously, the best isolation level is a separate Crawler per domain. This also gives you a custom configuration context per domain.

The conventional use of scrapy is its Crawler-per-process model. However, if you need to crawl 2000 domains simultaneously on one machine, having 2000 processes there would be very inefficient. So our goal is to run multiple Crawlers within a single scrapy process.

Gregory Begelman describes in his post how we solved this problem and managed to run 300 concurrent domains from a single process, using just one processor core and only about 3G of RAM.

Bloom as a duplicate filter for visited urls.

The default duplicate filter used in scrapy for filtering visited urls keeps a list of url fingerprints – basically sha1 hashes, 40 characters long, which is 77 bytes per string in Python 2.7. Let’s say you have to scrape a site with 2M pages; then your duplicate-filter list may grow up to 2M * 77b = 154Mb per Crawler. To scrape 300 such domains simultaneously, you would need 300 * 154Mb = 42G of memory. Fortunately, there is another way – the Bloom filter. It is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. When we configure it to store 2M items with a false-positive probability of 0.00001%, we get a duplicate filter that is good enough using only 11Mb of memory. This enables us to crawl 300 domains with only 3G of memory.

Below is the Bloom filter for scrapy, implemented with pybloom:
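The original pybloom-based listing is missing; below is a stdlib-only sketch of the same idea – a Bloom filter sized by the standard formulas, wrapped in a request_seen-style duplicate filter (the class names are mine; a real scrapy dupefilter would subclass scrapy’s BaseDupeFilter):

```python
import hashlib
from math import ceil, log

class BloomFilter:
    """Minimal Bloom filter: k hash probes into a bit array of m bits."""

    def __init__(self, capacity, error_rate):
        # standard sizing: m = -n*ln(p)/ln(2)^2 bits, k = ln(2)*m/n hashes
        self.m = int(ceil(-capacity * log(error_rate) / log(2) ** 2))
        self.k = int(ceil(log(2) * self.m / capacity))
        self.bits = bytearray((self.m + 7) // 8)

    def _probes(self, item):
        # double hashing: derive k bit positions from one sha1 digest
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item):
        for p in self._probes(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._probes(item))

class BloomDupeFilter:
    """Duplicate filter in the spirit of scrapy's request_seen()."""

    def __init__(self, capacity=2000000, error_rate=0.0000001):
        self.seen = BloomFilter(capacity, error_rate)

    def request_seen(self, url):
        if url in self.seen:
            return True   # (or, rarely, a false positive)
        self.seen.add(url)
        return False
```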

To enable it in your project, add the following line to the project’s settings.py:
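The settings line itself did not survive extraction; assuming the filter lives at myproject/dupefilters.py (a hypothetical path), it would look like:

```python
# settings.py – point scrapy's DUPEFILTER_CLASS at the custom filter
DUPEFILTER_CLASS = 'myproject.dupefilters.BloomDupeFilter'
```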


Why you should consider the asynchronous programming model (APM) when writing a web server in Python.

October 31, 2013

This is a repost of my article from PicScout’s engineering blog. Thanks, guys, for keeping it up! The original version may be found here:

http://engineering.picscout.com/2012/07/why-you-should-consider-asynchronous.html

Nowadays there is increasing interest in asynchronous technologies, particularly as a result of the huge success of Nginx and NodeJS. But there are also many people who still don’t get what this is all about and what the differences between synchronous, asynchronous and threaded programming are. In this article I will try to clarify these things…
TCP server internals
Let’s look at how a server application receives a request. When a client connects, the server opens a new connection and starts listening for incoming data. Data usually arrives in chunks, and the server tries to find the end of the request by looking for delimiter characters or for a specified length, which might be indicated in the first few bytes.
A typical CPU can handle data-transfer rates that are orders of magnitude faster than a network link can sustain. Thus, a server doing lots of I/O will spend much of its time blocked while the network catches up. In order not to block the other clients while waiting, the server has to handle request reading concurrently.
A popular way to do so is to use a thread per client. But there are some problems with threads. Well, Python has no real multithreading support at all. Instead it has the GIL (Global Interpreter Lock). The GIL is necessary mainly because Python’s memory management is not thread-safe; it prevents multiple native threads from executing Python bytecode at once.
On the one hand, this makes threaded programming in Python fairly simple: to add an item to a list or set a dictionary key, no locks are required. On the other hand, it leads to relatively big chunks of code executing sequentially, blocking each other for an undetermined amount of time.
The in-depth explanation of this problem is in this video by David Beazley.
Python aside, there is a much wider problem with threads. I was actually surprised by how many cons there are to using them. The cons vary from threads being a bad design (as described here) to more pragmatic ones, such as consuming a fair amount of memory, since each thread needs its own stack. The stack size varies across OSs.
On .NET it’s usually 1 Mb on a 32-bit OS and 4 Mb on a 64-bit OS; on Linux it might be up to 10 Mb per thread. Also, context switches between many threads degrade performance significantly; it is commonly recommended not to have more than 100 threads. Not surprisingly, it is also always difficult to write thread-safe code: you have to worry about race conditions, deadlocks, livelocks and starvation!
Fortunately, there is a better way to handle concurrency, and it is one where Python excels: asynchronous (aka non-blocking) network servers.
Back in the days before multi-threading was invented, asynchronous designs were the only available mechanism for managing more than one connection in a single process.
I’d like to illustrate this principle with an example published in a Linux Journal article by Ken Kinder:
Have you ever been standing in the express lane of a grocery store, buying a single bottle of water, only to have the customer in front of you challenge the price of an item, causing you and everyone behind you to wait five minutes for the price to be verified?

Plenty of explanations of asynchronous programming exist, but I think the best way to understand its benefits is to wait in line with an idle cashier. If the cashier were asynchronous, he or she would put the person in front of you on hold and conduct your transaction while waiting for the price check. Unfortunately, cashiers are seldom asynchronous. In the world of software, however, event-driven servers make the best use of available resources, because there are no threads holding up valuable memory waiting for traffic on a socket. Following the grocery store metaphor, a threaded server solves the problem of long lines by adding more cashiers, while an asynchronous model lets each cashier help more than one customer at a time.

The APM basic flow is visualized below:

reactor-1

The module is waiting for the event. Once there is any, it reacts (thus the name reactor) by calling the appropriate callback:

reactor-doread
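The react-and-dispatch loop pictured above can be sketched in a few lines using Python’s selectors module (a modern stand-in – the original post predates it), demonstrated with an in-process socket pair instead of a real network client:

```python
import selectors
import socket

# a tiny reactor: wait for events, dispatch each to its registered callback
sel = selectors.DefaultSelector()
received = []

def on_readable(conn):
    # called by the reactor when `conn` has data waiting
    received.append(conn.recv(1024).decode())

def reactor(steps):
    for _ in range(steps):
        for key, _mask in sel.select():
            key.data(key.fileobj)  # key.data is the callback we registered

client, server_side = socket.socketpair()
sel.register(server_side, selectors.EVENT_READ, on_readable)
client.sendall(b"hello reactor")
reactor(1)
print(received)  # ['hello reactor']
```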

Python has had a high-performance asynchronous server framework, called Medusa, since as early as 1995.

Medusa turned into the archetype of the nowadays well-known Zope and Twisted. It was initially built to address the C10K problem, which is simply stated: how do you service 10,000 simultaneous network requests? I refer you to the C10K website for enormously detailed technical information on this complex problem.
It is sufficient to say that asynchronous architectures, with their much smaller memory usage and their lack of need for locking, synchronization and context switching, are generally considered far more performant than threaded architectures.
Here is a very impressive graph that compares Apache which is threaded with Nginx which is asynchronous.

nginx-apache-memory

So if you ever need to handle high traffic, or just want to have fun trying a different way of thinking about programming, you should consider writing an asynchronous application.
Acknowledgments:
http://google.com
http://aboutsimon.com/tag/david-beazley
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf
http://www.linuxjournal.com/article/7871
http://www.nightmare.com/medusa/
http://www.zope.org/
http://twistedmatrix.com/
http://www.kegel.com/c10k.html
http://krondo.com/?p=1209
https://ep2012.europython.eu/conference/talks/asynchronous-programming-with-twisted
http://jython.xhaus.com/twisted-and-zope-high-performance-asynchronous-network-servers-in-jython/
http://www.guptamayank.com/why-node.js-single-thread-event-loop-javascript