Alexey Vishnevsky

Lua as Python’s secret weapon.

There are Lies, Damn Lies, and Benchmarks. Pure Python. Hello again. Today I would like to share some remarkable benchmark findings that show how Python can be sped up by embedding Lua into Python code. It all started with a simple task that a friend asked me to carry out in order to compare Python to other languages. So, below is the benchmark we are going to test (fig. 1.0).

We..

Extracting textual time-based content from a blog page using the LCA technique on a DOM tree.

Today there is increasing interest in scraping the latest data from the internet, especially textual data. There are many content-providing sites, such as blogs, news sites, and forums. Their content is time-based (periodically updated over time). Extracting time-based content from millions of sites is not a trivial task. The main difficulty here is that we don’t know beforehand the format of the HTML page that we..
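The excerpt stops before the method itself is described, so the following is only a minimal sketch of what a lowest common ancestor (LCA) on a DOM tree means, not the author's actual algorithm. It assumes a hypothetical, well-formed page fragment in which each post carries its date in a `span` element; the names `PAGE`, `path_to_root` and `lowest_common_ancestor` are made up for illustration. The idea is that the deepest node containing all dated elements is a good candidate for the container of the time-based content.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment. Real-world blog HTML is rarely valid XML
# and would need a tolerant HTML parser instead of ElementTree.
PAGE = """
<html><body>
  <div id="sidebar"><p>About</p></div>
  <div id="posts">
    <div class="post"><span>2014-01-02</span><p>First post</p></div>
    <div class="post"><span>2014-01-05</span><p>Second post</p></div>
  </div>
</body></html>
"""


def path_to_root(node, parents):
    """Return the chain of nodes from the tree root down to `node`."""
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path[::-1]


def lowest_common_ancestor(nodes, parents):
    """Return the deepest node that appears on every root-to-node path."""
    paths = [path_to_root(n, parents) for n in nodes]
    lca = None
    for level in zip(*paths):
        # Stop at the first depth where the paths diverge.
        if any(n is not level[0] for n in level):
            break
        lca = level[0]
    return lca


root = ET.fromstring(PAGE)
# ElementTree has no parent pointers, so build a child -> parent map first.
parents = {child: parent for parent in root.iter() for child in parent}

# Assumption for this sketch: every dated element is a <span>.
dated = list(root.iter("span"))
container = lowest_common_ancestor(dated, parents)
print(container.tag, container.attrib)  # div {'id': 'posts'}
```

Here the two date elements sit in different `div.post` blocks, so their paths diverge one level below `div#posts`, which is therefore reported as the common container while the sidebar is ignored.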

Python as an optimal solution for today’s network application programming challenges.

Recently I made a brief exploration of the local job market and found a simple fact: Python is greatly misunderstood and, as a result, extremely underestimated. Having worked for several software development companies, I have heard many times from technical people that Python is slow, that it doesn’t have a real threading mechanism, and that the lack of static types makes it error-prone; that..

Tips on optimizing Scrapy for high performance

Running multiple crawlers in a single process. When crawling many web pages, it’s important for an application to take advantage of APM. Scrapy is an asynchronous Python crawling framework that, with small changes, perfectly suits this need. Scrapy has a Crawler component that includes a request scheduler and a queue of visited URLs, together with all the configuration parameters that control how the crawling process should be performed. Thus..