.
Developer Spot - Web Development Tutorials
 


Web Hosting Directory
Budget Web Hosting Linux Web Hosting Small Business Hosting
Windows Web Hosting Reseller Web Hosting Web Hosting Articles

Improve Linux performance

By Cameron Laird
2004-04-08
Reader Rating: 5 out of 5
Bookmark Print Version
Sorting is hard

That was an "easy" lesson; "civilians" with no particular programming background can understand the previous paragraph. Sorting, though -- that's in the "hard" category. And it must be: Donald Knuth devotes an entire volume of his classic series on computing to sorting and searching (see Resources for a link to a review of Knuth's The Art of Computer Programming).

Sorting turns out to be at the heart of many performance challenges for a deep reason. Performance is an issue for large-scale computations alone. Humans can only understand a few items at a time, so the way we grasp problems large enough to take a long time is to organize or structure them in some way. Sorting is the most common of these ways; a sorted list can be binary-searched rather than linearly searched. That simplifies manual lookup in the New York City telephone directory, for example, from a problem on the scale of a million to a problem on the order of the logarithm of a million (about fourteen, say).

That's typical of sorting work; superior algorithms often outperform inferior ones by a factor of a thousand or more. It pays to sort intelligently.

Most intelligent of all, however, is to avoid sorting altogether, or at least limit it to occasions when it won't be noticed. That's common with database management systems; among other benefits, they allow for the construction of "insert-time" indexes. When a sorted result is needed, it can be read directly from the index. The cost is a slightly longer time to create or insert elements -- but users won't notice this if the application's workflow is typical.

Knuth's reference describes plenty of other tactics, such as "Boyer-Moore" or "Rabin-Karp" for accelerating sorting operations in specific situations. One common principle that unifies several of these is that it can be easier to solve a general problem than a more specific one. Mathematicians are familiar with this; common problems in algebra are more tractable when considered over the complex numbers than over the reals, even though complex arithmetic is more difficult.

That's how I think about "decorate-sort-undecorate" (DSU). Suppose you have this dataset:

"Jane Jones" -> 15,000
"Dan Smith" -> 6,000
"Kim Black" -> 40,000

Say you need the names sorted by their associated revenues. A naive approach might be to use a library entry point for sorting, supplied with a comparison function to say that "Kim Black" precedes "Dan Smith" because 40,000 is greater than 6,000, and so on. That's straightforward and effective for small problems.

Far more efficient, though -- especially when sorting a million items at a time, as is common in bioscience, finance, and other domains -- is to go to the trouble of creating a special-purpose "reverse table." While this operation takes time, it allows the sorting to be very fast. Python's list comprehensions make this particularly economical to express; the same principle applies in any language you might have at hand, though:

Listing 3. Minimal decorate-sort-undecorate illustration, in Python

decorated = [ (revenues[name], name) for name in name_list ]
decorated.sort()
for (revenue, name) in decorated:
print "%14s: %8d" % (name, revenue)


Sorting the derived dataset can easily accelerate the overall operation by an order of magnitude or more. Calculations of computational complexity seem to be one of the harder subjects for students; there's no doubt of their value, though, when they lead to such speedups.



Article Pages:
Improving Linux Performance
Two countdown examples
Sorting is hard
500 times as quick
Disk drives deserve doubt
Resources

First published by IBM developerWorks


 Rate this article:   Poor          Excellent 


If you found this article interesting, you may want to read these as well:



 
Development Tutorials
ASP
CGI & Perl
CSS
HTML
Java
JavaScript
Linux
PHP
XML




More Resources
Web Hosting Articles
Development Tutorials: CGI & Perl - CSS - HTML - Java - JavaScript - Linux - PHP - XML