Performance optimisations

by alski 26. June 2007 20:29

I have a piece of code. It works quite hard, as its trying to summarise some data that's collected every second into 5 minute summaries. This involves reading about half a million rows (at the current time), grouping them into multiple categories (about 7) and times (five minute windows over a 3 month total span), and spitting out summaries (e.g. average last months, Mondays for last month, etc).

Now a subset of this data (just one 1 months worth, for one summary) was taking about 11 and a half minutes to summarise on my poor little laptop, and I didn't want to release this in this state. So how I had a first stab and got it down to about 49 seconds. I thought this was

Therein lies the problem, I didn't follow any particular method. I just re-wrote it. So how should we go about improving the performance.

Some basic rules

Find the problem

If you have access to a performance suite then use it now. Right now. Go away, I'll wait till you have done.

...

Has your analysis shown you the problem is where you expected it? I'll bet a fair sum of money it wasn't. This is common with performance work, operations that you take for granted have become such an integral part of your application that you just use them without considering the performance of them.

If you don't have access to a performance suite then its time to try a little more guesswork. You need to focus on the same issues. Ensure that a method is as fast when its called 50,000 times as it is when it's called the first, particularly if it's something to do with a collection.

Do as little as possible

Sounds obvious. Good. Now check your code.

When you first wrote the code it should not be have been written for performance. It should be easy to read and understand. We are now moving the goal posts, so the code we write is likely to be less understandable, and as a side-effect will require more comments.

Minimise the loop depth

Given a collection of n elements, an algorithm that looks at each one once (n operations) will be faster than one which compares each element with every other element (n * n-1 operations at best, or approximately n2 operations). In the first case you have one loop, in the second you have two. Basically the more nested foreach loops the slower your function.

The holy grail that we are searching for here is the logn operations. This is where we can still obtain our goal without visiting every item in the collection. The best example I know if this is the Binary chop search, however the most common performance improvement is the use of hashed lookups.

Do as much as possible outside the loop(s)

Sounds simple, but have you really done it? One of the oldest and still one of the best examples I know of is to consider a radar display. Consider how it draws that line circling round and round. If it has to calculate the x and y from Sin/Cos every time then its going to be slow, but if it caches the points in a lookup table (Dictionary or even just a List) once and then just uses them, it is going to be much faster.

Is it all worth it ?

Well for me it was... Here's what I just gained.

 
 
Before After

DoSummary() Completed in 00:52:32.9136672
Rows summarised: 501228

DoSummary() Completed in 00:00:50.0619856
Rows summarised: 501228

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags:

Code | Architect

Powered by BlogEngine.NET 1.4.5.0
Theme by Mads Kristensen

RecentComments

Comment RSS