Friday, February 18, 2005

Optimization continues


After a rather hectic week, Friday is here at last. Of course, Fridays are not so nice, because I work from 8 am to 8 pm, but after that I have the whole weekend ahead of me for some more work! During the week I did not have much time to optimize the volume rendering algorithm. I have prepared a nice picture (I hope you like it) of the Chapel Hill Dataset. It is shown here cropped and at half the actual size (when you click on it). The actual image is 700 x 650 pixels and takes about 9.5 seconds to draw. This is far too slow, so I will have to get really creative with the optimization tricks. The volume is 208 x 256 x 225 voxels (rather small). I have applied a 2D transfer function to show the bones and soft tissues.
I did some more web searching for Delphi optimization tips and found a really nice site (although a few of its links are broken). You can find it here. I applied the following changes, and the speed-up is significant:
1. Changed order of nested 'for' loops that loop over a three-dimensional array.
2. Changed FPU precision mode to: SetPrecisionMode(pmSingle);
3. Used multiplication instead of division. I multiply by the inverse, and this is much faster.
4. Turned off range checking and overflow checking (this is rather obvious, but do it after making sure the algorithm works OK): {$R-}{$Q-}
5. Changed sequence of instructions, to group together instructions that use the same variables.
6. Changed variables to smaller size (e.g. single instead of double, byte instead of word), where possible.
7. In-lined small procedures.
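To make a few of these items concrete, here is a minimal sketch rather than the actual renderer code. The volume size, the names (TVolume, SumContiguous, SumStrided, Normalize, InvMax) and the choice of which index goes innermost are all assumptions for illustration. It relies on Delphi storing static multi-dimensional arrays with the last index contiguous in memory, so the loop over that index should be the innermost one.

```pascal
program LoopOrderSketch;
{$APPTYPE CONSOLE}
{$R-}{$Q-} // item 4: range and overflow checking off, once the code is known to work

uses
  Math; // SetPrecisionMode, pmSingle

const
  NX = 64; // hypothetical volume dimensions, not the real dataset
  NY = 64;
  NZ = 64;
  InvMax: Single = 1.0 / 255.0; // item 3: inverse computed once, then multiplied

type
  // Delphi static arrays are row-major: the LAST index is contiguous in memory.
  TVolume = array[0..NZ - 1, 0..NY - 1, 0..NX - 1] of Byte;

var
  Vol: TVolume;
  OldMode: TFPUPrecisionMode;

// Item 1: the innermost loop walks the last index, so memory is read sequentially.
function SumContiguous(const V: TVolume): Integer;
var
  x, y, z: Integer;
begin
  Result := 0;
  for z := 0 to NZ - 1 do
    for y := 0 to NY - 1 do
      for x := 0 to NX - 1 do
        Inc(Result, V[z, y, x]);
end;

// Same result, but the innermost loop jumps NY * NX bytes on every step.
function SumStrided(const V: TVolume): Integer;
var
  x, y, z: Integer;
begin
  Result := 0;
  for x := 0 to NX - 1 do
    for y := 0 to NY - 1 do
      for z := 0 to NZ - 1 do
        Inc(Result, V[z, y, x]);
end;

// Item 3: multiply by a precomputed inverse instead of dividing by 255.
function Normalize(B: Byte): Single;
begin
  Result := B * InvMax;
end;

begin
  FillChar(Vol, SizeOf(Vol), 1);
  OldMode := SetPrecisionMode(pmSingle); // item 2: single-precision FPU mode
  WriteLn('Contiguous order sum: ', SumContiguous(Vol));
  WriteLn('Strided order sum:    ', SumStrided(Vol));
  WriteLn('Normalize(200):       ', Normalize(200):0:4);
  SetPrecisionMode(OldMode); // restore the previous FPU precision
end.
```

Both functions return the same sum; only the memory access pattern differs. How much the order matters depends on the volume size and the cache, so it is worth timing both on the real data.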
Changes that I thought would be beneficial but were not:
1. Changing the 'for' loop variable to any type other than Integer slowed the loop down.
Changes I would like to make but have not figured out how, yet:
1. Substitute 'sqrt' with something faster (even if it is only approximate).
2. Same for 'trunc'.
3. Get rid of cache misses. This seems to be the major problem (I knew that, of course).
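For the 'sqrt' item, one candidate is the well-known bit trick for the inverse square root, followed by one Newton-Raphson step and a multiplication by the input. The sketch below is only an assumption about what might be fast enough, not something tested in the renderer. FastSqrt is a hypothetical name, and the code assumes a non-negative 32-bit IEEE Single input. With one Newton step the relative error is a fraction of a percent. Whether it actually beats the built-in Sqrt has to be measured.

```pascal
program FastSqrtSketch;
{$APPTYPE CONSOLE}

// Approximate square root: estimate 1/sqrt(X) from the bit pattern of X,
// refine it with one Newton-Raphson step, then multiply by X.
// Assumes X >= 0 and that Single is a 32-bit IEEE float.
function FastSqrt(X: Single): Single;
var
  Bits: Integer;
  Y: Single;
begin
  Move(X, Bits, SizeOf(Bits));          // reinterpret the float as an integer
  Bits := $5F3759DF - (Bits shr 1);     // initial guess for 1/sqrt(X)
  Move(Bits, Y, SizeOf(Y));             // back to a float
  Y := Y * (1.5 - 0.5 * X * Y * Y);     // one Newton-Raphson refinement
  Result := X * Y;                      // X * (1/sqrt(X)) = sqrt(X)
end;

begin
  WriteLn('FastSqrt(16) = ', FastSqrt(16.0):0:4, '   Sqrt(16) = ', Sqrt(16.0):0:4);
  WriteLn('FastSqrt(2)  = ', FastSqrt(2.0):0:4, '   Sqrt(2)  = ', Sqrt(2.0):0:4);
end.
```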
It seems that I will not be able to reach interactive rates and keep the image quality at the same level. So I will probably have to resort to tricks, like drawing at reduced quality while the volume is being rotated or translated. I cannot ask the user to get a faster machine, because I am already using a Pentium 4 running at 3 GHz with 1 GB of RAM, which is a fast machine (for today, at least).
Most of the profiling was done with the QueryPerformanceCounter call. I know that a single measurement is not very reliable, but averaging a few runs gives a good estimate. I have tried VTune, but it has a significant learning curve and I did not have time to go into much depth. It looks promising, though.
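The timing approach can be sketched roughly as follows. This is an illustration and not the actual profiling code: AverageMilliseconds, DummyWork and Sink are hypothetical names, and DummyWork merely stands in for the real drawing routine. It uses QueryPerformanceFrequency and QueryPerformanceCounter from the Windows unit and averages the elapsed time over several runs.

```pascal
program TimingSketch;
{$APPTYPE CONSOLE}

uses
  Windows; // QueryPerformanceCounter, QueryPerformanceFrequency

type
  TProc = procedure;

var
  Sink: Double; // global, so the dummy workload is not optimized away

// Hypothetical stand-in for the routine being profiled.
procedure DummyWork;
var
  i: Integer;
begin
  Sink := 0;
  for i := 1 to 1000000 do
    Sink := Sink + i * 0.5;
end;

// Runs Proc the given number of times and returns the mean time per run
// in milliseconds, or -1 if no high-resolution counter is available.
function AverageMilliseconds(Proc: TProc; Runs: Integer): Double;
var
  Freq, T0, T1: Int64;
  i: Integer;
  Total: Double;
begin
  Result := -1;
  if (Runs <= 0) or not QueryPerformanceFrequency(Freq) then
    Exit;
  Total := 0;
  for i := 1 to Runs do
  begin
    QueryPerformanceCounter(T0);
    Proc;
    QueryPerformanceCounter(T1);
    Total := Total + (T1 - T0) * 1000.0 / Freq; // counter ticks to milliseconds
  end;
  Result := Total / Runs;
end;

begin
  WriteLn('Average over 5 runs (ms): ', AverageMilliseconds(DummyWork, 5):0:3);
end.
```

Taking the minimum instead of the mean is another common choice, since other processes can only make a run slower, never faster.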
