Wednesday, February 4, 2009

Volume Rendering 201: Optimizations

The topic at hand this time is optimizing the performance of the volume renderer. I’ll cover a few of these optimizations and provide a rough implementation of one of them.

Cache Efficiency and Memory Access 


Picture from Real-time Volume Graphics.

Currently we load our volume data into a linear layout in memory. However, a ray cast through the volume is not likely to touch neighboring memory locations as it traverses the volume when the data is in a linear layout. We can improve cache efficiency by converting the layout to a block-based one through swizzling. With the data in a block format, the GPU is more likely to have neighboring voxels in its cache as we walk through the volume, which leads to better memory performance.
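To make the block-based layout a bit more concrete, here is a minimal CPU-side sketch of re-ordering a linear volume into small cubic bricks. It is not code from the sample: the class name BrickedLayout, the methods BrickedIndex and ToBricked, and the brickSize parameter are all made up for illustration, and it assumes the volume dimensions are exact multiples of the brick size.

```csharp
using System;

public static class BrickedLayout
{
    // Hypothetical helper: maps voxel (x, y, z) to its offset in a bricked buffer.
    // Assumes width and height are multiples of brickSize.
    public static int BrickedIndex(int x, int y, int z, int width, int height, int brickSize)
    {
        int bricksX = width / brickSize;
        int bricksY = height / brickSize;

        // which brick the voxel falls into
        int bx = x / brickSize, by = y / brickSize, bz = z / brickSize;
        // position of the voxel inside that brick
        int lx = x % brickSize, ly = y % brickSize, lz = z % brickSize;

        int brickVolume = brickSize * brickSize * brickSize;
        int brickIndex = (bz * bricksY + by) * bricksX + bx;
        int localIndex = (lz * brickSize + ly) * brickSize + lx;
        return brickIndex * brickVolume + localIndex;
    }

    // Re-orders a linear volume (x fastest, then y, then z) so that every
    // brickSize^3 block of voxels is stored contiguously.
    public static byte[] ToBricked(byte[] linear, int width, int height, int depth, int brickSize)
    {
        if (width % brickSize != 0 || height % brickSize != 0 || depth % brickSize != 0)
            throw new ArgumentException("dimensions must be multiples of brickSize");

        byte[] bricked = new byte[linear.Length];
        for (int z = 0; z < depth; z++)
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                {
                    int src = (z * height + y) * width + x;
                    bricked[BrickedIndex(x, y, z, width, height, brickSize)] = linear[src];
                }
        return bricked;
    }
}
```

With this layout, the voxels a ray visits inside one brick sit close together in memory no matter which direction the ray travels, instead of being a full row or slice apart.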

Empty-Space Leaping 


In the previous samples we cast rays against the entire bounding volume of the data set, even where every sample along the way had zero alpha. We can skip these samples altogether and only render the parts of the volume that have a non-zero alpha in the transfer function. More on this in a bit.

Occlusion Culling

If we render the volume in a block-based fashion as above, and sort the blocks from front to back, we can use occlusion queries to determine which blocks are completely occluded by blocks in front of them. There are quite a few tutorials on occlusion queries on the net, including this one at ziggyware.

Deferred Shading

We can also boost performance by deferring the shading calculations. Instead of shading every voxel during the ray-casting, we can just output the depth and color information into off-screen buffers. Then we render a full-screen quad, use the depth information to calculate normals in screen space, and carry on with the shading from there. Normals calculated this way also have the advantage of being smoother and having fewer artifacts than computing the gradients of the volume and storing them in a 3D texture. We also save memory this way, since we only have to store the isovalues in the 3D texture and not the normals.
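As a rough illustration of deriving a normal from depth alone, here is a small CPU-side sketch using central differences on a depth buffer. This is not the shader from the sample. The class ScreenSpaceNormals and the method NormalAt are hypothetical names, and the sketch assumes an orthographic view in which depth grows away from the camera and is measured in the same units as the pixel spacing; a real shader would reconstruct view-space positions from the projection first.

```csharp
using System;

public static class ScreenSpaceNormals
{
    // depth[y, x] holds the distance from the camera at each pixel.
    // Returns a unit normal { nx, ny, nz } that faces the camera (nz < 0).
    public static float[] NormalAt(float[,] depth, int x, int y)
    {
        int h = depth.GetLength(0), w = depth.GetLength(1);

        // clamp the neighbors at the image border
        int x0 = Math.Max(x - 1, 0), x1 = Math.Min(x + 1, w - 1);
        int y0 = Math.Max(y - 1, 0), y1 = Math.Min(y + 1, h - 1);

        // central differences of the depth in x and y
        float dzdx = (depth[y, x1] - depth[y, x0]) / Math.Max(x1 - x0, 1);
        float dzdy = (depth[y1, x] - depth[y0, x]) / Math.Max(y1 - y0, 1);

        // the surface z = depth(x, y) has the camera-facing normal (dzdx, dzdy, -1)
        float len = (float)Math.Sqrt(dzdx * dzdx + dzdy * dzdy + 1.0f);
        return new[] { dzdx / len, dzdy / len, -1.0f / len };
    }
}
```

A flat region of the depth buffer gives a normal pointing straight back at the camera, and the normal tilts as the depth slope increases.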

Image Downscaling

If the data we are rendering is low frequency (e.g. volumetric fog), we can render the volume into an off-screen buffer that is half the size of the window. Then we can up-scale this image during a final pass. This method is also included in the sample.

Implementing Empty-Space Leaping

To implement empty-space leaping we need to subdivide the volume into smaller volumes and determine whether each of these smaller volumes has an opacity greater than zero. To subdivide the volume we follow an approach very similar to quadtree or octree construction. We start out with an original volume from [0, 0, 0] to [1, 1, 1]. The volume is then recursively halved until the sub-volume width drops to 0.1 or below. Since each step halves the width (1, 0.5, 0.25, 0.125, 0.0625), the recursion actually stops at a width of 0.0625, so we end up dividing the volume along each dimension by 16. Here’s how we do that:

    private void RecursiveVolumeBuild(Cube C)
    {
        //stop once the current cube is at most 1/10 of the original width
        if (C.Width <= 0.1f)
        {
            //add the min/max vertex to the list
            Vector3 min = new Vector3(C.X, C.Y, C.Z);
            Vector3 max = new Vector3(C.X + C.Width, C.Y + C.Height, C.Z + C.Depth);
            Vector3 scale = new Vector3(mWidth, mHeight, mDepth);

            //additively sample the transfer function and check if there are any
            //samples that are greater than zero
            float opacity = SampleVolume3DWithTransfer(min * scale, max * scale);

            if (opacity > 0.0f)
            {
                BoundingBox box = new BoundingBox(min, max);

                //add the corners of the bounding box
                Vector3[] corners = box.GetCorners();
                for (int i = 0; i < 8; i++)
                {
                    VertexPositionColor v;
                    v.Position = corners[i];
                    v.Color = Color.Blue;
                    //store v in the vertex list (the list's name is not shown here)
                }
            }

            return;
        }

        float newWidth = C.Width / 2f;
        float newHeight = C.Height / 2f;
        float newDepth = C.Depth / 2f;

        /// SubGrid          r  c  d
        /// Front:
        /// Top-Left    :    0  0  0
        /// Top-Right   :    0  1  0
        /// Bottom-Left :    1  0  0
        /// Bottom-Right:    1  1  0
        /// Back:
        /// Top-Left    :    0  0  1
        /// Top-Right   :    0  1  1
        /// Bottom-Left :    1  0  1
        /// Bottom-Right:    1  1  1
        for (float r = 0; r < 2; r++)
        {
            for (float c = 0; c < 2; c++)
            {
                for (float d = 0; d < 2; d++)
                {
                    Cube cube = new Cube(C.Left + c * (newWidth),
                                         C.Top + r * (newHeight),
                                         C.Front + d * (newDepth),
                                         newWidth, newHeight, newDepth);

                    //recurse into the new octant
                    RecursiveVolumeBuild(cube);
                }
            }
        }
    }

To determine whether a sub-volume contains any samples that have opacity, we simply loop over the volume and additively sample the transfer function:

    private float SampleVolume3DWithTransfer(Vector3 min, Vector3 max)
    {
        float result = 0.0f;

        for (int x = (int)min.X; x <= (int)max.X; x++)
        {
            for (int y = (int)min.Y; y <= (int)max.Y; y++)
            {
                for (int z = (int)min.Z; z <= (int)max.Z; z++)
                {
                    //sample the volume to get the iso value
                    //it was stored [0, 1] so we need to scale to [0, 255]
                    int isovalue = (int)(sampleVolume(x, y, z) * 255.0f);

                    //accumulate the opacity from the transfer function
                    result += mTransferFunc[isovalue].W * 255.0f;
                }
            }
        }

        return result;
    }
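Since the caller only checks whether the accumulated opacity is greater than zero, the loop could also stop at the first visible sample. Here is a minimal sketch of that variation. It is not the sample's code: the class OpacityTest, the method HasVisibleSample, and the plain float[,,] and float[] arrays standing in for the volume and the transfer function's alpha channel are assumptions made to keep the example self-contained.

```csharp
using System;

public static class OpacityTest
{
    // volume: isovalues in [0, 1], indexed [x, y, z].
    // transferAlpha: 256 alpha values, one per isovalue.
    // Returns true as soon as any voxel in the inclusive range maps to alpha > 0.
    public static bool HasVisibleSample(float[,,] volume, float[] transferAlpha,
                                        int minX, int minY, int minZ,
                                        int maxX, int maxY, int maxZ)
    {
        // clamp the inclusive upper bounds to the last valid voxel
        maxX = Math.Min(maxX, volume.GetLength(0) - 1);
        maxY = Math.Min(maxY, volume.GetLength(1) - 1);
        maxZ = Math.Min(maxZ, volume.GetLength(2) - 1);

        for (int x = Math.Max(minX, 0); x <= maxX; x++)
            for (int y = Math.Max(minY, 0); y <= maxY; y++)
                for (int z = Math.Max(minZ, 0); z <= maxZ; z++)
                {
                    // scale the stored [0, 1] isovalue to a [0, 255] table index
                    int isovalue = (int)(volume[x, y, z] * 255.0f);
                    isovalue = Math.Min(Math.Max(isovalue, 0), 255);

                    // early out: one visible sample is enough to keep the box
                    if (transferAlpha[isovalue] > 0.0f)
                        return true;
                }

        return false;
    }
}
```

The clamp guards against the inclusive upper bound landing one past the last voxel when a sub-volume touches the far face of the volume. Whether sampleVolume in the sample already handles that case isn't shown here, so treat the clamp as a precaution rather than a fix.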

Depending on the transfer function, empty-space leaping can increase our performance by 50%. The more samples the transfer function maps to zero opacity, the more there is to skip.


Now, a problem that empty-space leaping introduces is overdraw. You can see the effects of this when rotating the camera to view the back of the bear; here the frame rate drops considerably. To remedy this, the sub-volumes need to be sorted front to back by their distance to the camera each time the view changes. I’ve left this as an exercise for the reader.

The new demo implements empty-space leaping and downscaling. When rendering the teddy bear volume, frame rates on my 8800GT are at about 190 FPS, compared to 30 FPS for the last demo from 102, both at a resolution of 800x600. Pretty good results! Next time I’ll be introducing soft shadows and translucent materials.
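For the front-to-back sort of the sub-volumes mentioned above, one possible starting point is to order the boxes by the squared distance from the camera to each box center. This is only a sketch and not part of the demo: the Box struct, the SubVolumeSorter class and its methods are hypothetical stand-ins so that the example compiles without XNA.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an axis-aligned sub-volume.
public struct Box
{
    public float MinX, MinY, MinZ, MaxX, MaxY, MaxZ;
}

public static class SubVolumeSorter
{
    // Squared distance from the camera to the center of the box.
    static float DistanceSquared(Box b, float camX, float camY, float camZ)
    {
        float dx = (b.MinX + b.MaxX) * 0.5f - camX;
        float dy = (b.MinY + b.MaxY) * 0.5f - camY;
        float dz = (b.MinZ + b.MaxZ) * 0.5f - camZ;
        return dx * dx + dy * dy + dz * dz;
    }

    // Sorts the boxes in place so that the nearest box comes first.
    // Call this whenever the camera moves, then rebuild the vertex buffer in this order.
    public static void SortFrontToBack(List<Box> boxes, float camX, float camY, float camZ)
    {
        boxes.Sort((a, b) =>
            DistanceSquared(a, camX, camY, camZ)
                .CompareTo(DistanceSquared(b, camX, camY, camZ)));
    }
}
```

Sorting by box centers is an approximation, but the sub-volumes here are all the same size, so it should be close enough to cut down most of the overdraw. In XNA, Vector3.DistanceSquared on the box centers could take the place of the hand-written helper.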

References: Real-time Volume Graphics


Darren said...

Hey Kyle,

I finally got around to downloading the XNA framework so that I could build your projects but then I ran into a snag. While I am a C# developer at work, all of my graphics work at home is in C++/OpenGL and Cg. So, I have a feeling my problem is a lack of knowledge with this environment, in particular, the shaders.

So, when I try to build the project, I get:

Error 2 C:\Documents and Settings\KMan\Desktop\VolumeRayCasting_201\VolumeRayCasting\Content\Shaders\RayCasting.fx(207,12): warning X4121: gradient-based operations must be moved out of flow control to prevent divergence. Performance may improve by using a non-gradient operation
C:\Documents and Settings\KMan\Desktop\VolumeRayCasting_201\VolumeRayCasting\Content\Shaders\RayCasting.fx(169,15): warning X3571: pow(f, e) will not work for negative f, use abs(f) or conditionally handle negative values if you expect them
C:\Documents and Settings\KMan\Desktop\VolumeRayCasting_201\VolumeRayCasting\Content\Shaders\RayCasting.fx(139,12): warning X4121: gradient-based operations must be moved out of flow control to prevent divergence. Performance may improve by using a non-gradient operation

Actually, tons of the same warning. Now, I realize that these warnings are coming from the RayCasting_fxc.err file, especially since it contains your path.

When I empty out the file and re-compile, I get:
Warning 1 Exit code: 1 C:\Users\dbrust\Desktop\VolumeRayCasting_201\VolumeRayCasting\Content\Shaders\RayCasting.fx VolumeRayCasting

Now, I know this must all work correctly because you are using it, but I am not quite sure how to tackle this problem. Am I missing something in my environment?

Thanks in advance,

Kyle Hayward said...

I had this problem with older versions of the DirectX SDK. It seems the fxc compiler is returning an error code even when there are just warnings.

To fix this you can update to the latest DirectX SDK (November 2008, this is what I'm using).

Or you can go into the WindowsEffectCompiler.cs file and replace this line:

if (p.ExitCode != 0 || text.Contains("error"))

with this:

if (text.Contains("error"))

Darren said...

That took care of it...I had the April 2007 version.

So, I ran the application in debug, and had some interesting results. With the bear facing forward I am getting framerates of around 200. Rotating the bear, however, to look at the back, the framerate drops to about 27. Is this what you would expect?

Kyle Hayward said...

Pretty much. I note at the end of the tutorial that this is because of overdraw problems. When looking at the back of the bear, the front volumes are drawn first, followed by the back volumes. As the volumes go towards the back of the bear, they overwrite the results from the previous volumes.

The volumes need to be sorted front to back based on distance to the camera. And this should resolve the problem.

Darren said...

My bad. I read about your overdraw concerns, but did not associate that with slower framerates. I just didn't think it through...for some reason, my mind jumped straight to MIP type things.

I'll have to think about the sorting of the blocks ;)


Mobeen said...

Hi Kyle,
First up thanks for this wonderful tutorial series. I am enjoying every bit of it. I have a problem though.
I'm using OpenGL and Cg for shaders. I have converted the solution; the only thing that is wrong in my case is that I get borders of the boxes in my rendering (see this image).
Do you know what may be causing this? Note that my surface format is RGBA16, exactly the same as yours, and the shader is the same as yours.


Kyle Hayward said...

Hi Mobeen,

I think this may be a problem with using the rasterization approach to update the position buffers. I ran into the same problem when I used the ESL sub volumes to update the buffers.

There shouldn't be a problem if you still use the original bounding cube to update the position buffers. And then, in the shader, instead of using the front texture to get the starting position, you can simply use the 3d texture coordinates.

This "fix" isn't technically correct as you could end up sampling empty areas since the back texture will be the back of the entire volume and not the subvolumes.

Hope that helps.
Edit: Cool videos on your blog by the way :)

Next installment:
I already have the shadow and translucent samples done, but in the next installment I might switch from using the rasterization approach to actually intersecting the subvolumes with a ray to find the sampling start and end positions in the vertex shader.

Mobeen said...

Hi Kyle,
Bingo.... that did the trick. Here is the final rendering.

Thanks for appreciating the vdos, its just part of the work that i am currently doing. I have already converted the optimization demo to opengl and cg. The shader is exactly the same. The only thing changed is the handling of FBO and some opengl bits and pieces.

For the next series, would u be using half angle slicing or using deep shadow maps? anyways i cant wait to see it though :)

Thanks once again for the wonderful details.


Kyle Hayward said...

From my understanding, half-angle slicing shadowing isn't possible with volume ray-casting since we can't switch between shadow and color buffers between slices. And deep shadow maps are still too slow for real-time use.

So I take the easy way out and just use simple variance shadow mapping which achieves pretty good results. I also use a similar method to approximate translucency.

Spacerat said...

I would like to start the App, but I only get an Unhandled Exception..

ProcessID 0x474, ThreadID 0xf20

I have no idea if I missed to install something..

Kyle Hayward said...

You need to have a Shader Model 3.0 graphics card in order to run the sample. Also make sure that the dataset files are in the right folder (bin/x86/debug/models).

When the application breaks, what line in what file is it breaking at? And are there any null objects that are being referenced?

Anonymous said...

I'm using an ATI Radeon X1300PRO. Getting "Device does not support texture width and height values that are not powers of 2." when creating new Texture3D. Obviously the issue is the 128x128x62, 62 is not a power of 2. I can probably hack something to make it work, but might be nice to include a check for this issue.

Kyle Hayward said...

Thanks for that. I linked to a couple of pages with datasets on them (on the first post I believe). You should be able to find a power of 2 dataset from one of the pages.

I'll make a note of it in the post.

Anonymous said...


I have a message for the webmaster/admin here at

Can I use part of the information from this blog post above if I provide a link back to your site?


Kyle Hayward said...

Sure, that would be fine.