Depth Buffer - The gritty details

Do you want to know why your 3D app has z-fighting even though it uses a 24-bit z-buffer? Or are you interested in the internal representation of a z-buffer, and not afraid of the gritty details? Then read on.

The gritty details


First you need to know that the value stored in a z-buffer is discrete, not continuous, though most people (at least programmers) know that. There are usually two versions, one using 16 bits of precision and one using 24 bits (some hardware supports a 32-bit float, but that's not usual).

The stored value has an opaque representation (all the programmer needs to know is that it has a comparison function), but for convenience and illustration purposes we'll say it is a normalized value between 0 and 1 with a uniform distribution (a fixed-point number that moves in increments of 1/(2^n - 1), where n is the number of bits).
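To make the "fixed precision number" picture concrete, here is a minimal sketch of that idealized model. The function names are made up for illustration, and real hardware is free to store depth differently (remember that the representation is opaque).

```python
def quantize_depth(d: float, bits: int) -> int:
    """Map a normalized depth in [0, 1] to an integer code of the given bit width."""
    levels = (1 << bits) - 1            # largest code, e.g. 65535 for 16 bits
    d = min(max(d, 0.0), 1.0)           # clamp to the representable range
    return round(d * levels)


def dequantize_depth(code: int, bits: int) -> float:
    """Recover the normalized depth that an integer code stands for."""
    return code / ((1 << bits) - 1)


# Size of one increment for the two usual buffer widths.
step16 = 1.0 / ((1 << 16) - 1)
step24 = 1.0 / ((1 << 24) - 1)

# Two depths closer together than one increment can land on the same code:
# the buffer can no longer tell them apart, which is where z-fighting starts.
code_a = quantize_depth(0.25, 24)
code_b = quantize_depth(0.25 + 0.1 * step24, 24)
```

In this model `code_a` and `code_b` come out equal, so the comparison function has nothing left to compare.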

Now to complicate things a bit more: after transform and lighting, your hardware has to translate the x,y,z,w homogeneous coordinates into a number that fits in the z-buffer.

That's where znear and zfar come into play.

Contrary to what you could guess from its name, the z-buffer doesn't hold the z coordinate from your set of homogeneous coordinates. It actually holds z/w.

What's the reason for that?


One reason is that z/w can be linearly interpolated across the triangle in screen space. So you can compute its value at the three vertices and store the linearly interpolated value directly in your z-buffer. That would not be possible with just z or just w: you would have to do the perspective-correct thing, interpolating z/w and 1/w and dividing them at each pixel to get z. Why not just store the non-perspective-correct z? Because then your intersections and occlusion wouldn't be perspective correct, and straight lines at intersections would appear broken. Of course the perspective-correct route is not that expensive given that you already do it for texture coordinates (but I guess that before accelerated graphics existed it was probably a sure win).
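The claim that z/w interpolates linearly on screen while w does not can be checked numerically. The sketch below is a 2D simplification under my own assumptions: a point is just (x, w), its screen position is x/w, and the depth formula is the one this article arrives at further down. All names are illustrative.

```python
def depth(w, wnear, wfar):
    """Stored depth z/w for a point at distance w (formula derived later in the text)."""
    return (wfar - wnear * wfar / w) / (wfar - wnear)


def point_at_screen_fraction(a, b, s):
    """Point on the view-space segment a-b whose projection lies a fraction s
    of the way between the projections of a and b."""
    (xa, wa), (xb, wb) = a, b
    sa, sb = xa / wa, xb / wb
    target = sa + s * (sb - sa)
    # Solve (xa + t*(xb - xa)) / (wa + t*(wb - wa)) = target for t.
    t = (target * wa - xa) / ((xb - xa) - target * (wb - wa))
    return xa + t * (xb - xa), wa + t * (wb - wa)


wnear, wfar = 1.0, 10.0
a, b = (-1.0, 2.0), (3.0, 8.0)                 # two vertices as (x, w)

# The point that projects exactly halfway between the two vertices on screen.
_, w_mid = point_at_screen_fraction(a, b, 0.5)

true_depth = depth(w_mid, wnear, wfar)
lerp_depth = 0.5 * (depth(a[1], wnear, wfar) + depth(b[1], wnear, wfar))
lerp_w = 0.5 * (a[1] + b[1])
```

Here `true_depth` and `lerp_depth` agree, while `w_mid` (3.2) is nowhere near `lerp_w` (5.0): averaging z/w on screen gives the right answer, averaging w does not.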

The other reason is that it usually helps to have more precision near the observer (where polygons cover more of the screen), whereas storing a linear z would give you uniform precision.

(There were alternatives like the WBuffer that stored linear Z (or linear W), but they were deprecated on graphics hardware a while ago.)

Now why the precision distribution?


OK, so you hold z/w. But then what? As I said before, the z-buffer holds numbers between 0 and 1. That gives you two limits, one near and one far. If z/w is less than zero, the vertex gets clipped: that's the near plane. If z/w is more than one, the vertex gets clipped too: that's the far plane.

You get two equations from the above. The first is z/w = 0 at the near plane. The second is z/w = 1 at the far plane. Now if you express z as a linear (affine) function of w (which is assumed to always be the case in a regular projection), you get the following relation:

z = (w - wnear) * (wfar / (wfar - wnear))

Solve the two equations and you land on this relation (we leave the solution as an exercise for the reader).
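If you would rather check your answer than work it out, here is one way to do it. The coefficients a and b are placeholder names I introduce for the unknown slope and offset; they do not appear elsewhere in this article.

```latex
\begin{aligned}
z &= a\,w + b \quad\Rightarrow\quad \frac{z}{w} = a + \frac{b}{w} \\
\frac{z}{w}\Big|_{w = w_{near}} = 0 &\;\Rightarrow\; b = -a\,w_{near} \\
\frac{z}{w}\Big|_{w = w_{far}} = 1 &\;\Rightarrow\; a\left(1 - \frac{w_{near}}{w_{far}}\right) = 1
  \;\Rightarrow\; a = \frac{w_{far}}{w_{far} - w_{near}} \\
z &= a\,(w - w_{near}) = (w - w_{near})\,\frac{w_{far}}{w_{far} - w_{near}}
\end{aligned}
```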

Here you can see the variation of z and w in relation to the distance from the eye:

The shape of z/w


So the value stored in the z-buffer is equal to:

z/w = (wfar - (wnear * wfar)/w) / (wfar - wnear)

(The image is not an accurate hyperbolic curve; it's just for illustration purposes.)

As you can see, it goes through 0 at w = wnear and through 1 at w = wfar, and as w goes toward infinity it tends to the constant wfar/(wfar - wnear).
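Those three properties are easy to verify with a few lines of code. This is only a sketch of the two formulas above; the function names and the sample near/far values are mine.

```python
def clip_z(w, wnear, wfar):
    """z expressed as a linear function of w, as in the relation above."""
    return (w - wnear) * (wfar / (wfar - wnear))


def stored_depth(w, wnear, wfar):
    """The value z/w that ends up in the depth buffer."""
    return clip_z(w, wnear, wfar) / w


wnear, wfar = 1.0, 10.0

at_near = stored_depth(wnear, wnear, wfar)   # 0 at the near plane
at_far = stored_depth(wfar, wnear, wfar)     # 1 at the far plane (up to rounding)
far_away = stored_depth(1e9, wnear, wfar)    # approaches the limit below
limit = wfar / (wfar - wnear)
```

Anything past wfar produces a value between 1 and `limit`, which is exactly the range that gets clipped by the far plane.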

Of course it doesn't look so bad in the image: the portion between wnear and wfar is not linear, but it almost looks like it.

But imagine that we make wnear tend toward zero without moving wfar. Then the limit wfar/(wfar - wnear) tends toward 1, and the scaling factor (wnear * wfar)/(wfar - wnear) tends to zero. The result is a highly vertically compressed hyperbola that tends toward one.

Once again, not totally accurate, but it gives you the general shape:

The closer you bring wnear to zero, the more you compress your hyperbola. If you compare the precision at close distance with the precision at long distance, almost all depth buffer values cover close objects, and only a very small percentage of your depth buffer range is reserved for distant objects.
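One way to put a number on that loss of precision is to ask how much distance a single depth increment covers at a given w. The sketch below assumes the idealized n-bit buffer described earlier (uniform steps of 1/(2^n - 1)); the helper names are illustrative only.

```python
def depth(w, wnear, wfar):
    """Stored depth z/w for a point at distance w."""
    return (wfar - wnear * wfar / w) / (wfar - wnear)


def w_from_depth(d, wnear, wfar):
    """Inverse of depth(): the distance w that produces stored depth d."""
    return wnear * wfar / (wfar - d * (wfar - wnear))


def w_step(w, wnear, wfar, bits):
    """Distance covered by one depth increment starting at distance w."""
    step = 1.0 / ((1 << bits) - 1)
    d = depth(w, wnear, wfar)
    return w_from_depth(min(d + step, 1.0), wnear, wfar) - w


# Same far plane, same distance, 24-bit buffer: only wnear changes.
step_with_near_1 = w_step(9.0, 1.0, 10.0, 24)
step_with_near_001 = w_step(9.0, 0.01, 10.0, 24)
```

With wnear at 0.01 the step at distance 9 is roughly a hundred times larger than with wnear at 1.0. Two surfaces closer together than that step get the same depth value, and that is your z-fighting.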

Quick example


Imagine that you set wnear to 1.0 and wfar to 10.0. Where is your median point? Solving quickly, you find that half the values of the depth buffer are consumed between 1.0 and 1.8.

Now imagine that you set wnear to 0.01 and wfar to 10.0. Solving the same way, you find that half the values of your depth buffer are consumed between 0.01 and 0.02. Scarier still, 90% of the depth values are consumed between 0.01 and 0.1.

You can see that if most of the objects in your scene are farther than 0.1, or none are closer than 0.1, then 90% of your depth precision is wasted.

Now what happens if you push the wfar value instead? Keep wnear at 1.0 and push wfar to 1000.0. In that case half the depth values are consumed between 1.0 and 2.0.
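The numbers in these examples come from inverting the z/w formula: set z/w to the fraction of the depth range you care about and solve for w. A small sketch, with a function name of my own choosing:

```python
def w_at_depth_fraction(fraction, wnear, wfar):
    """Distance w at which the stored depth z/w reaches the given fraction of [0, 1]."""
    return wnear * wfar / (wfar - fraction * (wfar - wnear))


half_1_10 = w_at_depth_fraction(0.5, 1.0, 10.0)       # about 1.82
half_001_10 = w_at_depth_fraction(0.5, 0.01, 10.0)    # about 0.02
ninety_001_10 = w_at_depth_fraction(0.9, 0.01, 10.0)  # about 0.099
half_1_1000 = w_at_depth_fraction(0.5, 1.0, 1000.0)   # about 2.0
```

Feeding in your own near and far planes is a quick way to see where your depth values are actually being spent.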

Generally speaking this median value is simply

wmedian = 2 * wnear * wfar / (wfar + wnear)

Now, to know how much of the total visible w range that wmedian value represents, you can compute:

(wmedian - wnear) / (wfar - wnear) = wnear / (wfar + wnear)

You can see, as we had already guessed, that the fraction is a function of BOTH wnear and wfar: it shrinks as wnear becomes smaller, and it also shrinks as wfar becomes bigger. When wnear << wfar (small compared to), this fraction is almost equal to wnear/wfar. When wnear ~ wfar (almost equivalent to), this fraction tends to 50%, but in that case the whole depth range is squeezed into a very thin slice of the scene.
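Both formulas and the two limiting cases can be checked with a short sketch (the function names are mine, not part of any API):

```python
def w_median(wnear, wfar):
    """Distance at which half of the depth values have been consumed."""
    return 2.0 * wnear * wfar / (wfar + wnear)


def median_fraction(wnear, wfar):
    """Share of the visible w range that lies in front of the median point."""
    return wnear / (wfar + wnear)


# wnear much smaller than wfar: the fraction is close to wnear / wfar.
small_near = median_fraction(0.01, 10.0)

# wnear almost equal to wfar: the fraction approaches one half.
thin_slab = median_fraction(9.9, 10.0)

# Cross-check the closed form against its definition.
wn, wf = 1.0, 10.0
direct = (w_median(wn, wf) - wn) / (wf - wn)
```

For wnear = 1.0 and wfar = 10.0 both `direct` and `median_fraction(wn, wf)` give 1/11: half of the depth values are spent on about 9% of the visible range.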

So how do you keep your depth buffer happy?


Well, the true issue is that you can't draw both very far objects and very near objects with the same depth buffer equations. If you want to draw very far objects, you need to sacrifice your near view by pushing the near plane farther away. To avoid clipping artifacts, you can make your collision envelope large enough that the near clip plane never intersects an object within your frustum. Or you can make objects gradually fade out with transparency as they approach the clip plane.

If you want to keep near objects and at the same time draw mountains (or planets) in the far distance, you can split your rendering into parts: first draw your far objects, then clear the depth buffer and render the near objects with a different set of depth buffer equations (that is, different near and far planes).
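As a rough sketch of that splitting idea, the code below sorts objects into a far pass and a near pass, each with its own near/far pair. Everything in it is hypothetical: the function name, the scene, and the split distance are invented for illustration, objects are treated as single points, and an object that straddles the split would need extra care that is not shown here.

```python
def plan_passes(objects, wnear, wfar, split):
    """Group (name, distance) pairs into two passes, far slice first."""
    far_pass = {"near": split, "far": wfar, "objects": []}
    near_pass = {"near": wnear, "far": split, "objects": []}
    for name, distance in objects:
        if distance < wnear or distance > wfar:
            continue                      # outside the visible range entirely
        target = far_pass if distance >= split else near_pass
        target["objects"].append(name)
    # Intended order: draw the far pass, clear the depth buffer, draw the near pass.
    return [far_pass, near_pass]


scene = [("cockpit", 0.5), ("wingman", 40.0), ("mountain", 8000.0), ("planet", 90000.0)]
passes = plan_passes(scene, wnear=0.1, wfar=100000.0, split=100.0)

# The far/near ratio of each pass, compared with the single-frustum ratio.
ratios = [p["far"] / p["near"] for p in passes]
single_ratio = 100000.0 / 0.1
```

Each pass ends up with a far/near ratio of about a thousand instead of a million, so each one spreads its depth values far more evenly over the objects it actually draws.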
