Sep 11 2013
 

In this previous post I talked about Bokeh depth of field, where it comes from and why it is different to the type of fake depth of field effects you get in some (usually older) games. In this slightly more technical post I’ll be outlining a nice technique for rendering efficient depth of field, which I use in my demo code, taken from this EA talk about the depth of field in Need For Speed: The Run.

The main difference is the shape of the blur – traditionally, a Gaussian blur is performed (a Gaussian blur is a bell-shaped blur curve), whereas real Bokeh requires a blur into the shape of the camera aperture:

Bokeh blur on the left, Gaussian on the right

The first question you might be asking is why are Gaussian blurs used instead of more realistic shapes? It comes down to rendering efficiency, and things called separable filters. But first you need to know what a normal filter is.

Filters

You’re probably familiar with image filters from Photoshop and similar – when you perform a blur, sharpen, edge detect or any of a number of others, you’re running a filter on the image. A filter consists of a grid of numbers. Here is a simple blur filter:

\left(\begin{array}{ccc}\frac{1}{16}&\frac{2}{16}&\frac{1}{16}\\\frac{2}{16}&\frac{4}{16}&\frac{2}{16}\\\frac{1}{16}&\frac{2}{16}&\frac{1}{16}\end{array}\right)

For every pixel in the image, this grid is overlaid so that the centre number is over the current pixel and the other numbers are over the neighbouring pixels. To get the filtered result for the current pixel, the colour under each grid element is multiplied by the number over it and then they’re all added up. So for this particular filter you can see that the result for each pixel will be 4 times the original colour, plus twice each directly neighbouring pixel, plus one of each diagonally neighbouring pixel, all divided by 16 so the weights add up to one again. Or more simply, blend some of the surrounding eight pixels into the centre one.

As another example, here is a very basic edge detection filter:

\left(\begin{array}{ccc}-1&-1&-1\\-1&8&-1\\-1&-1&-1\end{array}\right)

On flat areas of the image the +8 of the centre pixel will cancel with the eight surrounding -1 values and give a black pixel. However, along the brighter side of an edge, the values won’t cancel and you’ll get bright output pixels in your filtered image.
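To make the overlay, multiply and add procedure concrete, here is a minimal CPU-side sketch in Python (not shader code, and not taken from my demo). The function name `apply_filter`, the list-of-rows greyscale image and the clamp-to-border edge handling are all assumptions made purely for illustration.

```python
# The two example filters from the text.
BLUR = [[1 / 16, 2 / 16, 1 / 16],
        [2 / 16, 4 / 16, 2 / 16],
        [1 / 16, 2 / 16, 1 / 16]]

EDGE = [[-1, -1, -1],
        [-1,  8, -1],
        [-1, -1, -1]]


def apply_filter(image, kernel):
    """Overlay `kernel` on every pixel of a greyscale image (a list of rows)
    and sum the products. Samples off the edge are clamped to the border
    (an assumption; other edge rules are equally valid)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    sy = min(max(y + ky - kh // 2, 0), h - 1)
                    sx = min(max(x + kx - kw // 2, 0), w - 1)
                    total += image[sy][sx] * kernel[ky][kx]
            out[y][x] = total
    return out


flat = [[0.5] * 4 for _ in range(4)]               # uniform grey
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]    # dark left half, bright right half
edges = apply_filter(step, EDGE)
```

On the `step` image the edge filter gives zero in the flat areas and a positive value only on the brighter side of the edge. The negative value it produces on the darker side would clamp to black when displayed.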

You can find a bunch more examples, and pictures of what they do, over here.

Separable filters

These example filters are only 3×3 pixels in size, but they need to sample from the original image nine times for each pixel. A 3×3 filter can only be affected by the eight neighbouring pixels, so will only give a very small blur radius. To get a nice big blur you need a much larger filter, maybe 15×15 for a nice Gaussian. This would require 225 texture fetches for each pixel in the image, which is very slow!

Luckily some filters have the property that they are separable. That means that you can get the same end result by applying a one-dimensional filter twice, first horizontally and then vertically. So first a 15×1 filter is used to blur horizontally, and then the filter is rotated 90 degrees and the result is blurred vertically as well. This only requires 15 texture lookups per pass (as the filter only has 15 elements), giving a total of 30 texture lookups. This will give exactly the same result as performing the full 15×15 filter in one pass, except that one required 225 texture lookups.
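As a quick check of that claim, here is a small Python sketch (CPU-side and purely illustrative) that runs a 1D filter horizontally and then vertically, and compares the result with the equivalent full 2D filter. The helper names and the clamp-to-border edge handling are assumptions. The 1D weights {1/4, 2/4, 1/4} are chosen because their 2D equivalent is exactly the 3×3 blur filter shown earlier.

```python
def clamp(v, lo, hi):
    return min(max(v, lo), hi)


def blur_1d(image, weights, horizontal):
    """One pass of a 1D filter along x (horizontal=True) or along y."""
    h, w = len(image), len(image[0])
    r = len(weights) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for i, wt in enumerate(weights):
                if horizontal:
                    total += image[y][clamp(x + i - r, 0, w - 1)] * wt
                else:
                    total += image[clamp(y + i - r, 0, h - 1)][x] * wt
            out[y][x] = total
    return out


def blur_2d(image, weights):
    """Reference: the full 2D filter whose entries are weights[j] * weights[i]."""
    h, w = len(image), len(image[0])
    r = len(weights) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for j, wj in enumerate(weights):
                for i, wi in enumerate(weights):
                    sy = clamp(y + j - r, 0, h - 1)
                    sx = clamp(x + i - r, 0, w - 1)
                    total += image[sy][sx] * wj * wi
            out[y][x] = total
    return out


def lookups_per_pixel(n):
    """Texture fetches per pixel for an n-wide filter."""
    return {"full": n * n, "separable": 2 * n}


WEIGHTS = [1 / 4, 2 / 4, 1 / 4]
image = [[float((x * 7 + y * 13) % 5) for x in range(6)] for y in range(5)]
two_pass = blur_1d(blur_1d(image, WEIGHTS, True), WEIGHTS, False)
one_pass = blur_2d(image, WEIGHTS)
```

`two_pass` and `one_pass` agree to within floating point rounding, while `lookups_per_pixel(15)` reproduces the 225 versus 30 fetch counts quoted above.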

Original image / horizontal pass / both passes

Unfortunately only a few special filters are separable – there is no way to produce the hard-edged circular filter at the top of the page with a separable filter, for example. A size n blur would require the full n-squared texture lookups, which is far too slow for large n (and you need a large blur to create a noticeable effect).

Bokeh filters

So what we need to do is find a way to use separable filters to create a plausible Bokeh shape (e.g. circle, pentagon, hexagon etc). Another type of separable filter is the box filter. Here is a 5×1 box filter:

\left(\begin{array}{ccccc}\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}\end{array}\right)

Apply this in both directions and you’ll see that this just turns a pixel into a 5×5 square (and we’ll actually use a lot bigger than 5×5 in the real thing). Unfortunately you don’t get square Bokeh (well you might, but it doesn’t look nice), so we’ll have to go further.

One thing to note is that you can skew your square filter and keep it separable:

Then you could perhaps do this three times in different directions and add the results together:

And here we have a hexagonal blur, which is a much nicer Bokeh shape! Unfortunately doing all these individual blurs and adding them up is still pretty slow, but we can do some tricks to combine them together. Here is how it works.

First pass

Start with the unblurred image.

Original image

Perform a blur directly upwards, and another down and left (at 120°). You use two output textures – into one write just the upwards blur:

Output 1 – blurred upwards

Into the other write both blurs added together:

Output 2 – blurred upwards plus blurred down and left

Second pass

The second pass uses the two output images from above and combines them into the final hexagonal blur. Blur the first texture (the vertical blur) down and left at 120° to make a rhombus. This is the upper left third of the hexagon:

Intermediate 1 – first texture blurred down and left

At the same time, blur the second texture (vertical plus diagonal blur) down and right at 120° to make the other two thirds of the hexagon:

Intermediate 2 – second texture blurred down and right

Finally, add both of these blurs together and divide by three (each individual blur preserves the total brightness of the image, but the final stage adds together three lots of these – one in the first input texture and two in the second input texture). This gives you your final hexagonal blur:

Final combined output
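The two passes can be sketched on the CPU in a few lines of Python. This is a simplified illustration rather than the shader from my demo, with several assumptions: the diagonal blurs step at 45° (one pixel across, one pixel down) so that every sample lands exactly on a pixel, which gives a sheared hexagon instead of a regular one; pixels outside the image count as black; and there are no half-pixel sample offsets, so pixels on the seams between rhombi are counted more than once. The function names are made up for this example.

```python
UP = (0, -1)          # (dx, dy), with y increasing down the image
DOWN_LEFT = (-1, 1)   # 45 degrees here, not the true 30 degrees from horizontal
DOWN_RIGHT = (1, 1)


def directional_blur(image, step, n):
    """Spread each pixel over n pixels along `step`.

    Implemented as a gather: each output pixel looks back along the
    scatter direction to find the pixels that would have spread onto it."""
    h, w = len(image), len(image[0])
    dx, dy = step
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for i in range(n):
                sx, sy = x - i * dx, y - i * dy
                if 0 <= sx < w and 0 <= sy < h:
                    total += image[sy][sx]
            out[y][x] = total / n
    return out


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def hexagonal_blur(image, n):
    # First pass: two output textures.
    up = directional_blur(image, UP, n)
    output1 = up
    output2 = add(up, directional_blur(image, DOWN_LEFT, n))
    # Second pass: one more blur of each texture, then combine.
    intermediate1 = directional_blur(output1, DOWN_LEFT, n)
    intermediate2 = directional_blur(output2, DOWN_RIGHT, n)
    combined = add(intermediate1, intermediate2)
    return [[v / 3.0 for v in row] for row in combined]


# A single bright pixel in the middle of a black image.
image = [[0.0] * 21 for _ in range(21)]
image[10][10] = 1.0
result = hexagonal_blur(image, 4)
```

Starting from a single bright pixel, `result` contains a six-sided blob whose pixel values still sum to the brightness of the original pixel, which is the point of the final divide by three.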

Controlling the blur size

So far in this example, every pixel has been blurred into the same sized large hexagon. However, depth of field effects require different sized blurs for each pixel. Ideally, each pixel would scatter colour onto surrounding pixels depending on how blurred it is (and this is how the draw-a-sprite-for-each-pixel techniques work). Unfortunately we can’t do that in this case – the shader is applied by drawing one large polygon over the whole screen so each pixel is only written to once, and can therefore only gather colour data from surrounding pixels in the input textures. Thus for each pixel the shader outputs, it has to know which surrounding pixels are going to blur into it. This requires a bit of extra work.

The alpha channel of the original image is unused so far. In a previous pass we can use the depth of that pixel to calculate the blur size, and write it into the alpha channel. The size of the blur (i.e. the size of the circle of confusion) for each pixel is determined by the physical properties of the camera: the focal distance, the aperture size and the distance from the camera to the object. You can work out the CoC size by using a bit of geometry which I won’t go into. The calculation looks like this if you’re interested (taken from the talk again):

CoCSize = z * CoCScale + CoCBias
CoCScale = (A * focalLength * focalPlane * (zFar - zNear)) / ((focalPlane - focalLength) * zNear * zFar)
CoCBias = (A * focalLength * (zNear - focalPlane)) / ((focalPlane - focalLength) * zNear)

[A is aperture size, focal length is a property of the lens, focal plane is the distance from the camera that is in focus. zFar and zNear are from the projection matrix, and all that stuff is required to convert post-projection Z values back into real-world units. CoCScale and CoCBias are constant across the whole frame, so the only calculation done per-pixel is a multiply and add, which is quick. Edit – thanks to Vincent for pointing out the previous error in CoCBias!]
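If you want to convince yourself that the scale and bias form is equivalent to the geometric calculation, the following Python sketch compares the two. It assumes a Direct3D-style projection that maps view-space depth in [zNear, zFar] to a depth buffer value in [0, 1]; that mapping, the function names and the example camera values are assumptions for illustration and are not taken from the talk.

```python
def coc_constants(aperture, focal_length, focal_plane, z_near, z_far):
    """Per-frame constants: CoCSize = z * scale + bias, z being the depth buffer value."""
    scale = (aperture * focal_length * focal_plane * (z_far - z_near)) / (
        (focal_plane - focal_length) * z_near * z_far)
    bias = (aperture * focal_length * (z_near - focal_plane)) / (
        (focal_plane - focal_length) * z_near)
    return scale, bias


def depth_buffer_value(view_z, z_near, z_far):
    """ASSUMPTION: Direct3D-style post-projection depth in [0, 1]."""
    return z_far * (view_z - z_near) / (view_z * (z_far - z_near))


def coc_from_view_depth(view_z, aperture, focal_length, focal_plane):
    """Thin-lens circle of confusion computed directly from view-space depth.
    Negative in front of the focal plane, positive behind it."""
    return aperture * focal_length * (view_z - focal_plane) / (
        view_z * (focal_plane - focal_length))


# Example values only: 21mm lens focused at 5m.
APERTURE, FOCAL_LENGTH, FOCAL_PLANE = 0.1, 0.021, 5.0
Z_NEAR, Z_FAR = 0.1, 100.0

scale, bias = coc_constants(APERTURE, FOCAL_LENGTH, FOCAL_PLANE, Z_NEAR, Z_FAR)


def coc_size(view_z):
    # The only per-pixel work is this multiply and add.
    return depth_buffer_value(view_z, Z_NEAR, Z_FAR) * scale + bias
```

Under these assumptions `coc_size` matches `coc_from_view_depth` at every depth, is zero at the focal plane, negative in front of it and positive behind it.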

In the images above, every pixel is blurred by the largest amount. Now we can have different blur sizes per-pixel. Because for any pixel there could be another pixel blurring over it, a full sized blur must always be performed. When sampling each pixel from the input texture, the CoCSize of that pixel is compared with how far it is from the pixel being shaded, and if it’s bigger then it’s added in. This means that in scenes with little blurring there are a lot of wasted texture lookups, but this is the only way to simulate pixel ‘scatter’ in a ‘gather’ shader.

Per-pixel blur size – near blur, in focus and far blur

Another little issue is that blur sizes can only grow by a whole pixel at a time, which introduces some ugly popping at the CoCSize changes (e.g. when the camera moves). To reduce this you can soften the edge – for example if sampling a pixel 5 pixels away, blend in the contribution as the CoCSize goes from 5 to 4 pixels.
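The per-sample test and the softened edge can be sketched like this in Python. It is only an illustration: the helper names are made up, all sizes are measured in pixels, and the exact position of the one-pixel ramp (here it starts where the CoC size equals the sample distance) is an assumption that you can shift to taste.

```python
def saturate(x):
    """Clamp to [0, 1], like the HLSL intrinsic of the same name."""
    return max(0.0, min(1.0, x))


def sample_weight(sample_coc, offset_dist):
    """How much a sampled pixel contributes to the pixel being shaded.

    0 when the sample's blur cannot reach this far, 1 when it clearly can,
    with a linear ramp one pixel wide in between to avoid popping."""
    return saturate(abs(sample_coc) - offset_dist)


def contributing_weights(cocs, x, max_radius):
    """Weights of pixel x and the pixels to its left, out to the full blur radius.

    Every candidate is always visited, even when most weights are zero,
    which is the wasted work mentioned above."""
    return [sample_weight(cocs[x - i], i) for i in range(max_radius) if x - i >= 0]


# One blurry pixel at the left of an otherwise sharp row.
weights = contributing_weights([6.0, 0.0, 0.0, 0.0], 3, 4)
```

Here only the blurry pixel three steps away ends up with a non-zero weight, because its CoC size is larger than its distance from the pixel being shaded.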

Near and far depth of field

There are a couple of subtleties with near and far depth of field. Objects behind the focal plane don’t blur over things that are in focus, but objects in front do (do an image search for “depth of field” to see examples of this). Therefore when sampling to see if other pixels are going to blur over the one you’re currently shading, make sure it’s either in front of the focal plane (CoCSize is negative) or the currently shaded pixel and the sampled pixel are both behind the focal plane and the sampled pixel isn’t too far behind (in my implementation ‘too far’ is more than twice the CoCSize).

[Edit: tweaked the implementation of when to use the sampled pixel]

This isn’t perfect because objects at different depths don’t properly occlude each others’ blurs, but it still looks pretty good and catches the main cases.
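The sampling rule described in this section can be written as a small predicate. This Python version is a sketch of my reading of the rule, not the shader itself: CoC sizes are negative in front of the focal plane and positive behind it, and I assume "twice the CoCSize" means twice the CoC size of the pixel currently being shaded. The function name is hypothetical.

```python
def should_gather(this_coc, sample_coc, offset_dist):
    """Decide whether a sampled pixel blurs over the pixel being shaded.

    this_coc    -- CoC size of the pixel being shaded
    sample_coc  -- CoC size of the sampled pixel
    offset_dist -- distance between the two, in the same units
    """
    # The sample's blur has to be big enough to reach this pixel at all.
    if abs(sample_coc) < offset_dist:
        return False
    # Samples in front of the focal plane blur over everything.
    if sample_coc < 0.0:
        return True
    # Samples behind the focal plane only blur over pixels that are also
    # behind it, and only if the sample is not too far behind (ASSUMPTION:
    # at most twice this pixel's CoC size).
    return this_coc >= 0.0 and sample_coc <= 2.0 * this_coc
```

For example, `should_gather(0.0, -3.0, 2.0)` is true (a near-blurred sample spreads over an in-focus pixel), while `should_gather(0.0, 3.0, 2.0)` is false (a far-blurred sample does not).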

And finally, here’s some shader code.

  34 Responses to “A depth of field implementation”

  1. Excellent Work!
    I’ve got one question: why divide by three in the final pass? I think there should be a divide by 2 step before the output2, and then a divide by 2 step in the final pass.

    • I also thought that when first implementing it, but it turns out not to work. Here’s why:

      We have intermediate1 = A and intermediate2 = B+C. Add both together and divide by three and you get (A+(B+C))/3 = A/3 + B/3 + C/3. This sums to 1 (modulo whatever is in the textures) and is an equal amount of each rhombus, so they are all equally bright which is what we want.

      Doing a divide by two twice doesn’t give this. Again we have int1 = A, int2 = B+C.
      First divide by two: int2 = (B+C)/2 = B/2 + C/2.
      Add both together and divide by two: output = (A+(B/2 + C/2)) / 2 = A/2 + B/4 + C/4.
      The overall image brightness still sums to 1, but one rhombus is twice the brightness of the other two.

      Slightly counterintuitive but the divide by three method works! You can see in the intermediate images actually that pixels are the same brightness in both, there are just more of them in the second one. So dividing the second one by two would make it darker than the first, which would never be corrected.
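The arithmetic in this reply can be checked with a few lines of Python. This is only an illustrative sketch: `rhombus_weights` is a made-up helper, and each of the rhombi A, B and C is treated as having unit brightness before the final combine.

```python
from fractions import Fraction


def rhombus_weights(halve_output2, final_divisor):
    """Relative brightness of rhombi A, B and C in the final image.

    A comes from the first texture; B and C both come from the second."""
    a = Fraction(1)
    b = Fraction(1)
    c = Fraction(1)
    if halve_output2:
        # The proposed extra divide by two on the second texture.
        b, c = b / 2, c / 2
    return (a / final_divisor, b / final_divisor, c / final_divisor)


divide_by_three = rhombus_weights(False, 3)      # A/3, B/3, C/3
divide_by_two_twice = rhombus_weights(True, 2)   # A/2, B/4, C/4
```

Both schemes sum to one, but only the divide by three gives all three rhombi the same weight.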

      • Thanks, this helps a lot.

        However this may cause some problems when B+C is more than 1.0, especially with the alpha channel. So I use this: color1 = A, color2 = (B+C) / 2, output = (color1+color2 * 2) / 3.

        I know this is not good if B+C is a small value but I think the bright part is more important than the dark part in dof.

  2. this is a very cute effect, i have a (noob) question. what is the matrix of the kernel of the skewed box? how can i have a separable skewed filter box? thanks for your help

    • I use the same 1D filter for both the straight and the skewed blurs, e.g. {1/5 1/5 1/5 1/5 1/5}. It’s the texture coordinates that are skewed rather than using a complicated kernel. The skewed sampling is something like this, stepping along pixels at 30 degrees from horizontal:

      for (i = 0; i < 5; i++)
      {
          float2 offset = float2(0.866, -0.5);
          float dist = (i + 0.5);
          totalCol += tex0.Sample(samplerLinear, input.uv + (offset*dist)) * 0.2;
      }

      The full kernel after applying both passes is very roughly:

      0 0 0 0 1 1 1 1 1
      0 0 0 1 1 1 1 1 0
      0 0 1 1 1 1 1 0 0
      0 1 1 1 1 1 0 0 0
      1 1 1 1 1 0 0 0 0
      (divide by 25)

      except not as tall, at 30 degrees instead of 45 degrees, and with odd fractional values around the edges where the sample points don't land exactly on a pixel (but that's a lot more awkward to draw 🙂 )

      A skewed box filter is equivalent to shearing the original image, performing a standard box filter, and then shearing the result back again. The shearing doesn't affect the separability of the filter.

      Hope that helps!

  3. Thank you very much for your answer. The final question is: the skewed box filter should be separable, but the matrix form you wrote doesn’t seem so. Is there a mathematical form to write a separable skewed box filter?

    • It’s not separable if you try to separate it along the perpendicular X and Y axes, like you would for a Gaussian or something. You have to separate it along the axes of the edges of the skewed box. So with the matrix above you separate along the horizontal axis, and along the 45 degrees up and right axis.

      It’s not a general mathematical method because the directions the 1D filters are applied in depend on the particular skewed box kernel you want. With normal separable filters it’s just the contents of the 1D kernels that change, while the directions are fixed (horizontally and vertically).

  4. Hi Andy,
    I tried to implement the hexagonal blur. I used your image as the source and replicated all the passes. The result is quite similar to yours but my quality is poor; I would like to post an image to explain what I’m talking about. For example the final image is darker and I can see only a few hexagons.

    This is my shader for the vertical blur (the down-left one is quite similar):

    uniform sampler2D _MainTex;
    const int iteration = 75;
    uniform lowp vec4 _MainTex_TexelSize;

    void main()
    {
        vec4 colorPixel = vec4(0.0);
        vec2 pixelOffset = vec2(0.0, -1.0);
        float divisor = 0.0;
        vec2 pixelCoord = vec2(0.0);
        for (int index = 0; index < iteration; ++index)
        {
            colorPixel += texture2D(_MainTex, uv_center + _MainTex_TexelSize.xy * pixelCoord);
            divisor += 1.0;
            pixelCoord += pixelOffset;
        }

        colorPixel /= divisor;

        gl_FragColor = colorPixel;
    }

    Is there something wrong here?

    • the image became darker each pass

      • If you’re using my original source image from the article then that’s likely to be your problem there. The source image in the running code is a full HDR image, whereas the images here have been tone mapped for display. The bright white pixels that are producing the full hexagons have got values of 10-100 times brighter than ‘white’.

        If you apply an 8 pixel hexagon filter to a pixel, the hexagon will be 1/64 as bright. Starting with a really bright HDR pixel, the hexagon will still be bright after blurring. Using an LDR source image will just make it really dark. That’s why DOF is great for HDR images, to really bring out the bright pixels.

        Also it looks like you’re using a 75 pixel filter? That’s pretty huge. If you’re not doing the pseudo-scatter (only sampling based on the CoC size of the sampled pixels) then you’ll be making your hexagons 1/(75×75) times as bright as the central pixel.

        • Thank you very much. 75 iterations was just to try how dark the image becomes after a huge vertical filter. I’m running on mobile, so usually my filter is something like 5×5 on a half-sized image.

  5. Thanks for the great article!

    I wonder how you do scatter-as-gather using a separable approach? That should spread pixels that shouldn’t be spread on the second pass.

    • Thanks! Yes you’re right, there are issues with a separable gather.

      E.g. in the first pass you’re shading an in-focus pixel, and a blurry pixel scatters onto it. I find the main problem is actually the opposite of what you say – the blurry pixel not being spread on the second pass. The output CoC value therefore needs to be a max() to ensure it’s picked up in the second pass, but this will mean that the original (non-blurry) pixel colour will be scattered as well, as you say.

      The actual shader does this to calculate the output CoC:

      if (blurNear) // Pixel is in front of the focal plane
      {
          if (outCol.a < 0.0f)
          {
              // Pixel is already near blurred, so see if it's any blurrier.
              outCol.a = min(outCol.a, col.a);
          }
          else
          {
              // Pixel is far blurred. Only near-blur it if that is stronger.
              if (-col.a > outCol.a)
              {
                  outCol.a = col.a;
              }
          }
      }

      The artefacts with scattering pixels that shouldn’t are pretty subtle, while not scattering pixels that should leaves big holes in the hexagon. The incorrect colour spreading will only be visible inside a blurred hexagon, which is already a mess of colours, so you’re unlikely to notice it!

      So in summary, I don’t fix the problem, it just turns out to not really be an issue. Sorry!

  6. In the calculation of CoCBias, shouldn’t it be (focalPlane – focalLength) instead of (focalPlane * focalLength)?
    I just used the formula from Wikipedia and plugged in the depthbuffer to z-coordinate conversion. My result was the formula given here/from the talk, but with this minor change.
    This would also explain the parenthesis and why focalLength does not cancel out.

    • Argh, yes, you’re absolutely right. I just checked my source code and it has the fix in, which will be why it works in my renderer, but I evidently forgot and copied it from the paper instead. Sorry about that, thanks for the correction!

  7. Hello again,
    I’m now trying to implement the effect as explained by you, but I’m having difficulties getting the near/far field to work properly. I’m not exactly sure what value to use as the CoC in the second pass (although another comment addressed this already) and what exactly do you mean by “within *2 of each other”.
    Could you maybe share your exact formulas or even publish the shader source?
    Also thanks a lot for the article, it was very helpful for understanding the slides.

    • First you work out the CoC for each raw input pixel, as you know. When you do the first blur pass, the new CoC for each pixel is given in outCol.w in the code, and is the maximum CoC size of those samples (actually does a min() because near blurs are negative in this case). When sampling in the second pass, you use this CoC value output from the first pass.

      The “within *2 of each other”:
      Pixels only blur over other pixels that are in front of them. Objects in front of the focal plane will always blur, and the blur amount increases as they get closer, so this never causes a problem.

      For objects behind the focal plane, they blur more as they get further away, but they should be occluded by less blurred pixels closer to the camera. To simulate this while keeping it looking nice, I only gather blurred samples from other pixels behind the focal plane where (thisPixel.CoC < otherPixel.CoC*2) && (otherPixel.CoC < thisPixel.CoC*2). It stops pixels miles behind blurring over pixels just behind the focal plane, while allowing those roughly at equal distances behind the focal plane to blur smoothly over each other. Without the *2 leeway, you get a lot of artefacting between pixels at similar depths.

      It's just a bodge really, there are probably better ways to do it but it worked well enough. Hope that makes sense.

      Here's the expanded sample/gather code from above, with the *2 stuff (and fractional samples, so individual pixels can blur in smoothly):

      float absBlur = abs(col.w);
      if (blurNear || (col.w < absBlur * 2.0f && absBlur < col.w * 2.0f))
      {
          // Ignore samples that won't scatter here.
          if (absBlur > offsetDist && (blurNear || col.w > 0.0f))
          {
              // Smoothly blend in between samples. One pixel is OneOverScreenHeight in size.
              float frac = (absBlur - offsetDist) / OneOverScreenHeight;
              frac = saturate(frac);
              outCol.xyz += col.xyz * frac;

              if (blurNear) // Pixel is in front of the focal plane
              {
                  if (outCol.a < 0.0f)
                  {
                      // Pixel is already near blurred, so see if it's any blurrier.
                      outCol.a = min(outCol.a, col.a);
                  }
                  else
                  {
                      // Pixel is far blurred. Only near-blur it if that is stronger.
                      if (-col.a > outCol.a)
                      {
                          outCol.a = col.a;
                      }
                  }
              }
          }
          return frac;
      }
      return 0.0f;

      • I should really work out how to post formatted code in comments…

      • Thanks a lot!

        Sorry for bothering you again, but I’ve still got some questions regarding your code.

        Where does blurNear come from? I guess blurNear = col.w 0. (this would be always true except for col.w == 0).

        In the second line, shouldn’t col.w actually be outCol.w? Your code compares col.w to its absolute value, which does not seem right.

        Also, the “return frac” seems to be misplaced, I’m not sure if you omitted something there or just confused the braces.

        In your first paragraph you talked about the maximum of the CoC and how it actually is a minimum in case of near blur, but your code only updates outCol.a in case of a nearBlur. Is this intended or did you just omit this part for brevity?

        Thanks again for your great help!

        • Yeah, I’m just going through my shader code again and some stuff seems a bit weird with regard to what I posted above. Give me a little while to fix it up and I’ll post something better and answer the questions!

        • OK, I’ve been playing around a bit with my shader code and I’m pretty confident it’s all working properly now. There’s a download link at the end of the article. It’s fairly heavily commented so hopefully it’s clearer now.

          col.w is the CoC size for the pixel, which is positive for things further away than the focal plane, and negative for things closer. So blurNear is true if col.w is negative.

          I tweaked the condition for when to use the offset texture sample, so that nearer pixels always blur over further away pixels but far pixels don’t blur over other pixels too far in front of them. I think it was a bit broken before…

          outCol.a is updated with near blurs so that near pixels that blur over far pixels will continue to be blurred again in the second pass. It’s unnecessary with far pixels because they only blur onto other pixels that are already blurred, if that makes sense.

          • Awesome! Thank you, this works. I think my solution was close to yours, but I used the output directly without blending, which caused many black parts in the focus area. Your way of blending based on the final CoC is quite nice 🙂

  8. What is OneOverFilmSize in your dot.fx? 1.0 / length(float2(Width, Height)) ?

    • Film size is the physical size of the film in the camera you’re simulating. The focal length, focal plane and aperture size tell you how big the CoC will be in metres (or whatever unit you’re using). The film size is used to convert that to the proportion of the screen that it covers – a bigger film means the CoC will look smaller.

      My code is set up for 24mm film, so OneOverFilmSize is a constant of 1/0.024

  9. Hey, I was wondering what kind of units the CoC variables are given in?
    E.g. Aperture, FocalLength ?

    • To be honest that’s the part I was never completely sure of. I can tell you what I use and you may be able to add some clarity…

      Focal length is in metres. In my code it’s calculated as:
      focalLength = (0.5f * filmSize) / tanf(0.5f * fovY) = 21mm (where film size is 0.024 (24mm film) and field of view is 60 degrees).

      Focal plane is in metres. In the final image above it’s probably around 5m.

      Aperture I’m just not sure about. The code uses a value of 0.1. Larger values mean a larger CoC, so I assume it’s proportional to the inverse of the camera f-stop. I imagine this can be worked out geometrically, but that’s left as an exercise for the reader 🙂

      Sorry I can’t be more help with this bit!
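The focal length calculation in this reply is easy to check numerically. A minimal Python sketch, with a made-up function name, using the 24mm film size and 60 degree vertical field of view quoted above:

```python
import math


def focal_length_from_fov(film_size, fov_y_radians):
    """Focal length, in the same units as film_size, for a given vertical field of view."""
    return (0.5 * film_size) / math.tan(0.5 * fov_y_radians)


# 24mm film and a 60 degree field of view, both measured in metres/radians.
focal_length = focal_length_from_fov(0.024, math.radians(60.0))
```

This gives a focal length of roughly 0.0208 metres, which is the 21mm figure after rounding.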

  10. I think I’ve kind of figured it out.
    My focal length is e.g. 210mm / 1000.0f (so that it’s measured in meters)
    Aperture has to be the diameter not the fstop number which is calculated as (focalLength / fStopNumber).
    FocalPlane I just use a linear value from 0 to 1 and multiply that by the farClip
    My filmSize is 0.04326 (43.26mm)

    Hope that helps anyone who’s also unsure

    • That sounds plausible, thanks.

      • After testing around some more, the area of focus seems too narrow for it to be realistic using aperture = focalLength / fStopNumber.
        Something like Aperture = (focalLength / fStopNumber) * 0.1 produces a more realistic behavior, however I really dislike using some magic empirical number to change an otherwise physically correct calculation.

        • Sry for the double post but there’s no edit function.

          If you remove the line where you multiply the CoC by the reciprocal of the filmSize then it works quite nicely.

          • Great, glad you got it figured out. I’ve updated the shader code to remove that line. As you say, that’s helped clarify the last bit that didn’t really sit right.

  11. I’ve got a few more questions maybe you could answer for me to get a better understanding of your code/algorithm:

    1. Why is the diagonal blur xStep downwards if the xStep value is positive ? A direction vector of(0.866, -0.5) in my mind would point towards the bottom right, not left. It seems like the x-axis is negated. How come ?

    2. Maybe a noob question but why do you add the 0.5f to the sampleIndex in “stepDist”

    3. Related to question 2: Why are you only multiplying your pixel distances by OneOverScreenHeight ?
    For the xStep you use (OneOverScreenWidth/OneOverScreenHeight), why not for the other occasions ?

    • 1. The texture coord origin is actually at the top left, so (0.866, -0.5) is up and right. This means the vector is exactly opposite what you’d expect. This is because we’re implementing a ‘scatter’ as a ‘gather’ – conceptually we’re scattering light down and left in the first pass, but it’s implemented as sampling from the potential source pixel (up and right), for each pixel that the scattered light might reach. Hence the vector is opposite.

      2. I can’t remember precisely because I wrote this ages ago, but I think it’s so you have a uniform sampling in each direction. Imagine you’re doing one blur to the right, and then a second to the left. You’d first sample at coords 0.5, 1.5, 2.5…, and then at coords -0.5, -1.5, -2.5… etc. The distance between each sample is 1, even across the boundary. It’s the same principle but in a hexagon.

      3. Texture coordinates for sampling are [0, 1] in both directions. Because we want to step one pixel at a time, we convert [0, ScreenHeight] to the correct tex coord range [0, 1] by multiplying by OneOverScreenHeight. Then, we need to correct for the aspect ratio in xStep to keep it stepping whole pixels in x, so scale x by (OneOverScreenWidth/OneOverScreenHeight). Alternatively you could of course scale by ScreenWidth first, then correct the y coord the other way if you wanted.

      Hope that helps.
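The three answers above can be combined into one small Python sketch of how a sample's texture coordinate offset might be built. It is an illustration only: the function name and argument list are hypothetical, and the variable names simply mirror the ones mentioned in the questions.

```python
def sample_offset_uv(direction, sample_index, screen_width, screen_height):
    """Texture coordinate offset for one sample of a directional blur.

    `direction` is a unit vector in pixel space, e.g. (0.866, -0.5)."""
    one_over_screen_width = 1.0 / screen_width
    one_over_screen_height = 1.0 / screen_height
    # Half-pixel offset so samples are evenly spaced across the centre.
    step_dist = (sample_index + 0.5) * one_over_screen_height
    # Correct x for the aspect ratio so the step is still whole pixels.
    x_step = direction[0] * (one_over_screen_width / one_over_screen_height)
    y_step = direction[1]
    return (x_step * step_dist, y_step * step_dist)


# Third sample (index 2) of the diagonal blur on a 1920x1080 target.
u, v = sample_offset_uv((0.866, -0.5), 2, 1920, 1080)
```

Converting the result back to pixels (multiply `u` by the width and `v` by the height) gives the direction vector scaled by 2.5, so the step really is measured in whole pixels along both axes.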
