Tiled Rendering Showdown: Forward++ vs. Deferred Rendering

Latest recommended article published on 2026-03-28 03:31:51

License: CC 4.0 BY-SA

Tags: #RenderPipeline #Forward+

Part of the GDC2013 column

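These notes cover what this talk adds to Forward+ rendering beyond the Forward+ chapter in GPU Pro, in particular the handling of alpha blending and shadow-casting lights, and how performance changes with the number of lights. They also compare Forward+ with deferred rendering in terms of transparency and multisample anti-aliasing (MSAA).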

# brief
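- Compared with the Forward+ article in GPU Pro, the talk additionally covers alpha blending and shadow-casting lights
- If you want MSAA, forward is the only option; deferred can still use SMAA or TXAA, though
- Below about 1k lights, deferred still has the advantage
- For transparency and similar cases, go hybrid: use deferred for most of the scene and forward for whatever deferred cannot handle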

# Forward+ recap
- Used in Dirt: Showdown
- The Forward+ algorithm
  - depth pre-pass (mandatory, because the depth range of each tile is needed)
  - tile light culling
  - forward shading
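
The culling step in the list above can be illustrated with a small CPU-side sketch. It is not the talk's implementation: the `Light` record and both function names are hypothetical, depth is assumed to be positive view-space distance, and lights are tested only against the tile's depth range (a real implementation would also test them against the side planes of the tile frustum).

```python
from collections import namedtuple

# Hypothetical light record: view-space depth of the centre and radius of influence.
Light = namedtuple("Light", ["z", "radius"])

def tile_depth_range(tile_depths):
    """Min/max depth of one tile, as read back from the depth pre-pass."""
    return min(tile_depths), max(tile_depths)

def cull_lights_for_tile(lights, min_z, max_z):
    """Return the indices of lights whose depth extent overlaps [min_z, max_z]."""
    visible = []
    for index, light in enumerate(lights):
        if light.z + light.radius >= min_z and light.z - light.radius <= max_z:
            visible.append(index)
    return visible

tile_depths = [4.0, 4.5, 5.0, 6.0]  # depths of the pixels in one tile
lights = [Light(1.0, 1.0), Light(5.0, 0.5), Light(9.0, 2.0), Light(7.0, 1.5)]

min_z, max_z = tile_depth_range(tile_depths)
light_list = cull_lights_for_tile(lights, min_z, max_z)
print(light_list)
```

The forward shading pass would then loop only over `light_list` for every pixel of that tile.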

# forward++
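- alpha-blended geometry
  - allocate a new Z-buffer
  - render the blended geometry (depth only)
  - compute a new set of tiled light lists (minZ comes from the blended depth pass just rendered, maxZ is the one obtained in the opaque pass)
  - render the blended scene using the new light lists (the opaque maxZ marks the farthest opaque surface in the tile, so any blended geometry beyond it is guaranteed to be hidden by opaque geometry)
- particles are best handled separately:
  - one light per emitter
  - use vertex lighting (why?)
- two-sided lighting
  - the naive way is one pass over the front faces and another over the back faces
  - the preferred approach is a single pass that accumulates front and back lighting
- shadow-casting lights
  - render shadow maps into a 2D texture atlas
  - store the shadow map index in the alpha channel of the light color
  - use a dynamic branch to calculate the shadow term (imagine the complexity of that shader...)
    - pick the transform matrix based on the light direction
    - because a shadow map that would normally be a cubemap is laid out in an atlas, the seams have to be handled as well, hence the various scale and offset biases
  - there are many tricks for updating shadow maps
    - geometry change
    - light position change
    - spread the cost over multiple frames
  - the shadow map index can be packed somewhere in the light data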
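
For the alpha-blended pass at the top of this list, the per-tile depth range used for culling mixes two depth buffers. The following is a minimal sketch under the assumption that larger depth means farther from the camera; `blended_tile_range` and `light_overlaps` are hypothetical helpers, not code from the talk.

```python
def blended_tile_range(blended_depths, opaque_depths):
    """Depth range used to cull lights for the blended geometry of one tile.

    blended_depths: depth-only render of the blended geometry (the new Z-buffer).
    opaque_depths:  depth pre-pass of the opaque geometry.
    """
    min_z = min(blended_depths)  # nearest blended surface
    max_z = max(opaque_depths)   # farthest opaque surface
    return min_z, max_z

def light_overlaps(light_z, light_radius, min_z, max_z):
    """Depth-only overlap test between a light sphere and a tile range."""
    return light_z + light_radius >= min_z and light_z - light_radius <= max_z

# A pane of glass in front of an opaque wall.
blended = [2.0, 2.5, 3.0]
opaque = [8.0, 8.0, 9.0]
opaque_range = (min(opaque), max(opaque))
blended_range = blended_tile_range(blended, opaque)

# A light sitting between the glass and the wall.
light_z, light_radius = 5.0, 1.0
in_opaque_list = light_overlaps(light_z, light_radius, *opaque_range)
in_blended_list = light_overlaps(light_z, light_radius, *blended_range)
print(opaque_range, blended_range, in_opaque_list, in_blended_list)
```

The light misses the opaque range but falls inside the blended range, which is why the blended geometry needs its own light lists.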

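The shadow atlas lookup described above can be sketched as follows. Everything concrete here is an assumption made for illustration: the atlas layout (each light owns one row of six cube faces), the face order, the sizes, the half-texel inset used against seams, and the convention that a negative alpha means "no shadow". The per-face projection that produces `face_uv` is left out.

```python
def cube_face(direction):
    """Index 0..5 (+X, -X, +Y, -Y, +Z, -Z) of the dominant axis of `direction`."""
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return 0 if x >= 0 else 1
    if ay >= az:
        return 2 if y >= 0 else 3
    return 4 if z >= 0 else 5

def atlas_uv(shadow_index, face, face_uv, face_size=256, atlas_size=2048):
    """Map a per-face UV in [0, 1] to atlas UV using a scale and an offset."""
    half_texel = 0.5 / face_size
    # Keep samples half a texel away from the face border to avoid seams.
    u = min(max(face_uv[0], half_texel), 1.0 - half_texel)
    v = min(max(face_uv[1], half_texel), 1.0 - half_texel)
    scale = face_size / atlas_size
    return ((face + u) * scale, (shadow_index + v) * scale)

# The alpha channel of the light color carries the shadow map index.
light_color = (1.0, 0.9, 0.8, 1.0)
shadow_index = int(light_color[3])

if shadow_index >= 0:  # stands in for the dynamic branch in the shader
    face = cube_face((0.2, -0.9, 0.1))
    uv = atlas_uv(shadow_index, face, (0.5, 0.5))
else:
    uv = None
print(uv)
```
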
# GI using VPL
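- render out the RSM (reflective shadow map)
  - presumably this means rendering from the light's point of view into a render target to obtain a G-buffer (normal, diffuse)
- a compute shader generates the VPLs
  - generate one VPL per 2x2 texel block
  - VPL brightness is based on distance and the strength of the source light
  - VPL color is based on the RSM diffuse
  - the normal has to be stored (back-facing lights must be rejected)
  - store into a structured buffer using an atomic increment
  - to use the count afterwards, CopyStructureCount copies the internal counter to a constant buffer; for debugging it can also be read back on the CPU through a staging resource
- when the VPLs are used, shading is a standard (simple) dot(N,L)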
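
The VPL steps above can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the talk's shader: the RSM is a 2D list of dicts, the top-left texel represents each 2x2 block, the falloff `strength / (1 + d^2)` is an assumed brightness formula, and a plain list append stands in for the atomic-increment append into a structured buffer.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def generate_vpls(rsm, light_pos, light_strength):
    """Emit one VPL per 2x2 texel block of the RSM."""
    vpls = []
    for y in range(0, len(rsm), 2):
        for x in range(0, len(rsm[0]), 2):
            texel = rsm[y][x]  # representative texel of the block (assumed choice)
            d = math.dist(texel["pos"], light_pos)
            brightness = light_strength / (1.0 + d * d)  # assumed falloff
            vpls.append({
                "pos": texel["pos"],
                "normal": texel["normal"],
                "color": tuple(c * brightness for c in texel["diffuse"]),
            })
    return vpls

def shade(point, normal, vpl):
    """Simple dot(N, L) shading; assumes point differs from the VPL position."""
    to_vpl = tuple(v - p for v, p in zip(vpl["pos"], point))
    length = math.sqrt(dot(to_vpl, to_vpl))
    l = tuple(c / length for c in to_vpl)
    if dot(vpl["normal"], l) >= 0.0:  # VPL faces away from the point
        return (0.0, 0.0, 0.0)
    n_dot_l = max(0.0, dot(normal, l))
    return tuple(c * n_dot_l for c in vpl["color"])

# A 4x4 RSM of a floor lit from above.
rsm = [[{"pos": (float(x), 0.0, float(y)),
         "normal": (0.0, 1.0, 0.0),
         "diffuse": (1.0, 0.5, 0.25)} for x in range(4)] for y in range(4)]
vpls = generate_vpls(rsm, (0.0, 2.0, 0.0), 10.0)

lit = shade((0.0, 1.0, 0.0), (0.0, -1.0, 0.0), vpls[0])    # point above the floor
unlit = shade((0.0, -1.0, 0.0), (0.0, 1.0, 0.0), vpls[0])  # point behind the VPL
print(len(vpls), lit, unlit)
```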

# depth discontinuities
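- problem: in a scene with a very large depth range, a single tile frustum ends up containing many lights
- solution
  - method 1: split the tile in two along depth (halfZ)
  - method 2: 2.5D culling. Split the depth range into 32 cells and use a bitmask
    - note: this needs an atomic OR
    - note that this method still ends up with a single light list per tile
  - if there really are many objects spread along the depth range and many lights intersect them, method 1 may well be the better choice. Also, when the light count stays under 1k, 2.5D is slower than halfZ (split light lists)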
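
Method 2 can be illustrated with a small sketch of the 32-cell bitmask test. The function names are hypothetical, the cells are assumed to divide the tile's depth range uniformly, and a light is assumed to have already passed the tile's min/max depth test before its mask is built. The `|=` in `depth_mask` is where a GPU version would need the atomic OR.

```python
NUM_CELLS = 32

def cell_index(z, min_z, max_z):
    """Cell (0..31) that depth z falls into within the tile's depth range."""
    if max_z <= min_z:
        return 0
    t = (z - min_z) / (max_z - min_z)
    return min(NUM_CELLS - 1, max(0, int(t * NUM_CELLS)))

def depth_mask(pixel_depths, min_z, max_z):
    """Bitmask of the cells that actually contain geometry."""
    mask = 0
    for z in pixel_depths:
        mask |= 1 << cell_index(z, min_z, max_z)  # atomic OR on the GPU
    return mask

def light_mask(light_z, light_radius, min_z, max_z):
    """Bitmask of the cells covered by the light's depth extent."""
    first = cell_index(light_z - light_radius, min_z, max_z)
    last = cell_index(light_z + light_radius, min_z, max_z)
    mask = 0
    for cell in range(first, last + 1):
        mask |= 1 << cell
    return mask

# A tile with a depth discontinuity: a near object and a far wall.
min_z, max_z = 0.0, 32.0
geometry = depth_mask([1.2, 1.8, 30.5, 31.0], min_z, max_z)

light_in_gap = light_mask(15.0, 3.0, min_z, max_z)   # floats in empty space
light_on_wall = light_mask(30.0, 1.0, min_z, max_z)  # touches the far wall

keep_gap = (geometry & light_in_gap) != 0
keep_wall = (geometry & light_on_wall) != 0
print(keep_gap, keep_wall)
```

The light in the empty gap passes the plain min/max test but is rejected by the mask, which is the case 2.5D culling is meant to catch.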

# examples
- Battlefield 3 uses tiled deferred

# performance comparison
- Forward+ needs to submit geometry twice, and the small-triangle problem is more pronounced
- test scene
  - 4k lights
  - 1M tiny triangles
- details
  - with 3 RGBA G-buffer targets: forward loses across the board; Forward+ is about 10% behind (at any light count)
  - with 4 RGBA G-buffer targets: Forward+ wins when there are few lights (under 1k)
  - with 5 RGBA G-buffer targets: deferred loses across the board
  - with 4x MSAA enabled, deferred stops being viable
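
To get a feel for why the number of G-buffer targets and MSAA weigh on deferred, the storage footprint can be worked out directly. This is only back-of-the-envelope arithmetic, not the measurements from the talk: the 1920x1080 resolution is an assumption, and each RGBA target is taken to be 32 bits per texel.

```python
def gbuffer_bytes(width, height, num_targets, bytes_per_texel=4, msaa_samples=1):
    """Storage needed for the G-buffer color targets (depth not included)."""
    return width * height * num_targets * bytes_per_texel * msaa_samples

WIDTH, HEIGHT = 1920, 1080  # assumed resolution

for targets in (3, 4, 5):
    plain = gbuffer_bytes(WIDTH, HEIGHT, targets)
    msaa4 = gbuffer_bytes(WIDTH, HEIGHT, targets, msaa_samples=4)
    print(f"{targets} targets: {plain / 2**20:.1f} MiB, 4x MSAA: {msaa4 / 2**20:.1f} MiB")
```

Every one of those bytes is written in the geometry pass and read again in the lighting pass, so the cost grows with both the target count and the sample count.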

# summary
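- non-MSAA
  - conditions that favor F+:
    - triangle count is under control (LOD, tessellation)
    - the G-buffer would need more than four 32-bit targets
    - fewer than 1.5k lights
    - transparency
    - the G-buffer cannot fit everything that is needed
- MSAA
  - forward is the only option
  - but then again, there are plenty of other AA techniques... who still uses MSAA?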

# QA
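- Have you tried clustered shading? Looked at it but have not implemented it; it is not easy, will try it later
- tile size:
  - small tiles cull more precisely
  - large tiles cull faster but less effectively; 16x16 works well
- VPL count: a very large number of VPLs was used, around 6K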
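
The tile-size trade-off in the answer above is easier to picture with tile counts. A small sketch, assuming a 1920x1080 target (the resolution is not stated in the notes) and a hypothetical `tile_count` helper:

```python
import math

def tile_count(width, height, tile_size):
    """Number of screen tiles, rounding partial tiles up."""
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)

for size in (8, 16, 32):
    print(f"{size}x{size}: {tile_count(1920, 1080, size)} tiles")
```

Smaller tiles mean more light lists to build but tighter depth ranges per tile; larger tiles mean fewer lists with looser bounds.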