====1. Rasterization

a) The depth complexity, d, is the number of triangles that overlap a pixel (even though each triangle need not write to the pixel). The overdraw is the number of times you actually write to a pixel. A statistical model of the overdraw, o, assuming the triangles arrive in random depth order, is:

    o(d) = 1 + 1/2 + 1/3 + ... + 1/d

b) You must make sure that there are no cracks and that no pixel is written to more than once. This is done using McCool's code snippet, which can be found in Figure 2.4 in the course notes. Basically:

    bool inside(x, y)
        if (e(x, y) > 0) return true;
        if (e(x, y) < 0) return false;
        if (a > 0) return true;
        if (a < 0) return false;
        if (b > 0) return true;
        return false;
    end;

In addition, you need to take care of the case where a vertex coincides with the sampling point of a pixel. This can be done by using fixed-point coordinates for the vertices and offsetting them so that this never happens, or by using the solution in Figure 2.8.

c) You should not project all four points onto the normal, and you should not evaluate the edge functions for all corners of the tile; there are smarter, less expensive solutions. Instead, look at the signs of the normal components of each edge function in order to determine only _one_ tile corner (called p) to insert into that edge function. The normal is (a, b), sometimes called (n_x, n_y). If (n_x < 0 && n_y < 0), use the lower-left corner of the tile as p, and so on (see Equation 2.10 in the course notes). If e(p) < 0, there is no overlap. Do this for all three edge functions. You also need to check whether the bounding box of the triangle overlaps the tile.

d) Two points: (ax, ay) and (bx, by). The line equation is on the form c*X + d*Y + e = 0 (avoiding a and b as parameter names, since the points are called a and b). The line can be written parametrically as:

    X(t) = ax + (bx - ax)*t
    Y(t) = ay + (by - ay)*t,   where t is in [0, 1]

Rewrite as:

    (X - ax)/(bx - ax) = t
    (Y - ay)/(by - ay) = t

Setting them equal gives:

    (X - ax)*(by - ay) - (Y - ay)*(bx - ax) = 0

This gives c, d, and e.
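The harmonic-sum model in a) is easy to evaluate; a minimal sketch in Python (the function name is illustrative):

```python
def overdraw(d):
    """Expected overdraw for depth complexity d, assuming the d
    triangles covering the pixel arrive in random depth order:
    o(d) = 1 + 1/2 + 1/3 + ... + 1/d."""
    return sum(1.0 / i for i in range(1, d + 1))

# o grows very slowly with d:
print(round(overdraw(4), 2), round(overdraw(6), 2))   # 2.08 2.45
```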
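The tie-breaking rule in b) can be made runnable; here is one Python version, assuming the edge function e(x, y) = a*x + b*y + c (the signature is chosen for illustration):

```python
def inside(a, b, c, x, y):
    """Point-in-edge test with McCool's tie-breaking rule: a sample
    that lies exactly on a shared edge (e == 0) is claimed by exactly
    one of the two triangles, so there are no cracks and no double
    writes."""
    e = a * x + b * y + c
    if e > 0:
        return True
    if e < 0:
        return False
    # e == 0: the sample lies exactly on the edge; break the tie
    # using the signs of the edge coefficients.
    if a > 0:
        return True
    if a < 0:
        return False
    return b > 0

# The two triangles sharing an edge see it with negated (a, b, c),
# so exactly one of them claims a sample lying on the edge:
print(inside(1, 0, 0, 0, 0), inside(-1, 0, 0, 0, 0))   # True False
```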
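The single-corner trick in c) can be sketched as follows (Python, all names illustrative): for each edge, only the tile corner lying farthest in the direction of the edge normal (a, b) needs to be tested.

```python
def tile_outside_edge(a, b, c, xmin, ymin, xmax, ymax):
    """Conservative tile-vs-edge test. The corner p that maximizes
    e(x, y) = a*x + b*y + c over the tile is picked from the signs of
    the normal (a, b); if even e(p) < 0, the whole tile is outside."""
    px = xmax if a > 0 else xmin   # a < 0: use the left side, etc.
    py = ymax if b > 0 else ymin
    return a * px + b * py + c < 0

def tile_overlaps(edges, xmin, ymin, xmax, ymax):
    # Overlap is possible only if the tile is not fully outside any
    # of the three edges (a triangle bounding-box test should be
    # added as well, as noted above).
    return not any(tile_outside_edge(a, b, c, xmin, ymin, xmax, ymax)
                   for a, b, c in edges)
```

For example, for the triangle (0,0), (10,0), (0,10) with positive edge functions (0,1,0), (1,0,0), and (-1,-1,10), a tile at [2,3]x[2,3] may overlap, while a tile at [20,24]x[20,24] is rejected by the third edge.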
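Expanding the last expression in d) gives the coefficients directly; a small check (Python, names illustrative):

```python
def line_coeffs(ax, ay, bx, by):
    """Coefficients (c, d, e) of c*X + d*Y + e = 0 through the points
    (ax, ay) and (bx, by), read off from
    (X - ax)*(by - ay) - (Y - ay)*(bx - ax) = 0."""
    c = by - ay
    d = -(bx - ax)
    e = -ax * (by - ay) + ay * (bx - ax)
    return c, d, e

c, d, e = line_coeffs(1, 1, 4, 3)
# Both endpoints satisfy the line equation:
print(c * 1 + d * 1 + e, c * 4 + d * 3 + e)   # 0 0
```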
Another way of deriving the line equation in d): all points that lie exactly on the line must fulfil c*X + d*Y + e = 0, which also means that if you compute the direction from a point (X, Y) on the line to the starting point (ax, ay), you get a direction (X, Y) - (ax, ay). This direction must be perpendicular to the normal of the line. The normal is simply the direction (bx, by) - (ax, ay) = (bx - ax, by - ay) rotated 90 degrees, which gives us:

    N = (-(by - ay), bx - ax)

The line equation then becomes:

    N . ((X, Y) - (ax, ay)) = 0

This again gives c, d, and e.

e) A point in a triangle divides the triangle into three areas, A0, A1, A2, where the entire triangle area is A = A0 + A1 + A2. The barycentric coordinates are (A1, A2, A0)/A. See slides 36 and 37 in Lecture L4.pdf.

f) See slide 18, L4.pdf.

====2. Texture

a) When texture minification occurs, you access texels with large distances in between, and hence texture caching is of little use. Mipmapping, on the other hand, accesses one or more levels in the mipmap hierarchy, and these levels are selected such that you visit neighboring texels when walking to the next pixel on the triangle.

====3. Buffer Compression

a) Texture compression is done on read-only data (images), and it does not matter much if it takes a few minutes to compress an image. Buffer compression operates on the color buffer, depth buffer, or stencil buffer, and these are read-and-write buffers. Texture compression is almost always lossy; this does not work without major modifications for buffers, and hence if compression fails, we need a fallback so that the data can be sent uncompressed. Whether a tile is compressed or not is indicated by a tile table, where a few bits indicate whether it is compressed (and how), cleared, or uncompressed. Buffer compression does not save memory, only bandwidth; texture compression saves both.
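Returning to the barycentric coordinates in 1e, the area construction can be checked numerically; a sketch (Python, using signed areas via the cross product; names and the ordering of the sub-areas follow one consistent convention chosen for illustration):

```python
def area(p, q, r):
    """Signed area of triangle (p, q, r) via the 2D cross product."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1])
                  - (r[0] - p[0]) * (q[1] - p[1]))

def barycentric(p, v0, v1, v2):
    """Each coordinate is the sub-triangle area opposite a vertex,
    divided by the full area A = A0 + A1 + A2."""
    a = area(v0, v1, v2)
    return (area(p, v1, v2) / a,   # weight for v0
            area(v0, p, v2) / a,   # weight for v1
            area(v0, v1, p) / a)   # weight for v2

# The centroid has all three coordinates equal to 1/3:
print(barycentric((1, 1), (0, 0), (3, 0), (0, 3)))
```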
b) First of all, we need a fallback so that we can handle uncompressed blocks, e.g., if a tile cannot be compressed with the available algorithms, the block has to be sent in uncompressed form over the bus. In addition, you need to store a few (e.g., 1 or 2) bits per tile in a cache or on chip. These bits indicate which algorithm the tile has been compressed with. For example, using two bits, you can have the following:

    00 = uncompressed fallback
    01 = tile is cleared
    10 = compressed using algorithm X
    11 = compressed using algorithm Y

c) Depth varies linearly over a triangle, and so can be predicted easily when a tile overlaps only one or two triangles. The color buffer is much more complex, because texturing can make the colors appear random, and this is of course harder to compress.

d) Here is one idea: compress each color channel (R, G, and B) separately. For a 4x4 tile, store the channel value in three of the corners. This costs 3*8 = 24 bits. The remaining values are predicted using the plane equation through these three values. To handle more tiles, we add a 3-bit correction term per remaining pixel (4x4 - 3 = 13 pixels left), which costs 13*3 = 39 bits. In total, this costs 24 + 39 = 63 bits per channel. For R, G, and B, this sums to 63*3 = 189 bits, which is about 12 bits per pixel. That is, the compression ratio is 50%.

e) To turn a 4-bit number into an 8-bit number, we need to make sure that the largest 4-bit number maps to the largest 8-bit number. Thus, we need to multiply by 255/15 = 17. Assume the 4-bit number is A. Then we need to compute:

    17*A = 16*A + A = (A << 4) + A = (A << 4) or A

where "or" is bitwise or (exact, since the low four bits of A << 4 are zero).

====4. Memory Bandwidth

a) Assume 4 bytes each for color, depth, and texel accesses. d = 6 gives o = 2.5 (approximately).

    B = d*Z_r + o*(Z_w + C_w + 2*m*4*T_r)

    cost(Z) = 6 + 2.5 = 8.5
    cost(C) = 2.5
    cost(T) = 2.5 * 4 * 2 * m = 20*0.25 = 5   (with m = 0.25)

====5.
Culling

a) Zmin culling works with tiled rasterization algorithms, where each tile has storage for the minimum of its z-values in the depth buffer; let us denote this value zmin. It must be stored in on-chip memory or accessed through a cache. When rendering a triangle, the maximum depth, Z_max, of the triangle inside the tile is estimated conservatively. If Z_max < zmin, we know that the rendered triangle is definitely in front of all previously rendered triangles, and hence we do not need to read from the z-buffer. This is the bandwidth we can save, i.e., only reads from the z-buffer. Z_max can be estimated as:

    1) the max of the triangle vertices' depths, or
    2) the max of the triangle plane's depths at the tile corners, or
    3) the minimum of 1 and 2 (which is the best-performing solution).

b) You start an occlusion query and then render a number of triangles, and the hardware counts how many fragments pass through the entire pipeline (often with all writes to the buffers turned off). You can, for example, draw the triangles of the bounding box of a complex character; if no fragments pass, the box is invisible and thus so is the character. This means that you do not need to render the character at all, and you do not even need to send it over the bus to the graphics hardware.

c) Both algorithms can only save depth reads (assuming texturing is done after the depth test), and therefore we look at the part of the rasterization bandwidth equation that deals with depth-buffer reads and writes:

    B_d = (d - o)*Z_r + o*(Z_r + Z_w)

(d - o)*Z_r are fragments that only read, and hence are occluded; these can be avoided by Zmax-culling. So when

    d - o > o  <==>  d > 2*o

is fulfilled, we know that Zmax-culling works better. When d = 4 we have (see 1a) o = 2 (approximately), and this is where the break-even is reached. So, for d >= 4 we would want to use Zmax, and Zmin before that (in the beginning of rendering a frame).
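The break-even in c) can be checked against the overdraw model from 1a (Python sketch; note that with the exact harmonic sum, rather than the rounded o ≈ 2, the first d satisfying d > 2*o(d) is 5):

```python
def overdraw(d):
    # o(d) = 1 + 1/2 + ... + 1/d, as in 1a.
    return sum(1.0 / i for i in range(1, d + 1))

# Zmax-culling can remove the (d - o)*Z_r term and Zmin-culling the
# o*Z_r term, so Zmax wins once d - o > o, i.e. d > 2*o(d).
d = 1
while d <= 2 * overdraw(d):
    d += 1
print(d)   # 5 with the exact sum; the notes round o(4) down to 2
```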
Somehow, we need to "count" the depth complexity of each tile, or have some estimate of the average depth complexity of the entire screen. A counter per tile, or one for the whole screen, needs to be incremented each time we render a triangle.

====6. Anti-Aliasing

a) Ideally, we would like an analytical solution, where we compute the areas covered by the different triangles and the color of each area. Since this includes texturing, and since many triangles can overlap a pixel, this becomes a very difficult problem.

b) Texture aliasing, screen-space (edge) aliasing, shader aliasing, and temporal aliasing.

c) Put the sample points at different locations vertically, preferably with the same vertical distance between them.

====7. Architecture

a) Unified shaders: by using the same shader instructions for both vertex shaders and pixel shaders, we can build one type of shader unit that executes these instructions. The major idea is then to use the units for either vertices or pixels depending on where the bottleneck is. That is, we get load balancing in the pipeline and better exploit the computational units in the hardware.
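As a check of the bit-replication trick in 3e (Python sketch):

```python
def expand4to8(a):
    """Map a 4-bit value to 8 bits so that 0 -> 0 and 15 -> 255:
    multiplying by 255/15 = 17 is the same as replicating the nibble,
    since 17*a = 16*a + a = (a << 4) | a (the OR is exact because the
    low four bits of a << 4 are zero)."""
    return (a << 4) | a

print(expand4to8(0), expand4to8(15))   # 0 255
```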
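And as a check of the arithmetic in 4a (Python sketch; costs counted per pixel in 4-byte accesses, with o ≈ 2.5 for d = 6 and m = 0.25 as in the answer above):

```python
d = 6        # depth complexity
o = 2.5      # overdraw: o(6) = 1 + 1/2 + ... + 1/6, rounded
m = 0.25     # texture factor used in the answer above

cost_z = d + o           # d depth reads + o depth writes
cost_c = o               # o color writes
cost_t = o * 2 * 4 * m   # 2 mipmap levels x 4 texels, scaled by m

print(cost_z, cost_c, cost_t)   # 8.5 2.5 5.0
```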