====1. Rasterization

a) The depth complexity, d, is the number of triangles that overlap a pixel (even though each triangle need not write to the pixel). The overdraw is the number of times you actually write to a pixel. A statistical model of the overdraw, o, assuming the triangles arrive in random depth order, is:

    o(d) = 1 + 1/2 + 1/3 + ... + 1/d

b) You must make sure that there are no cracks and that no pixel is written to more than once. This is done using McCool's code snippet, which can be found in Figure 2.4 in the course notes. Basically:

    bool inside(x, y)
        if (e(x, y) > 0) return true;
        if (e(x, y) < 0) return false;
        if (a > 0) return true;
        if (a < 0) return false;
        if (b > 0) return true;
        return false;
    end;

In addition, you need to take care of the case where a vertex coincides with the sampling point of a pixel. This can be done by using fixed-point coordinates for the vertices and offsetting them so that this never happens, or by using the solution in Figure 2.8.

c) You should not project all four points onto the normal, and you should not evaluate the edge functions for all corners of the tile; there are smarter, less expensive solutions. Instead, look at the signs of the normal components of each edge function in order to determine only _one_ tile corner (called p) to insert into that edge function. The normal is (a, b), sometimes called (n_x, n_y). If (n_x < 0 && n_y < 0), use the lower-left corner of the tile as p, and so on (see Equation 2.10 in the course notes). If e(p) < 0, there is no overlap. Do this for all three edge functions. You also need to check whether the bounding box of the triangle overlaps the tile.

d) Two points: (ax, ay) and (bx, by). The line equation is on the form c*X + d*Y + e = 0 (avoiding a and b as parameter names, since the points are called a and b). The line can be written parametrically as:

    X(t) = ax + (bx - ax)*t
    Y(t) = ay + (by - ay)*t,   where t is in [0, 1]

Rewrite as:

    (X - ax)/(bx - ax) = t
    (Y - ay)/(by - ay) = t

Setting them equal gives:

    (X - ax)*(by - ay) - (Y - ay)*(bx - ax) = 0

This gives c, d, and e.
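The harmonic-sum model in a) is easy to evaluate; a minimal sketch in Python (the function name is illustrative):

```python
def overdraw(d):
    """Expected overdraw for depth complexity d, assuming the d
    triangles covering the pixel arrive in random depth order:
    o(d) = 1 + 1/2 + 1/3 + ... + 1/d."""
    return sum(1.0 / i for i in range(1, d + 1))

# o grows very slowly with d:
print(round(overdraw(4), 2), round(overdraw(6), 2))   # 2.08 2.45
```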
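The tie-breaking rule in b) can be made runnable; here is one Python version, assuming the edge function e(x, y) = a*x + b*y + c (the signature is chosen for illustration):

```python
def inside(a, b, c, x, y):
    """Point-in-edge test with McCool's tie-breaking rule: a sample
    that lies exactly on a shared edge (e == 0) is claimed by exactly
    one of the two triangles, so there are no cracks and no double
    writes."""
    e = a * x + b * y + c
    if e > 0:
        return True
    if e < 0:
        return False
    # e == 0: the sample lies exactly on the edge; break the tie
    # using the signs of the edge coefficients.
    if a > 0:
        return True
    if a < 0:
        return False
    return b > 0

# The two triangles sharing an edge see it with negated (a, b, c),
# so exactly one of them claims a sample lying on the edge:
print(inside(1, 0, 0, 0, 0), inside(-1, 0, 0, 0, 0))   # True False
```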
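The single-corner trick in c) can be sketched as follows (Python, all names illustrative): for each edge, only the tile corner lying farthest in the direction of the edge normal (a, b) needs to be tested.

```python
def tile_outside_edge(a, b, c, xmin, ymin, xmax, ymax):
    """Conservative tile-vs-edge test. The corner p that maximizes
    e(x, y) = a*x + b*y + c over the tile is picked from the signs of
    the normal (a, b); if even e(p) < 0, the whole tile is outside."""
    px = xmax if a > 0 else xmin   # a < 0: use the left side, etc.
    py = ymax if b > 0 else ymin
    return a * px + b * py + c < 0

def tile_overlaps(edges, xmin, ymin, xmax, ymax):
    # Overlap is possible only if the tile is not fully outside any
    # of the three edges (a triangle bounding-box test should be
    # added as well, as noted above).
    return not any(tile_outside_edge(a, b, c, xmin, ymin, xmax, ymax)
                   for a, b, c in edges)
```

For example, for the triangle (0,0), (10,0), (0,10) with positive edge functions (0,1,0), (1,0,0), and (-1,-1,10), a tile at [2,3]x[2,3] may overlap, while a tile at [20,24]x[20,24] is rejected by the third edge.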
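Expanding the last expression in d) gives the coefficients directly; a small check (Python, names illustrative):

```python
def line_coeffs(ax, ay, bx, by):
    """Coefficients (c, d, e) of c*X + d*Y + e = 0 through the points
    (ax, ay) and (bx, by), read off from
    (X - ax)*(by - ay) - (Y - ay)*(bx - ax) = 0."""
    c = by - ay
    d = -(bx - ax)
    e = -ax * (by - ay) + ay * (bx - ax)
    return c, d, e

c, d, e = line_coeffs(1, 1, 4, 3)
# Both endpoints satisfy the line equation:
print(c * 1 + d * 1 + e, c * 4 + d * 3 + e)   # 0 0
```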
Another way of deriving the line equation in d): all points that lie exactly on the line must fulfil c*X + d*Y + e = 0, which also means that if you compute the direction from a point (X, Y) on the line to the starting point (ax, ay), you get a direction (X, Y) - (ax, ay). This direction must be perpendicular to the normal of the line. The normal is simply the direction (bx, by) - (ax, ay) = (bx - ax, by - ay) rotated 90 degrees, which gives us:

    N = (-(by - ay), bx - ax)

The line equation then becomes:

    N . ((X, Y) - (ax, ay)) = 0

This again gives c, d, and e.

e) A point in a triangle divides the triangle into three areas, A0, A1, A2, where the entire triangle area is A = A0 + A1 + A2. The barycentric coordinates are (A1, A2, A0)/A. See slides 36 and 37 in Lecture L4.pdf.

f) See slide 18, L4.pdf.

====2. Texture

a) When texture minification occurs, you access texels with large distances in between, and hence texture caching is of little use. Mipmapping, on the other hand, accesses one or more levels in the mipmap hierarchy, and these levels are selected such that you visit neighboring texels when walking to the next pixel on the triangle.

====3. Buffer Compression

a) Texture compression is done on read-only data (images), and it does not matter much if it takes a few minutes to compress an image. Buffer compression operates on the color buffer, depth buffer, or stencil buffer, and these are read-and-write buffers. Texture compression is almost always lossy; this does not work without major modifications for buffers, and hence if compression fails, we need a fallback so that the data can be sent uncompressed. Whether a tile is compressed or not is indicated by a tile table, where a few bits indicate whether it is compressed (and how), cleared, or uncompressed. Buffer compression does not save memory, only bandwidth; texture compression saves both.
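Returning to the barycentric coordinates in 1e, the area construction can be checked numerically; a sketch (Python, using signed areas via the cross product; names and the ordering of the sub-areas follow one consistent convention chosen for illustration):

```python
def area(p, q, r):
    """Signed area of triangle (p, q, r) via the 2D cross product."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1])
                  - (r[0] - p[0]) * (q[1] - p[1]))

def barycentric(p, v0, v1, v2):
    """Each coordinate is the sub-triangle area opposite a vertex,
    divided by the full area A = A0 + A1 + A2."""
    a = area(v0, v1, v2)
    return (area(p, v1, v2) / a,   # weight for v0
            area(v0, p, v2) / a,   # weight for v1
            area(v0, v1, p) / a)   # weight for v2

# The centroid has all three coordinates equal to 1/3:
print(barycentric((1, 1), (0, 0), (3, 0), (0, 3)))
```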
b) First of all, we need a fallback so that we can handle uncompressed blocks, e.g., if a tile cannot be compressed with the available algorithms, the block has to be sent in uncompressed form over the bus. In addition, you need to store a few (e.g., 1 or 2) bits per tile in a cache or on chip. These bits indicate which algorithm the tile has been compressed with. For example, using two bits, you can have the following:

    00 = uncompressed fallback
    01 = tile is cleared
    10 = compressed using algorithm X
    11 = compressed using algorithm Y

c) Depth varies linearly over a triangle, and so can be predicted easily when a tile overlaps only one or two triangles. The color buffer is much more complex, because texturing can make the colors appear random, and this is of course harder to compress.

d) Here is one idea: compress each color channel (R, G, and B) separately. For a 4x4 tile, store the channel value in three of the corners. This costs 3*8 = 24 bits. The remaining values are predicted using the plane equation through these three values. To handle more tiles, we add a 3-bit correction term per remaining pixel (4x4 - 3 = 13 pixels left), which costs 13*3 = 39 bits. In total, this costs 24 + 39 = 63 bits per channel. For R, G, and B, this sums to 63*3 = 189 bits, which is about 12 bits per pixel. That is, the compression ratio is 50%.

e) To turn a 4-bit number into an 8-bit number, we need to make sure that the largest 4-bit number maps to the largest 8-bit number. Thus, we need to multiply by 255/15 = 17. Assume the 4-bit number is A. Then we need to compute:

    17*A = 16*A + A = (A << 4) + A = (A << 4) or A

where "or" is bitwise or (exact, since the low four bits of A << 4 are zero).

====4. Memory Bandwidth

a) Assume 4 bytes each for color, depth, and texel accesses. d = 6 gives o = 2.5 (approximately).

    B = d*Z_r + o*(Z_w + C_w + 2*m*4*T_r)

    cost(Z) = 6 + 2.5 = 8.5
    cost(C) = 2.5
    cost(T) = 2.5 * 4 * 2 * m = 20*0.25 = 5   (with m = 0.25)

====5.
Culling

a) Zmin culling works with tiled rasterization algorithms, where each tile has storage for the minimum of its z-values in the depth buffer; let us denote this value zmin. It must be stored in on-chip memory or accessed through a cache. When rendering a triangle, the maximum depth, Z_max, of the triangle inside the tile is estimated conservatively. If Z_max < zmin, we know that the rendered triangle is definitely in front of all previously rendered triangles, and hence we do not need to read from the z-buffer. This is the bandwidth we can save, i.e., only reads from the z-buffer. Z_max can be estimated as:

    1) the max of the triangle vertices' depths, or
    2) the max of the triangle plane's depths at the tile corners, or
    3) the minimum of 1 and 2 (which is the best-performing solution).

b) You start an occlusion query and then render a number of triangles, and the hardware counts how many fragments pass through the entire pipeline (often with all writes to the buffers turned off). You can, for example, draw the triangles of the bounding box of a complex character; if no fragments pass, the box is invisible and thus so is the character. This means that you do not need to render the character at all, and you do not even need to send it over the bus to the graphics hardware.

c) Both algorithms can only save depth reads (assuming texturing is done after the depth test), and therefore we look at the part of the rasterization bandwidth equation that deals with depth-buffer reads and writes:

    B_d = (d - o)*Z_r + o*(Z_r + Z_w)

(d - o)*Z_r are fragments that only read, and hence are occluded; these can be avoided by Zmax-culling. So when

    d - o > o  <==>  d > 2*o

is fulfilled, we know that Zmax-culling works better. When d = 4 we have (see 1a) o = 2 (approximately), and this is where the break-even is reached. So, for d >= 4 we would want to use Zmax, and Zmin before that (in the beginning of rendering a frame).
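The break-even in c) can be checked against the overdraw model from 1a (Python sketch; note that with the exact harmonic sum, rather than the rounded o ≈ 2, the first d satisfying d > 2*o(d) is 5):

```python
def overdraw(d):
    # o(d) = 1 + 1/2 + ... + 1/d, as in 1a.
    return sum(1.0 / i for i in range(1, d + 1))

# Zmax-culling can remove the (d - o)*Z_r term and Zmin-culling the
# o*Z_r term, so Zmax wins once d - o > o, i.e. d > 2*o(d).
d = 1
while d <= 2 * overdraw(d):
    d += 1
print(d)   # 5 with the exact sum; the notes round o(4) down to 2
```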
Somehow, we need to "count" the depth complexity of each tile, or have some estimate of the average depth complexity of the entire screen. A counter per tile, or one for the whole screen, needs to be incremented each time we render a triangle.

====6. Anti-Aliasing

a) Ideally, we would like an analytical solution, where we compute the areas covered by the different triangles and the color of each area. Since this includes texturing, and since many triangles can overlap a pixel, this becomes a very difficult problem.

b) Texture aliasing, screen-space (edge) aliasing, shader aliasing, and temporal aliasing.

c) Put the sample points at different locations vertically, preferably with the same vertical distance between them.

====7. Architecture

a) Unified shaders: by using the same shader instructions for both vertex shaders and pixel shaders, we can build one type of shader unit that executes these instructions. The major idea is then to use the units for either vertices or pixels depending on where the bottleneck is. That is, we get load balancing in the pipeline and better exploit the computational units in the hardware.
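As a check of the bit-replication trick in 3e (Python sketch):

```python
def expand4to8(a):
    """Map a 4-bit value to 8 bits so that 0 -> 0 and 15 -> 255:
    multiplying by 255/15 = 17 is the same as replicating the nibble,
    since 17*a = 16*a + a = (a << 4) | a (the OR is exact because the
    low four bits of a << 4 are zero)."""
    return (a << 4) | a

print(expand4to8(0), expand4to8(15))   # 0 255
```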
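And as a check of the arithmetic in 4a (Python sketch; costs counted per pixel in 4-byte accesses, with o ≈ 2.5 for d = 6 and m = 0.25 as in the answer above):

```python
d = 6        # depth complexity
o = 2.5      # overdraw: o(6) = 1 + 1/2 + ... + 1/6, rounded
m = 0.25     # texture factor used in the answer above

cost_z = d + o           # d depth reads + o depth writes
cost_c = o               # o color writes
cost_t = o * 2 * 4 * m   # 2 mipmap levels x 4 texels, scaled by m

print(cost_z, cost_c, cost_t)   # 8.5 2.5 5.0
```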