This lets us get some qualities we desire, such as using the full 32-bit
range (except zero), avoiding DRM_WAIT_ON, and a 1:1 mapping of active
sequence numbers to request structs, which will be used soon for throttling
and interrupt-driven list cleanup.
Additionally, a boolean active field is added to indicate which list an
object is on, rather than smashing last_rendering_cookie to 0 to show
inactive. This will help with flush-reduction later on, and makes the code
clearer.
pread and pwrite must update the memory domains to ensure consistency with
the GPU. At some point, it should be possible to avoid clflush through this
path, but that isn't working for me.
Now, the LRU list has objects that are completely done rendering and ready
to kick out, while the execution list has things with active rendering,
which have associated cookies and reference counts on them.
Domain information is about buffer relationships, not buffer contents. That
means a relocation contains the domain information as it knows how the
source buffer references the target buffer.
This also adds the set_domain ioctl so that user space can move buffers to
the cpu domain.
If objects on the lru aren't ref counted, they'll get pulled from the gtt as
soon as they are freed. This change does cause objects to get stuck in the
gtt until they're forced out by new requests. The lru should get cleaned
when the irq occurs.
When batch buffers are executing, the ring may be stuck for a long time.
Monitor the ACTHD pointer which will show if the execution engine is
actually hung.
Names are just another unique integer set (from another idr object).
Names are removed when the user refernces (handles) are all destroyed --
this required that handles for objects be counted separately from
internal kernel references (so that we can tell when the handles are all
gone).
mixed 32/64 bit systems need 'special' help for ioctl where the user-space
and kernel-space datatypes differ. Fixing the datatypes to be the same size,
and align the same way for both 32 and 64-bit ppc and x86 environments will
elimiante the need to have magic 32/64-bit ioctl translation code.
The batchbuffer submission paths were fixed to use the 965-specific command,
but the vblank tasklet was not. When the older version is sent, the 965 will
lock up.