In the short-circuit code for the breadcrumb already being new enough, we
need to update the sarea_priv copy of the breadcrumb just as if we had
waited. Otherwise userland error checking will notice that we returned
too early based on its wrong information, and call wait_irq again (leading
to spinning until someone else comes along and updates the sarea_priv).
This bug was hidden when we had interrupt masking disabled, such as in
master, since the interrupt handler would update sarea_priv.
This was insufficient once we started masking interrupts to only when someone
was waiting for them (and would thus retire requests themselves). It was
replaced by the retire_timer.
This is the create (may want location flags), pread/pwrite/mmap
(performance tuning hints), and set_domain (will 32 bits be enough for
everyone?) ioctls. Left in the generic set are just flink/open/close.
The 2D driver must be updated for this change, and API but not ABI is broken
for 3D. The driver version is bumped to mark this.
This requires that the X Server use the execbuf interface for buffer
submission, as it no longer has direct access to the ring. This is
therefore a flag day for the gem interface.
This also adds enter/leavevt ioctls for use by the X Server. These would
get stubbed out in a modesetting implementation, but are required while
in an environment where the device's state is only managed by the DRM while
X has the VT.
Without the user IRQ running constantly, there's no wakeup when the ring
empties to go retire requests and free buffers. Use a 1 second timer to make
that happen more often.
Instead of throttling and execbuffer time, have the application ask to
throttle explicitly. This allows the throttle to happen less often, and
without holding the DRM lock.
The interrupt enable register cannot be used to temporarily disable
interrupts, instead use the interrupt mask register.
Note that this change means that a pile of buffers will be left stuck on the
chip as the final interrupts will not be recognized to come and drain things.
Recording the tail pointer in a local variable improves performance, but if
someone messes up and fails to reload at the right time, the driver will
write commands to the wrong part of the ring and scramble execution badly.
This change (available by setting I915_RING_VALIDATE to 1) checks to make
sure the cached tail pointer matches the hardware tail pointer at each ring
buffer addition, calling BUG_ON when that's not true.
There are now 3 lists. Active is buffers currently in the ringbuffer.
Flushing is not in the ringbuffer, but needs a flush before unbinding.
Inactive is as before. This prevents object_free → unbind →
wait_rendering → object_reference and a kernel oops about weird refcounting.
This also avoids an synchronous extra flush and wait when freeing a buffer
which had a write_domain set (such as a temporary rendered to and then from
using the 2d engine). It will sit around on the flushing list until the
appropriate flush gets emitted, or we need the GTT space for another
operation.
This lets us get some qualities we desire, such as using the full 32-bit
range (except zero), avoiding DRM_WAIT_ON, and a 1:1 mapping of active
sequence numbers to request structs, which will be used soon for throttling
and interrupt-driven list cleanup.
Additionally, a boolean active field is added to indicate which list an
object is on, rather than smashing last_rendering_cookie to 0 to show
inactive. This will help with flush-reduction later on, and makes the code
clearer.
pread and pwrite must update the memory domains to ensure consistency with
the GPU. At some point, it should be possible to avoid clflush through this
path, but that isn't working for me.