Record the last execbuffer sequence for each client.
Record that sequence in the throttle ioctl as the 'throttle sequence'.
Wait for the last throttle sequence in the throttle ioctl.
When i915_wait_request clears object from the active list, it may end up
freeing them and not moving them to the inactive list. This ends up
unbinding objects from the GTT without there ever being new objects visible
to i915_gem_evict_something on the inactive list. As the only success
condition required the presence of objects on the inactive list, this would
falsely assume that no GTT space had been made available, and end up
returning -ENOMEM to the application.
We want request retirement to occur about once a second when the request
queue is non-empty. This was done with a timer that queued a work_struct,
using a delayed_work instead makes a lot more sense.
This is the create (may want location flags), pread/pwrite/mmap
(performance tuning hints), and set_domain (will 32 bits be enough for
everyone?) ioctls. Left in the generic set are just flink/open/close.
The 2D driver must be updated for this change, and API but not ABI is broken
for 3D. The driver version is bumped to mark this.
This requires that the X Server use the execbuf interface for buffer
submission, as it no longer has direct access to the ring. This is
therefore a flag day for the gem interface.
This also adds enter/leavevt ioctls for use by the X Server. These would
get stubbed out in a modesetting implementation, but are required while
in an environment where the device's state is only managed by the DRM while
X has the VT.
Without the user IRQ running constantly, there's no wakeup when the ring
empties to go retire requests and free buffers. Use a 1 second timer to make
that happen more often.
Instead of throttling and execbuffer time, have the application ask to
throttle explicitly. This allows the throttle to happen less often, and
without holding the DRM lock.
set_domain can block waiting for rendering to complete. If that process is
interrupted by a signal, it can return -EINTR. Catch this error in all
callers and correctly deal with the result.
Object domain transfer can involve adding flush ops to the request queue,
and so the DRM lock must be held to avoid having the X server smash pointers
badly.
The interrupt enable register cannot be used to temporarily disable
interrupts, instead use the interrupt mask register.
Note that this change means that a pile of buffers will be left stuck on the
chip as the final interrupts will not be recognized to come and drain things.
When reading from multiple domains, allow each cache to continue
to hold data until writes occur somewhere. This is done by
first leaving the read_domains alone at bind time (presumably the CPU read
cache contains valid data still) and then in set_domain, if no write_domain
is specified, the new read domains are simply merged into the existing read
domains.
A huge comment was added above set_domain to explain how things are
expected to work.
Newly allocated objects need to be in the CPU domain as they've just been
cleared by the CPU. Also, unmapping objects from the GTT needs to put them
into the CPU domain, both to flush rendering as well as to ensure that any
paging action gets flushed before we remap to the GTT.
Commands in the ring are parsed and started when the head pointer passes by
them, but they are not necessarily finished until a MI_FLUSH happens. This
patch inserts a flush after the execbuffer (the only place a flush wasn't
already happening).
There are now 3 lists. Active is buffers currently in the ringbuffer.
Flushing is not in the ringbuffer, but needs a flush before unbinding.
Inactive is as before. This prevents object_free → unbind →
wait_rendering → object_reference and a kernel oops about weird refcounting.
This also avoids an synchronous extra flush and wait when freeing a buffer
which had a write_domain set (such as a temporary rendered to and then from
using the 2d engine). It will sit around on the flushing list until the
appropriate flush gets emitted, or we need the GTT space for another
operation.
This lets us get some qualities we desire, such as using the full 32-bit
range (except zero), avoiding DRM_WAIT_ON, and a 1:1 mapping of active
sequence numbers to request structs, which will be used soon for throttling
and interrupt-driven list cleanup.
Additionally, a boolean active field is added to indicate which list an
object is on, rather than smashing last_rendering_cookie to 0 to show
inactive. This will help with flush-reduction later on, and makes the code
clearer.
No need to fill the ring that much; wait for it to become nearly empty
before adding the execbuffer request. A better fix will involve scheduling
ring insertion in the irq handler.
pread and pwrite must update the memory domains to ensure consistency with
the GPU. At some point, it should be possible to avoid clflush through this
path, but that isn't working for me.
The exec list contains all objects, in order of use. The lru list contains
only unpinned objects ready to be evicted. This required two changes -- the
first was to not migrate pinned objects from exec to lru, the second was to
search for the first unpinned object in the exec list when doing eviction.
Now, the LRU list has objects that are completely done rendering and ready
to kick out, while the execution list has things with active rendering,
which have associated cookies and reference counts on them.