The interrupt enable register cannot be used to temporarily disable
interrupts, instead use the interrupt mask register.
Note that this change means that a pile of buffers will be left stuck on the
chip as the final interrupts will not be recognized to come and drain things.
Recording the tail pointer in a local variable improves performance, but if
someone messes up and fails to reload at the right time, the driver will
write commands to the wrong part of the ring and scramble execution badly.
This change (available by setting I915_RING_VALIDATE to 1) checks to make
sure the cached tail pointer matches the hardware tail pointer at each ring
buffer addition, calling BUG_ON when that's not true.
There are now 3 lists. Active is buffers currently in the ringbuffer.
Flushing is not in the ringbuffer, but needs a flush before unbinding.
Inactive is as before. This prevents object_free → unbind →
wait_rendering → object_reference and a kernel oops about weird refcounting.
This also avoids an synchronous extra flush and wait when freeing a buffer
which had a write_domain set (such as a temporary rendered to and then from
using the 2d engine). It will sit around on the flushing list until the
appropriate flush gets emitted, or we need the GTT space for another
operation.
This lets us get some qualities we desire, such as using the full 32-bit
range (except zero), avoiding DRM_WAIT_ON, and a 1:1 mapping of active
sequence numbers to request structs, which will be used soon for throttling
and interrupt-driven list cleanup.
Additionally, a boolean active field is added to indicate which list an
object is on, rather than smashing last_rendering_cookie to 0 to show
inactive. This will help with flush-reduction later on, and makes the code
clearer.
pread and pwrite must update the memory domains to ensure consistency with
the GPU. At some point, it should be possible to avoid clflush through this
path, but that isn't working for me.
Now, the LRU list has objects that are completely done rendering and ready
to kick out, while the execution list has things with active rendering,
which have associated cookies and reference counts on them.
Domain information is about buffer relationships, not buffer contents. That
means a relocation contains the domain information as it knows how the
source buffer references the target buffer.
This also adds the set_domain ioctl so that user space can move buffers to
the cpu domain.
If objects on the lru aren't ref counted, they'll get pulled from the gtt as
soon as they are freed. This change does cause objects to get stuck in the
gtt until they're forced out by new requests. The lru should get cleaned
when the irq occurs.
When batch buffers are executing, the ring may be stuck for a long time.
Monitor the ACTHD pointer which will show if the execution engine is
actually hung.
Names are just another unique integer set (from another idr object).
Names are removed when the user refernces (handles) are all destroyed --
this required that handles for objects be counted separately from
internal kernel references (so that we can tell when the handles are all
gone).
mixed 32/64 bit systems need 'special' help for ioctl where the user-space
and kernel-space datatypes differ. Fixing the datatypes to be the same size,
and align the same way for both 32 and 64-bit ppc and x86 environments will
elimiante the need to have magic 32/64-bit ioctl translation code.
This is possibly temporary. I can trigger an unending IRQ storm on G8x
in some circumstances, and have no idea how to handle that particular PFIFO
exception correctly yet.
I swore I'd actually do this properly and not go the horrible route
we did with nv4x, but I won't get around to it just yet with so many
*actually* interesting things to do first.. One day.
Since someone already added nv86, why not!
Enum can be of pretty much any size since C leaves the choice of size up to the implementation. So avoid using it in new interfaces like the vblank pre- & post-modeset ioctl. Thanks to hch for spotting this.
The batchbuffer submission paths were fixed to use the 965-specific command,
but the vblank tasklet was not. When the older version is sent, the 965 will
lock up.
This interface was defined completely wrong, however userspace has only
ever used 4 values from it (0x1, 0x2, 0x3 and 0x6), so fix the interface to do what userspace actually expected but define new defines for new users to use
it properly.
Previously, the R300_CMD_WAIT command would write the passed directly to the
hardware. However this is incorrect because the R300_WAIT_* values used are
internal interface values that do not map directly to the hardware.
The new function I have added translates the R300_WAIT_* values into appropriate
values for the hardware before writing the register.
Thanks to John Bridgman for pointing this out. :-)
More or less a workaround for issues on some chipsets where a context
switch results in critical data in PRAMIN being overwritten by the GPU.
The correct fix is known, but may take some time before it's a feasible
option.
This is the correct fix for the RS690 and hopefully the dma coherent work.
For now we limit everybody to a 32-bit DMA mask but it is possible for
RS690 to use a 40-bit DMA mask for the GART table itself,
and the PCIE cards can use 40-bits for the table entries.
Signed-off-by: Dave Airlie <airlied@redhat.com>
DMA command submission. It's worth remembering that all new bright ideas on how
to make this command reader work properly and according to docs
will probably fail :( Bring in some old code.
If we ever want to be able to use the 3D engine we have no choice. It
appears that the tiling setup (required for 3D on G8x) is in the page tables.
The immediate benefit of this change however is that it's now not possible
for a client to use the GPU to render over the top of important engine setup
tables, which also live in VRAM.
G8x VRAM size is limited to 512MiB at the moment, as we use a 1-1 mapping
of real vram pages to their offset within the start of a channel's VRAM
DMA object and only populate a single PDE for VRAM use.
Conflicts:
linux-core/drm_compat.c
linux-core/drm_compat.h
linux-core/drm_ttm.c
shared-core/i915_dma.c
Bump driver minor to 13 due to introduction of new
relocation type.
My 965GM gets interrupts stuck when using the old PIPE_VBLANK interrupt.
Switch to the PIPE_EVENT interrupt mechanism, and set the PIPE*STAT
registers to use START_VBLANK on 965 and VBLANK on previous chips.
Will hopefully work a bit better than previous code, which depended on
knowing the channel's most recent PUT value. Some chips always return
0 on reading these regs, and currently userspace is the only other entity
which knows the value.
Texture uploads could hit the blitter coordinate limit, adjust the texture
offset when uploading the pieces. Make sure to check the end address of the
upload too.
Make sure we have enough room for all the GR registers or we'll end up
clobbering the AR index register (which should actually be harmless
unless the BIOS is making an assumption about it).
On resume, if the interrupt state isn't restored correctly, we may end
up with a flood of unexpected or ill-timed interrupts, which could cause
the kernel to disable the interrupt or vblank events to happen at the
wrong time. So save/restore them properly.