Record the last execbuffer sequence for each client.
Record that sequence in the throttle ioctl as the 'throttle sequence'.
Wait for the last throttle sequence in the throttle ioctl.
When i915_wait_request clears object from the active list, it may end up
freeing them and not moving them to the inactive list. This ends up
unbinding objects from the GTT without there ever being new objects visible
to i915_gem_evict_something on the inactive list. As the only success
condition required the presence of objects on the inactive list, this would
falsely assume that no GTT space had been made available, and end up
returning -ENOMEM to the application.
We want request retirement to occur about once a second when the request
queue is non-empty. This was done with a timer that queued a work_struct,
using a delayed_work instead makes a lot more sense.
Use GEM for ring buffer setup and framebuffer allocation. This means reworking
the hardware status page stuff a bit (just use the basic range allocator for
vram for now) and #ifdef'ing out the TTM & DRI2 code. Works well enough to
load/unload several times and display fbcon on my T61 (though there's still
some unexplained console corruption).
This is the create (may want location flags), pread/pwrite/mmap
(performance tuning hints), and set_domain (will 32 bits be enough for
everyone?) ioctls. Left in the generic set are just flink/open/close.
The 2D driver must be updated for this change, and API but not ABI is broken
for 3D. The driver version is bumped to mark this.
Use new GEM based ring buffer initialization. Still need to init GEM & use it
for framebuffer allocation etc.
Conflicts:
shared-core/i915_dma.c
shared-core/i915_drv.h
This requires that the X Server use the execbuf interface for buffer
submission, as it no longer has direct access to the ring. This is
therefore a flag day for the gem interface.
This also adds enter/leavevt ioctls for use by the X Server. These would
get stubbed out in a modesetting implementation, but are required while
in an environment where the device's state is only managed by the DRM while
X has the VT.
Port over EDID quirks from X.Org so we can handle more monitors. This meant
adding size info to the drm_display_mode struct, but other than that the
changes were isolated to the DRM EDID handling code (as they should be).
Without the user IRQ running constantly, there's no wakeup when the ring
empties to go retire requests and free buffers. Use a 1 second timer to make
that happen more often.
Instead of throttling and execbuffer time, have the application ask to
throttle explicitly. This allows the throttle to happen less often, and
without holding the DRM lock.
A check in drm_sysfs_connector_remove was supposed to allow it to be called
even with unregistered objects, to make cleanup paths a little simpler.
However, device_is_regsitered didn't always seem to return what we thought it
would, so we'd sometimes end up leaving objects lying around rather than
unregistering them.
Fix this situation up by requiring devices to be registered before being
removed. Any problems resulting from this change should be easier to track
down than the alternative (which is leaving kobjects registered after unload).
We need to initialize the edid_blob_ptr to NULL when we init a connector,
otherwise drm_mode_connector_update_edid_property may think there's a valid
EDID lying around and try to destroy it, causing a crash.
Idea being if you want to add new crtc/output/encoder dynamically later,
you just increase the generation counter and userspace should re-read
all the resources
Without kernel modesetting, this requires cooperation of the userspace
modesetting driver. We may have to leave the vblank interrupt enabled otherwise
to avoid problems.
Only compensate when the driver counter actually appears to have moved
backwards.
The compensation deltas need to be incremental instead of absolute; drop the
vblank_offset field and just use atomic_sub().
Turns out the radeon driver is affected by the same problem that prompted i915
to revert to less useful counter flipping at the end of the vblank interval. In
the long term, we can hopefully implement more reliable methods to achieve
counter flipping at the beginning of vblank, but otherwise this should be an
acceptable workaround.
This reverts commit 6671ad1917.
The vblank ioctl needs to update the userspace parameters when interrupted by
a signal, which was prevented by this. Let's see if this breaks other ioctls...
set_domain can block waiting for rendering to complete. If that process is
interrupted by a signal, it can return -EINTR. Catch this error in all
callers and correctly deal with the result.
This creates a default group attached to the legacy drm minor nodes.
It covers all the objects in the set. make set resources only return
objects for this set. Need to fix up other functions to only work on
objects in their allowed set.
Okay we have crtc, encoder and connectors.
No more outputs exposed beyond driver internals
I've broken intel tv connector stuff.
Really for TV we should have one TV connector, with a sub property for the
type of signal been driven over it
Use subclassing from the drivers to allocate the objects. This saves
two objects being allocated for each crtc/output and generally makes
exit paths cleaner.
This splits a lot of the core modesetting code out into a file of
helper functions, that are only called from themselves and/or the driver.
The driver gets called into more often or can call these functions from itself
if it is a helper using driver.
I've broken framebuffer resize doing this but I didn't like the API for that
in any case.
Object domain transfer can involve adding flush ops to the request queue,
and so the DRM lock must be held to avoid having the X server smash pointers
badly.
The interrupt enable register cannot be used to temporarily disable
interrupts, instead use the interrupt mask register.
Note that this change means that a pile of buffers will be left stuck on the
chip as the final interrupts will not be recognized to come and drain things.
Add code to get panel modes from the VBIOS if present and check whether certain
outputs exist. Should make our display detection code a little more robust.
When reading from multiple domains, allow each cache to continue
to hold data until writes occur somewhere. This is done by
first leaving the read_domains alone at bind time (presumably the CPU read
cache contains valid data still) and then in set_domain, if no write_domain
is specified, the new read domains are simply merged into the existing read
domains.
A huge comment was added above set_domain to explain how things are
expected to work.
Newly allocated objects need to be in the CPU domain as they've just been
cleared by the CPU. Also, unmapping objects from the GTT needs to put them
into the CPU domain, both to flush rendering as well as to ensure that any
paging action gets flushed before we remap to the GTT.
Commands in the ring are parsed and started when the head pointer passes by
them, but they are not necessarily finished until a MI_FLUSH happens. This
patch inserts a flush after the execbuffer (the only place a flush wasn't
already happening).
There are now 3 lists. Active is buffers currently in the ringbuffer.
Flushing is not in the ringbuffer, but needs a flush before unbinding.
Inactive is as before. This prevents object_free → unbind →
wait_rendering → object_reference and a kernel oops about weird refcounting.
This also avoids an synchronous extra flush and wait when freeing a buffer
which had a write_domain set (such as a temporary rendered to and then from
using the 2d engine). It will sit around on the flushing list until the
appropriate flush gets emitted, or we need the GTT space for another
operation.
The dummy read page will point to NULL if drm_bo_driver_init failed at
firstopen (modeset is not enabled), and will cause kernel oops at
subsequent drm_lastclose call, so be sure to check it.
This lets us get some qualities we desire, such as using the full 32-bit
range (except zero), avoiding DRM_WAIT_ON, and a 1:1 mapping of active
sequence numbers to request structs, which will be used soon for throttling
and interrupt-driven list cleanup.
Additionally, a boolean active field is added to indicate which list an
object is on, rather than smashing last_rendering_cookie to 0 to show
inactive. This will help with flush-reduction later on, and makes the code
clearer.
It would be nice if one day the DRM driver was the canonical source for
register definitions and core macros. To that end, this patch cleans
things up quite a bit, removing redundant definitions (some with
different names referring to the same register) and generally tidying up
the header file.
In order to avoid recursive ->detect->interrupt->detect->interrupt->...
we need to disable TV hotplug interrupts in
intel_tv.c:intel_tv_detect_type. We also need to enable the TV interrupt
detection and hotplug sequence properly in i915_irq.c.
No need to fill the ring that much; wait for it to become nearly empty
before adding the execbuffer request. A better fix will involve scheduling
ring insertion in the irq handler.
drm_crtc->fb may point to NULL, f.e X server will allocate a new fb
and assign it to the CRTC at startup, when X server exits, it will destroy
the allocated fb, making drm_crtc->fb points to NULL.
pread and pwrite must update the memory domains to ensure consistency with
the GPU. At some point, it should be possible to avoid clflush through this
path, but that isn't working for me.
The exec list contains all objects, in order of use. The lru list contains
only unpinned objects ready to be evicted. This required two changes -- the
first was to not migrate pinned objects from exec to lru, the second was to
search for the first unpinned object in the exec list when doing eviction.
Now, the LRU list has objects that are completely done rendering and ready
to kick out, while the execution list has things with active rendering,
which have associated cookies and reference counts on them.
Even if the TV encoder hasn't been fused off, we may not have a TV connector on
the platform. The BDB in the BIOS should give us this info in some cases.
Domain information is about buffer relationships, not buffer contents. That
means a relocation contains the domain information as it knows how the
source buffer references the target buffer.
This also adds the set_domain ioctl so that user space can move buffers to
the cpu domain.
This should already have been generally safe since we don't change contents
and put in new relocations between execbufs, so if we were writing in a new
relocation then we'd already waited rendering to complete when we moved
the target of the relocation. However, doing the right thing will be required
if we do buffer reuse.
The kernel has removed nopage so move the old nopage codepaths into a compat vm file and switch to using the fault paths.
nopfn is on its way out in the future also, so we should switch to using fault
for that path as well soon
If objects on the lru aren't ref counted, they'll get pulled from the gtt as
soon as they are freed. This change does cause objects to get stuck in the
gtt until they're forced out by new requests. The lru should get cleaned
when the irq occurs.
pages come back from find_or_create_page locked, but must not stay locked
for long. Unlock them immediately instead of waiting until we're done with
them to avoid deadlock when applications try to touch them.
Track named objects in /proc/dri/0/gem_names.
Track total object count in /proc/dri/0/gem_objects.
Initialize device gem data.
return -ENODEV for gem ioctls if the driver doesn't support gem.
Call unlock_page when unbinding from gtt.
Add numerous misssing calls to drm_gem_object_unreference.
Names are just another unique integer set (from another idr object).
Names are removed when the user refernces (handles) are all destroyed --
this required that handles for objects be counted separately from
internal kernel references (so that we can tell when the handles are all
gone).
mixed 32/64 bit systems need 'special' help for ioctl where the user-space
and kernel-space datatypes differ. Fixing the datatypes to be the same size,
and align the same way for both 32 and 64-bit ppc and x86 environments will
elimiante the need to have magic 32/64-bit ioctl translation code.
It's not really a graphics memory allocator, just something to track ranges
of address space. It doesn't involve actual allocation, and was consuming
some desired namespace.
This tries to automatically fetch a git revision string and if succeeds,
it #defines GIT_REVISION string macro. Packagers can override it by
'make GIT_REVISION=foo'.
Update Nouveau to use GIT_REVISION, if defined, instead of DRIVER_DATE
in struct drm_driver.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
TV out needs to do load detection, which means we have to find an
available pipe to use for the detection. Port over the pipe reservation
code for this purpose.
Put off registering new outputs with sysfs until they're properly configured,
or we may get duplicates if the type hasn't been set yet (as is the case with
SDVO initialization). This also means moving de-registration into the cleanup
function instead of output destroy, since the latter occurs during the normal
course of setup when an output isn't found (and therefore not registered with
sysfs yet.
This patch ties outputs, output properties and hotplug events into the
DRM core. Each output has a corresponding directory under the primary
DRM device (usually card0) containing dpms, edid, modes, and connection
status files.
New hotplug change events occur when outputs are added or hotplug events
are detected.
This is the correct fix for the RS690 and hopefully the dma coherent work.
For now we limit everybody to a 32-bit DMA mask but it is possible for
RS690 to use a 40-bit DMA mask for the GART table itself,
and the PCIE cards can use 40-bits for the table entries.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The i915_vblank_swap() function schedules an automatic buffer swap
upon receipt of the vertical sync interrupt. Such an operation is
lengthy so it can't be allowed to happen in normal interrupt context,
thus the DRM implements this by scheduling the work in a kernel
softirq-scheduled tasklet. In order for the buffer swap to work
safely, the DRM's central lock must be taken, via a call to
drm_lock_take() located in drivers/char/drm/drm_irq.c within the
function drm_locked_tasklet_func(). The lock-taking logic uses a
non-interrupt-blocking spinlock to implement the manipulations needed
to take the lock. This semantic would be safe if all attempts to use
the spinlock only happen from process context. However this buffer
swap happens from softirq context which is really a form of interrupt
context. Thus we have an unsafe situation, in that
drm_locked_tasklet_func() can block on a spinlock already taken by a
thread in process context which will never get scheduled again because
of the blocked softirq tasklet. This wedges the kernel hard.
To trigger this bug, run a dual-head cloned mode configuration which
uses the i915 drm, then execute an opengl application which
synchronizes buffer swaps against the vertical sync interrupt. In my
testing, a lockup always results after running anywhere from 5 minutes
to an hour and a half. I believe dual-head is needed to really
trigger the problem because then the vertical sync interrupt handling
is no longer predictable (due to being interrupt-sourced from two
different heads running at different speeds). This raises the
probability of the tasklet trying to run while the userspace DRI is
doing things to the GPU (and manipulating the DRM lock).
The fix is to change the relevant spinlock semantics to be the
interrupt-blocking form. After this change I am no longer able to
trigger the lockup; the longest test run so far was 20 hours (test
stopped after that point).
Note: I have examined the places where this spinlock is being
employed; all are reasonably short bounded sequences and should be
suitable for interrupts being blocked without impacting overall kernel
interrupt response latency.
Signed-off-by: Mike Isely <isely@pobox.com>
Conflicts:
linux-core/drm_compat.c
linux-core/drm_compat.h
linux-core/drm_ttm.c
shared-core/i915_dma.c
Bump driver minor to 13 due to introduction of new
relocation type.
This patch fixes bits of the DRM so to make the radeon DRI work on
non-cache coherent PCI DMA variants of the PowerPC processors.
It moves the few places that needs change to wrappers to that
other architectures with similar issues can easily add their
own changes to those wrappers, at least until we have more useful
generic kernel API.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PCI- or high memory.
This is substantially more efficient than drm_bo_kmap,
since the mapping only lives on a single processor.
Unmapping is done use kunmap_atomic(). Flushes only a single tlb() entry.
Add a support utility int drm_bo_pfn_prot() that returns the
pfn and desired page protection for a given bo offset.
This is all intended for relocations in bound TTMS or vram.
Mapping-accessing-unmapping must be atomic, either using preempt_xx() macros
or a spinlock.
PCI- or high memory.
This is substantially more efficient than drm_bo_kmap,
since the mapping only lives on a single processor.
Unmapping is done use kunmap_atomic(). Flushes only a single tlb() entry.
Add a support utility int drm_bo_pfn_prot() that returns the
pfn and desired page protection for a given bo offset.
This is all intended for relocations in bound TTMS or vram.
Mapping-accessing-unmapping must be atomic, either using preempt_xx() macros
or a spinlock.
This change adds a driver feature that for i915 is controlled by a module
parameter. You now need to do insmod i915.ko modeset=1 to enable it the
modesetting paths.
It also fixes up lots of X paths. I can run my new DDX driver on this code
with and without modesetting enabled
Allow mode to be set with fb_id set to -1, meaning set
the mode with the current fb (if we have one bound).
Allow intelfb to hook back up it's fb if modesetting
clears it (maybe temporary).
Move any crtc->fb related register changes to set_base
in intel_fb.
General intelfb cleanups.
This fixes at least two problems:
* The vblank_disable_fn timer callback could get called after the DRM was
de-initialized, e.g. on X server shutdown.
* Leak of vblank related resources when disabling and re-enabling the IRQ, e.g.
on an X server reset.
On many chipsets, the checks for DPLL enable or VGA mode will prevent the
pipeconf regs from being restored, which could result in a blank display or X
failing to come back after resume. So restore them unconditionally along with
actually restoring pipe B's palette correctly.
On resume, if the interrupt state isn't restored correctly, we may end
up with a flood of unexpected or ill-timed interrupts, which could cause
the kernel to disable the interrupt or vblank events to happen at the
wrong time. So save/restore them properly.
There were two problems with the existing callback code: the vblank
enable callback happened multiple times per disable, making drivers more
complex than they had to be, and there was a race between the final
decrement of the vblank usage counter and the next enable call, which
could have resulted in a put->schedule disable->get->enable->disable
sequence, which would be bad.
So add a new vblank_enabled array to track vblank enable on per-pipe
basis, and add a lock to protect it along with the refcount +
enable/disable calls to fix the race.
sequence number may actually turn up before the corresponding fence
object has been queued on the ring.
Fence drivers can use this member to determine whether a
sequence number must be re-reported.
In hibernate, we may end up calling the VGA save regs function twice, so we
need to make sure it's idempotent. That means leaving ARX in index mode after
the first save operation. Fixes hibernate on 965.
i915_flush_ttm was unconditionally executing a clflush instruction
to (obviously) flush the cache. Instead, check if the cpu supports
clflush, and if not, fall back to calling wbinvd to flush the entire
cache.
Signed-off-by: Kyle McMartin <kmcmartin@redhat.com>
(1 << bits) is an undefined value when bits == 32.
gcc may generate 1 with this expression
which will lead to an infinite retry loop in
drm_ht_just_insert_please.
Because of the different implement of hash_long,
this issue is more frequenly see on 64 bit system
As DRM_DEBUG macro already prints out the __FUNCTION__ string (see
drivers/char/drm/drmP.h), it is not worth doing this again. At some
other places the ending "\n" was added.
airlied:- I cleaned up a few that this patch missed also
I couldn't figure out what drm_bo_type_dc was for; Dave Airlie finally clued
me in that it was the 'normal' buffer objects with kernel allocated pages
that could be mmapped from the drm device file.
I thought that 'drm_bo_type_device' was a more descriptive name.
I also added a bunch of comments describing the use of the type enum values and
the functions that use them.