This is the correct fix for the RS690 and hopefully the dma coherent work.
For now we limit everybody to a 32-bit DMA mask but it is possible for
RS690 to use a 40-bit DMA mask for the GART table itself,
and the PCIE cards can use 40-bits for the table entries.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The i915_vblank_swap() function schedules an automatic buffer swap
upon receipt of the vertical sync interrupt. Such an operation is
lengthy so it can't be allowed to happen in normal interrupt context,
thus the DRM implements this by scheduling the work in a kernel
softirq-scheduled tasklet. In order for the buffer swap to work
safely, the DRM's central lock must be taken, via a call to
drm_lock_take() located in drivers/char/drm/drm_irq.c within the
function drm_locked_tasklet_func(). The lock-taking logic uses a
non-interrupt-blocking spinlock to implement the manipulations needed
to take the lock. This semantic would be safe if all attempts to use
the spinlock only happen from process context. However this buffer
swap happens from softirq context which is really a form of interrupt
context. Thus we have an unsafe situation, in that
drm_locked_tasklet_func() can block on a spinlock already taken by a
thread in process context which will never get scheduled again because
of the blocked softirq tasklet. This wedges the kernel hard.
To trigger this bug, run a dual-head cloned mode configuration which
uses the i915 drm, then execute an opengl application which
synchronizes buffer swaps against the vertical sync interrupt. In my
testing, a lockup always results after running anywhere from 5 minutes
to an hour and a half. I believe dual-head is needed to really
trigger the problem because then the vertical sync interrupt handling
is no longer predictable (due to being interrupt-sourced from two
different heads running at different speeds). This raises the
probability of the tasklet trying to run while the userspace DRI is
doing things to the GPU (and manipulating the DRM lock).
The fix is to change the relevant spinlock semantics to be the
interrupt-blocking form. After this change I am no longer able to
trigger the lockup; the longest test run so far was 20 hours (test
stopped after that point).
Note: I have examined the places where this spinlock is being
employed; all are reasonably short bounded sequences and should be
suitable for interrupts being blocked without impacting overall kernel
interrupt response latency.
Signed-off-by: Mike Isely <isely@pobox.com>
Conflicts:
linux-core/drm_compat.c
linux-core/drm_compat.h
linux-core/drm_ttm.c
shared-core/i915_dma.c
Bump driver minor to 13 due to introduction of new
relocation type.
This patch fixes bits of the DRM so to make the radeon DRI work on
non-cache coherent PCI DMA variants of the PowerPC processors.
It moves the few places that needs change to wrappers to that
other architectures with similar issues can easily add their
own changes to those wrappers, at least until we have more useful
generic kernel API.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PCI- or high memory.
This is substantially more efficient than drm_bo_kmap,
since the mapping only lives on a single processor.
Unmapping is done use kunmap_atomic(). Flushes only a single tlb() entry.
Add a support utility int drm_bo_pfn_prot() that returns the
pfn and desired page protection for a given bo offset.
This is all intended for relocations in bound TTMS or vram.
Mapping-accessing-unmapping must be atomic, either using preempt_xx() macros
or a spinlock.
PCI- or high memory.
This is substantially more efficient than drm_bo_kmap,
since the mapping only lives on a single processor.
Unmapping is done use kunmap_atomic(). Flushes only a single tlb() entry.
Add a support utility int drm_bo_pfn_prot() that returns the
pfn and desired page protection for a given bo offset.
This is all intended for relocations in bound TTMS or vram.
Mapping-accessing-unmapping must be atomic, either using preempt_xx() macros
or a spinlock.
This change adds a driver feature that for i915 is controlled by a module
parameter. You now need to do insmod i915.ko modeset=1 to enable it the
modesetting paths.
It also fixes up lots of X paths. I can run my new DDX driver on this code
with and without modesetting enabled
Allow mode to be set with fb_id set to -1, meaning set
the mode with the current fb (if we have one bound).
Allow intelfb to hook back up it's fb if modesetting
clears it (maybe temporary).
Move any crtc->fb related register changes to set_base
in intel_fb.
General intelfb cleanups.
This fixes at least two problems:
* The vblank_disable_fn timer callback could get called after the DRM was
de-initialized, e.g. on X server shutdown.
* Leak of vblank related resources when disabling and re-enabling the IRQ, e.g.
on an X server reset.
On many chipsets, the checks for DPLL enable or VGA mode will prevent the
pipeconf regs from being restored, which could result in a blank display or X
failing to come back after resume. So restore them unconditionally along with
actually restoring pipe B's palette correctly.
On resume, if the interrupt state isn't restored correctly, we may end
up with a flood of unexpected or ill-timed interrupts, which could cause
the kernel to disable the interrupt or vblank events to happen at the
wrong time. So save/restore them properly.
There were two problems with the existing callback code: the vblank
enable callback happened multiple times per disable, making drivers more
complex than they had to be, and there was a race between the final
decrement of the vblank usage counter and the next enable call, which
could have resulted in a put->schedule disable->get->enable->disable
sequence, which would be bad.
So add a new vblank_enabled array to track vblank enable on per-pipe
basis, and add a lock to protect it along with the refcount +
enable/disable calls to fix the race.
sequence number may actually turn up before the corresponding fence
object has been queued on the ring.
Fence drivers can use this member to determine whether a
sequence number must be re-reported.
In hibernate, we may end up calling the VGA save regs function twice, so we
need to make sure it's idempotent. That means leaving ARX in index mode after
the first save operation. Fixes hibernate on 965.
i915_flush_ttm was unconditionally executing a clflush instruction
to (obviously) flush the cache. Instead, check if the cpu supports
clflush, and if not, fall back to calling wbinvd to flush the entire
cache.
Signed-off-by: Kyle McMartin <kmcmartin@redhat.com>
(1 << bits) is an undefined value when bits == 32.
gcc may generate 1 with this expression
which will lead to an infinite retry loop in
drm_ht_just_insert_please.
Because of the different implement of hash_long,
this issue is more frequenly see on 64 bit system
As DRM_DEBUG macro already prints out the __FUNCTION__ string (see
drivers/char/drm/drmP.h), it is not worth doing this again. At some
other places the ending "\n" was added.
airlied:- I cleaned up a few that this patch missed also
I couldn't figure out what drm_bo_type_dc was for; Dave Airlie finally clued
me in that it was the 'normal' buffer objects with kernel allocated pages
that could be mmapped from the drm device file.
I thought that 'drm_bo_type_device' was a more descriptive name.
I also added a bunch of comments describing the use of the type enum values and
the functions that use them.
Flags pending validation were stored in a misleadingly named field, 'mask'.
As 'mask' is already used to indicate pieces of a flags field which are
changing, it seems better to use a name reflecting the actual purpose of
this field. I chose 'proposed_flags' as they may not actually end up in
'flags', and in an case will be modified when they are moved over.
This affects the API, but not ABI of the user-mode interface.
Previously, dummy_read_page was used only for read-only user allocations; it
filled in pages that were not present in the user address map (presumably,
these were allocated but never written to pages).
This patch allows them to be used for read-only ttms allocated from the
kernel, so that applications can over-allocate buffers without forcing every
page to be allocated.
I'm hoping to use the dummy_read_page for kernel allocated buffers to avoid
allocating extra pages for read-only buffers (like vertex and batch buffers).
This also eliminates the 'write' parameter to drm_ttm_set_user and just
has DRM_TTM_PAGE_WRITE passed into drm_ttm_create.
Aside from changing drm_bind_ttm to drm_ttm_bind, this patch
adds only documentation and fixes the functions inside drm_ttm.c
to all be prefixed with drm_ttm_.
drmBOSetStatus does not bother to set the fence_class parameter.
Fortunately, drm_bo_setstatus_ioctl doesn't end up using it as it
calls drm_bo_handle_validate with use_old_fence_class = 1.
Document parameters and usage for drm_bo_handle_validate. Change parameter
order to match drm_bo_do_validate (fence_class has been moved to after
flags, hint and mask values). Existing users of this function have been
changed, but out-of-tree users must be modified separately.
Add comments about the parameters to drm_bo_do_validate, along
with comments for the DRM_BO_HINT options. Remove the 'do_wait'
parameter as it is duplicated by DRM_BO_HINT_DONT_BLOCK.
Creating a ttm was done with drm_ttm_init while destruction was done with
drm_destroy_ttm. Renaming these to drm_ttm_create and drm_ttm_destroy makes
their use clearer. Passing page_flags to the create function will allow that
to know whether user or kernel pages are needed, with the goal of allowing
kernel ttms to be saved for later reuse.
This fix is actually a bit of a cleanup too--it moves lock freeing to
drm_rmmap_locked and out of drm_lastclose. This makes it symmetrical with
addmap and also prevents the lock from being incorrectly freed from driver
mappings.
This should work on all radeon but there is still many things todo:
- add crtc2
- tmds
- lvds
- add bios data table so we don't need to hardcode dac/crtc infos
- separate clock control to make power saving easier & cleaner
- tiling (warning tiling shouldn't be enable in double scan or interlace)
- surface reg manager (this goes along with tiling)
- suspend/resume hook
- avivo & r500 family support
- atom bios support (for posting card mostly)
- finish superioctl skeleton
- what else ? :)
so really want to get a list of modes per output not the global hammer list.
also we remove the mode ids and let the user pass back the full mode description
need to fix up add/remove mode for user modes now
This allow the user to retrieve a list of properties for an output.
Properties can either be 32-bit values or an enum with an associated name.
Range properties are to be supported.
This API is probably not all correct, I may make properties part of the general
resource get when I think about it some more.
So basically you can create properties and attached them to whatever outputs you want,
so it should be possible to create some generics and just attach them to every output.
Since the drm_mode_card_res structure contains user pointers, we have to use
put_user and copy_to_user to write stuff out. The DRM ioctl wrapper will only
take care of copying the base drm_mode_card_res struct, not the included
arrays.
This header file is shared across linux and bsd, but is not installed
for user space to access. It's the place to put prototypes and data
types that aren't platform or chipset specific, but still internal to
the drm.
This way driver can get_edid in output status detection
(using all workaround which are in get_edid) and then provide
this edid data in get_mode callback of output.
Conflicts:
linux-core/Makefile.kernel
linux-core/drm_stub.c
linux-core/i915_drv.c
shared-core/i915_dma.c
shared-core/i915_drv.h
Fixup suspend/resume conflicts (basically use what's in DRM master for now).
Also fix up a few other conflicts that snuck in (i915_dma changes etc.).
The vblank_init function wanted a couple of cleanups.
Also, drm_irq_install wasn't checking the new return value of irq_postinstall.
If it returns a failure, assume IRQs didn't get set up and take appropriate
action.
This mapping allows cached objects to be mapped in/out of the TT space
with the appropriate flushing calls.
It should put back the old CACHED functionality for snooped mappings
Conflicts:
linux-core/drmP.h
linux-core/drm_drv.c
linux-core/drm_irq.c
shared-core/i915_drv.h
shared-core/i915_irq.c
shared-core/mga_drv.h
shared-core/mga_irq.c
shared-core/radeon_drv.h
shared-core/radeon_irq.c
Merge in the latest master bits and update the remaining drivers (except
mach64 which math_b is working on). Also remove the 9xx hack from the i915
driver; it seems to be correct.
Add suspend/resume support to the i915 driver. Moves some of the
initialization into the driver load routine, and fixes up places where we
assumed no dev_private existed in some of the cleanup paths. This allows
us to suspend/resume properly even if X isn't running.
Make DRM devices use real Linux devices instead of class devices, which are
going away. While we're at it, clean up some of the interfaces to take
struct drm_device * or struct device * and use the global drm_class where
needed instead of passing it around.
Implement a version check IOCTL for drivers that don't use
drmMMInit from user-space.
Remove the minor check from the kernel code. That's really up
to the driver.
Bump major.
Remove need for lock for now.
May create races when we clean memory areas or on takedown.
Needs to be fixed.
Really do a validate on buffer creation in order to avoid problems with
fixed memory buffers.
We now always create a drm_ref_object for user objects and this is then the only
things that holds a reference to the user object. This way unreference on will
destroy the user object when the last drm_ref_object goes way.
The buffer object type is still tracked internally, but it is no longer
part of the user space visible ioctl interface. If the bo create ioctl
specifies a non-NULL buffer address we assume drm_bo_type_user,
otherwise drm_bo_type_dc. Kernel side allocations call
drm_buffer_object_create() directly and can still specify drm_bo_type_kernel.
Not 100% this makes sense either, but with this patch, the buffer type
is no longer exported and we can clean up the internals later on.
This isn't 100% as command submission via PCI-e GART buffers doesn't work.
I've hacked around that for the time being. This is essentially the code
that was used at the POWER.org event to show Bimini.
All nv30 functions in nv30_graph.c that can be used on nv20 are renamed
as accordingly. nv20 specific parts from nv20_graph.c are moved into
nv30_graph.c.
Conflicts:
linux-core/drmP.h
linux-core/drm_bo.c
linux-core/drm_drv.c
linux-core/drm_objects.h
shared-core/drm.h
shared-core/i915_dma.c
shared-core/i915_drv.h
shared-core/i915_irq.c
Mostly removing typedefs that snuck into the modesetting code and
updating to the latest TTM APIs. As of today, the i915 driver builds,
but there are likely to be problems, so debugging and bugfixes will
come next.
Modify the TTM backend bind arguments.
Export a number of functions needed for driver-specific super-ioctls.
Add a function to map buffer objects from the kernel, regardless of where they're
currently placed.
A number of error fixes.
This branch replaces the NO_MOVE/NO_EVICT flags to buffer validation with a
separate privileged ioctl to pin buffers like NO_EVICT meant before. The
functionality that was supposed to be covered by NO_MOVE may be reintroduced
later, possibly in a different way, after the superioctl branch is merged.
Previously any ioctls that weren't explicitly listed in the compat ioctl
table would fail with ENOTTY. If the incoming ioctl number is outside the
range of the table, assume that it Just Works, and pass it off to drm_ioctl.
This make the fence related ioctls work on 64-bit PowerPC.
The i830 and newer intel 2D code adds the AGP base to map offsets already,
because it wasn't doing the AGP enable which used to set dev->agp->base.
Credit goes to Zhenyu for finding the issue.
The original XGI kernel driver strobed 0xB03F each time a page was
allocated to back a GART page. When the driver was converted to use
the DRM SG interface, this code was lost. Returning it fixes a long
standing issue where the X-server would work fine the first time, but
acceleration commands would be ignored on the second X-server
invocation.
Since the heaps weren't marked as uninitialized, SG memory was never
re-allocated. This prevented the X-server from being able to restart
without re-loading the kernel module.
The DRM_XGI_PCIE_ALLOC and DRM_XGI_FB_ALLOC ioctls (and the matching
free ioctls) are unified to DRM_XGI_ALLOC. The desired memory region
is selected by xgi_mem_alloc::location. The region is magically
encoded in xgi_mem_alloc::index, which is used to release the memory.
Bump to version 0.11.0. This update requires a new DDX.
Pass the master's file pointer, as supplied to xgi_bootstrap, to
xgi_cmdlist_initialize. Associate that pointer with the memory
allocated for the command list buffer. By doing this the memory will
be automatically cleaned up when the master closes the device. This
allows the removal of some clean up code.
1. DRM_NOUVEAU_GPUOBJ_FREE
Used to free GPU objects. The obvious usage case is for Gr objects,
but notifiers can also be destroyed in the same way.
GPU objects gain a destructor method and private data fields with
this change, so other specialised cases (like notifiers) can be
implemented on top of gpuobjs.
2. DRM_NOUVEAU_CHANNEL_FREE
3. DRM_NOUVEAU_CARD_INIT
Ideally we'd do init during module load, but this isn't currently
possible. Doing init during firstopen() is bad as X has a love of
opening/closing the DRM many times during startup. Once the
modesetting-101 branch is merged this can go away.
IRQs are enabled in nouveau_card_init() now, rather than having the
X server call drmCtlInstHandler(). We'll need this for when we give
the kernel module its own channel.
4. DRM_NOUVEAU_GETPARAM
Add CHIPSET_ID value, which will return the chipset id derived
from NV_PMC_BOOT_0.
4. Use list_* in a few places, rather than home-brewed stuff.
When the GE is shut down, an empty command packet without a begin-link
must be sent. After this command is sent, wait for the hardware to go
idle. Finally, turn off the GE and disable MMIO.
This should let us allocate buffers without holding the hardware lock.
While here, add DRM_DEBUG info for the drm_bo ioctls, so you can see something
more specific than just the cmd value per ioctl.
The core DRM lastclose routine automatically destroys all mappings and
releases SG memory. XP10 DRM and DDX assumed this data stayed around
until module unload. xgi_bootstrap was reworked to recreate all these
mappings. In addition, the drm_addmap for the GART backing store was
moved into the kernel. This causes a change to the ioctl protocol and
a version bump.
Based on review comments from airlied, XGI_CHECK_PCI_CONFIG is
removed. He believes (and I tend to agree) that this is a largely
unnecessary workaround for a bug elsewhere.
There were numerous unnecessary fields in xgi_cmd_info. The remaining
fields had pretty crummy names. Cut out the cruft, and rename the
rest. As a result, the unused parameter "triggerCounter" to
triggerHWCommandList can be removed.
Debug print fix in drm_release().
Forgotten local variable init in drm_setversion().
Unnecessary put_user() in drm_addmap_ioctl().
ioctl->cmd check broken in drm_ioctl(); workaround.
The data is now in kernel space, copied in/out as appropriate according to the
This results in DRM_COPY_{TO,FROM}_USER going away, and error paths to deal
with those failures. This also means that XFree86 4.2.0 support for i810 DRM
is lost.
As a fallout, replace filp storage with file_priv storage for "unique
identifier of a client" all over the DRM. There is a 1:1 mapping, so this
should be a noop. This could be a minor performance improvement, as everything
on Linux dereferenced filp to get file_priv anyway, while only the mmap ioctls
went the other direction.
This was used to make all ioctl handlers return -errno on linux and errno on
*BSD. Instead, just return -errno in shared code, and flip sign on return from
shared code to *BSD code.
Generate the begin command once in a temporary buffer. Then,
depending on whether the command is to be written directly to the
hardware or to a secondary buffer, copy to command to the correct place.
Moved the getCurBatchBeginPort before its only caller. Modified
function to return the command ID instead of the port offset.
Function also now assumes input begin type is value.
Added code to ioctl handler to validate begin type.
xgi_cmdlist_initialize wasn't correctly checking for errors from
xgi_pcie_alloc. Furthermore, xgi_bootstrap, the one caller of
xgi_cmdlist_initialize, wasn't check its return value.
For reasons that I don't understand, the drm_addmap call would succeed
in xgi_driver_load, but writes to the map later would oops. Moving it
to xgi_bootstrap fixes this problem.
The ioctlss XGI_ESC_DEVICE_INFO, XGI_ESC_MEM_COLLECT,
XGI_ESC_PCIE_CHECK, XGI_ESC_GET_SCREEN_INFO, XGI_ESC_PUT_SCREEN_INFO,
XGI_ESC_MMIO_INFO, and XGI_ESC_SAREA_INFO, are completely unnecessary.
The will be doubly useless when the driver is converted to the DRM
infrastructure.
Most occurances of U32 were converted to u32. These are cases where
the data represents something that will be written to the hardware.
Other cases were converted to 'unsigned int'.
U32 was the last type in xgi_types.h, so that file is removed.
These two structures were used as the request and reply for certain
ioctls. Having a different type for an ioctl's input and output is
just wierd. In addition, each structure contained fields (e.g., pid)
that had no business being there.
This change requires updates to user-space.
Two large blocks of code were moved out of this function into separate
functions. This brought some much needed sanity to the indentation.
Some dead varaibles were removed.
Dave Airlie pointed out on IRC that idr_replace lets us know if the ID hasn't
been allocated, so we don't need a special pointer value for allocated IDs that
don't have valid information yet.
There's a difference between a drawable ID not having valid drawable
information and not being allocated at all. Not making the distinction would
break i915 DRM swap scheduling with older X servers that don't push drawable
cliprect information to the DRM.
The whole purpose of xgi_pcie_heap_check is to log information about
entries on the used_list. If XGI_DEBUG is not set, it doesn't print
anything. Therefore we can #ifdef the whole function body.
Convert open-code list iteration to use list_for_each_entry.
Comment in the code explains it. Basically, I put an if-statement
around a block of code to prevent a NULL pointer dereference that
should never happen in the first place. Eventually, this will need to
come out.
This function used to return 'void *', which was then cast to
'xgi_pcie_block_t *' at the only caller. I changed the return type to
'struct xgi_pcie_block_s *' and removed the explicit cast.
For various reasons, this ioctl was a bad idea.
At channel creation we now automatically create DMA objects covering
available VRAM and GART memory, where the client used to do this themselves.
However, there is still a need to be able to create DMA objects pointing at
specific areas of memory (ie. notifiers). Each channel is now allocated a
small amount of memory from which a client can suballocate things (such as
notifiers), and have a DMA object created which covers the suballocated area.
The NOTIFIER_ALLOC ioctl exposes this functionality.
- use a timer for disabling vblank events to avoid enable/disable calls too
often
- make i915 work with pre-965 chips again (would like to structure this
better, but this hack works on my test system)
Commit 9b01bd5b284bbf519b726b39f1352023cb5e9e69 introduced a
compat_ioctl handler for RADEON_SETPARAM, the sole purpose of which was
to handle the fact that on i386, alignof(uint64_t)==4.
Unfortunately, this handler was installed for _all_ 64-bit
architectures, instead of only x86_64 and ia64. And thus it breaks
32-bit compatibility on every other arch, where 64-bit integers are
aligned to 8 bytes in 32-bit mode just the same as in 64-bit mode.
Arnd has a cunning plan to use 'compat_u64' with appropriate alignment
attributes according to the 32-bit ABI, but for now let's just make the
compat_radeon_cp_setparam routine entirely disappear on 64-bit machines
whose 32-bit compat support isn't for i386. It would be a no-op with
compat_u64 anyway.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
If the driver doesn't support vertical blank interrupts, it won't call
drm_vblank_init(), and dev->num_crtcs will be 0.
Also fix an off-by-one test against dev->num_crtcs.
- use correct refcount variable in get/put routines
- extract counter update from drm_vblank_get
- make signal handling callback per-crtc
- update interrupt handling logic, drivers should use drm_handle_vblank
- move wakeup and counter update logic to new drm_handle_vblank routine
- fixup usage of get/put in light of counter update extraction
- fix longstanding bug in signal code, update pending counter only
*after* we're sure we'll setup signal handling
Introduce tile members for future tiled buffer support.
Allow user-space to explicitly define a fence-class.
Remove the implicit fence-class mechanism.
64-bit wide buffer object flag member.
This reverts commit 6e860d08d0.
As I said not a good plan - this macro will have to stay for now,
trying to do the vbl code with the inline was a bit messy - may need specialised
drm wait on functions
held across any operations that modify mode lists, crtc config, output
config, etc. It should be taken at high level entry points (currently just
initial config and user IOCTL).
Seems to work ok on my system, but needs more testing (with lockdep) and
review from some fresh eyes.
Conflicts:
linux-core/drm_crtc.c
linux-core/drm_fb.c
Lots of changes to merge with alanh's latest stuff:
o fix use of fb->pitch now that it has the right value
o add new helper for finding the CRTC given an FB
o fix new fb_probe/fb_remove functions to take a CRTC
o fixup callers of new FB routines
o port drm_fb changes to intel_fb
o check for errors after creating fb buffer object
o go back to using cfb_imageblit since the accel stubs aren't ready
places).
Add new FB hooks to the drm driver structure and make i915 use them for an
Intel specific FB driver. This will allow acceleration and better handling
of the command stream.