I slipped it in with the alloc_tiled changes, since we were explicitly
throwing the parameter away. It caught some bogus released code, which
we've now fixed, so remove the asserts to keep old drivers working.
This simplifies driver code in handling object allocation, and also gives us
an opportunity to possibly cache tiled buffers if it turns out to be a win.
[anholt: This is chopped out of the execbuf2 patch, as it seems to be useful
separately and cleans up the execbuf2 changes to be more obvious]
As the target architecture for Intel GPUs is the x86, we can presume to
have reasonable compiler support for Intel atomic intrinsics, i.e. gcc,
and so use those in preference to pulling in a complicated mess of
fragile assembly.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[anholt: hand-resolved against my previous commit. This brings cairo-gl
firefox-talos-gfx time from 65 seconds back down to 62 seconds.]
Signed-off-by: Eric Anholt <eric@anholt.net>
Set the DONTNEED flag on cached buffers so that the kernel is free to
discard those when under memory pressure.
[anholt: This takes firefox-talos-gfx time from ~62 seconds to ~65 seconds
on my GM965, but it seems like a hit worth taking for the improved
functionality from saving memory]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
There are a bunch of places in GL where if we can't do this we have to
flush the batchbuffer, and the cost of lookups here is outweighed by flush
savings.
I thought I was going to do all sorts of crazy experiments with it. I never
did, and it turned out the free-after-a-few-seconds plan is working out fine.
The goal of the BO cache is to keep buffers on hand for fast continuous use,
as in every frame of a game or every batchbuffer of the X Server. Keeping
older buffers on hand not only doesn't serve this purpose, it may hurt
performance by resulting in disk cache getting kicked out, or even driving
the system to swap.
Bug #20766.
The logbase2 would overflow and wrap the size around to 0, making the code
allocate a 4kb object instead. By simplifying the code to just walk the
14-entry bucket array comparing sizes instead of indexing on
ffs(1 << logbase2(size)), we avoid silly math errors and have code of
approximately the same speed.
Many thanks to Simon Farnsworth for debugging and providing a working patch.
Bug #27365.
This avoids making objects significantly bigger than they would be
otherwise, which would result in some failing at binding to the GTT.
Found from firefox hanging on:
http://upload.wikimedia.org/wikipedia/commons/b/b7/Singapore_port_panorama.jpg
due to a software fallback trying to do a GTT-mapped copy between two 73MB
BOs that were instead each 128MB, and failing because both couldn't fit
simultaneously.
The cost here is that we get no opportunity to cache these objects and
avoid the mapping. But since the objects are a significant percentage
of the aperture size, each mapped access is likely having to fault and rebind
the object most of the time anyway.
Bug #20152 (2/3)
The convention is that all APIs are per-bufmgr, so make this one the same.
Then, have it return -1 on failure so that the application can know what's
going on and do something sensible.
Signed-off-by: Keith Packard <keithp@keithp.com>
This wraps the new DRM_IOCTL_I915_GET_PIPE_FROM_CRTC_ID ioctl,
allowing applications to discover the pipe number corresponding
to a given CRTC ID. This is necessary for doing pipe-specific
operations such as waiting for vblank on a given CRTC.
Scanout buffers need to be freed through the kernel as it holds a reference
to them; exposing this API allows applications allocating scanout buffers to
flag them as not reusable.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Add assertions to drm_intel_gem_bo_reference,
drm_intel_gem_bo_reference_locked and drm_intel_gem_bo_unreference_locked
that the object has not been freed (refcount > 0). Mistakes in refcounting
lead to attempts to insert a bo into a free list more than once which causes
application failure as empty free lists are dereferenced as buffer objects.
Signed-off-by: Keith Packard <keithp@keithp.com>
libdrm has some support for GTT mapping already, but there are bugs
with it (no surprise since it hasn't been used much).
In fixing 20803, I found that sharing bo_gem->virtual was a bad idea,
since a previously mapped object might not end up getting GTT mapped,
leading to corruption. So this patch splits the fields according to
use, taking care to unmap both at free time (but preserving the map
caching).
There's still a risk we might run out of mappings (there's a sysctl
tunable for max number of mappings per process, defaulted to 64k or so
it looks like) but at least GTT maps will work with these changes (and
some others for fixing PAT breakage in the kernel).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This helps avoid the n^2 performance cost of counting tree size when we
get a lot of relocations into our batch buffer. rgb10text on keithp's laptop
went from 136k glyphs/sec to 234k glyphs/sec.
This avoids using the oldest BO in the BO cache and waiting for it to be
idle before we turn around and render to it with the GPU. Thanks to
Chris Wilson for pointing out how silly we were being.
This patch tries to use the available fence count to figure out whether a
given batch can succeed or not (just like the aperture check).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Remember tiling mode values provided by appplications, and
record tiling mode when creating a buffer from another application. This
eliminates any need to ask the kernel for tiling values and also makes
reused buffers get the right tiling.
Signed-off-by: Keith Packard <keithp@keithp.com>
Applications may actually care if the mapping operation failed, so when
it happens, return an error indication. errno is probably trashed by
fprintf though.
Signed-off-by: Keith Packard <keithp@keithp.com>
The execbuffer ioctl returns ENOMEM when it fails to pin all of the buffers
in the GTT. This is usually caused by the DRM client attempting to use too
much memory in a single request. Dumping out the requested and available
memory values should help point out failures in the DRM code to catch over
commitments of this form.
Signed-off-by: Keith Packard <keithp@keithp.com>
Add mode setting files to libdrm, including xf86drmMode.* and the new
drm_mode.h header. Also add a couple of tests to sanity check the
kernel interfaces and update code to support them.
I wanted to avoid doing this, as it's a bunch of churn, but there was a
conflict between the dri_ symbols in libdrm and the symbols that were in
Mesa in 7.2, which broke Mesa 7.2 AIGLX when the 2D driver had loaded new
libdrm symbols. The new naming was recommended by cworth for giving the
code a unique prefix identifying where the code lives.
Additionally, take the opportunity to fix up two API mistakes: emit_reloc's
arguments were in a nonsensical order, and set_tiling lacked the stride
argument that the kernel will want to use soon. API compatibility with
released code is maintained using #defines.
This relies on a new kernel ioctl to get the available aperture size.
In order to provide reasonable performance from dri_bufmgr_check_aperture, we
now require that once a buffer has been used as the target of a relocation,
it gets no further relocations added to it. This cuts the cost of
check_aperture from 10% to 1% in the 3D driver with no code changes, but
slightly complicates our plans for the 2D driver.
Don't count on ioctl returning -errno; use errno directly.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
We want to be able to use the bufmgr from multiple threads for GL, and thus
we need to protect the internal structures.
The pthread-stubs package is used so that programs not linked against
pthreads get weak symbols to stubs and don't eat most of the cost.