The various create and open functions set the buffer size, but
drm_intel_bo_gem_create_from_prime() is an exception. In the 3.12 kernel
we can now use lseek on the prime fd to determine the size of the bo.
Use that and override the userprovided size. If the kernel doesn't
support this, we get an error and fall back to the user provided size.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Currently the package name and description duplicate that of the
core libdrm. Update those to reflect reality.
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Mark the address ranges as accessible with VALGRIND_MAKE_MEM_DEFINED.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No need to prepare the .aub header and dump in that case, it'll be
done with the next call with true.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At DDX commit Chris mentioned the tendency we have of finding out more
PCI IDs only when users report. So Let's add all new reserved Haswell IDs.
Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=63701
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
When publishing first HSW ids we weren't allowed to use "GT3" codname.
But this is the correct codname and Mesa is using it already.
So to avoid people getting confused why in Mesa it is called GT3 and here
it is called GT2_PLUS let's fix this name in a standard and correct way.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It accidentally used the cmd id for the gen7 command and had an
outdated lenght field. Spotted while trying to make sense of an ivb
error_state from mesa 7.11 ...
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The second digit was off by one, which meant we accidentally treated
GT(n) as GT(n-1). This also meant no support for GT1 at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Protect the macro argument evaluations with parens.
This is already touching most lines, so while at it, fix up all white
space to uniform style throughout the file.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Intel GPU Tools is newer and arguably better. This change doesn't
completely merge the files because it's a bit simpler if we move the
I9XX macro over to Intel GPU Tools, and don't move over a few macros
from IGT that libdrm doesn't care about.
It has been discussed, and would seem even easier if Intel GPU Tools
simply used the libdrm header files. Whether or not we move to that,
this should help that effort.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
On Gen6, bit 15 is now `Depth Clear Value Valid`. This was being treated
as part of the length, and failing the rest of the batchbuffer decode.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
We didn't set the ring flag for BLT batches, so they got run on the
render ring. Shenanigans ensued, especially when we sent commands that
were only valid on the BLT ring.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
As we clear the relocs from the bo, we also need to clear the
contribution of the reloc_target_bo from the fence count. Otherwise they
are leaked and prevent any further relocations being added to the bo.
Originally posted to Free Desktop bug #52549 by David Shao.
Resolves Gentoo Bug #433403.
Commit message by Richard Yao.
Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
References: https://bugs.freedesktop.org/show_bug.cgi?id=52549
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
intel_bufmgr_gem.c: In function 'drm_intel_bo_gem_export_to_prime':
intel_bufmgr_gem.c:2477:6: warning: unused variable 'ret' [-Wunused-variable]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
commit 92fd0ce4f6
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Fri Aug 31 11:16:53 2012 +0200
intel: properly test for HAS_LLC
missed slightly and in effect had no effect on the outcome of checking
whether the kernel/chipset supported LLC.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It's the same situation as flink and we need take the same precautions.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
If the kernel supports the test, we need to check the param.
Copy&pasta from the above checks that only look at the return value.
Interesting how much one can get such a simple interface wrong.
Issue created in
commit 151cdcfe68
Author: Eugeni Dodonov <eugeni.dodonov@intel.com>
Date: Tue Jan 17 15:20:19 2012 -0200
intel: query for LLC support
Patch even claims to have fixed this in v2, but is actually unchanged
from v1.
Reported-by: Xiang, Haihao <haihao.xiang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Otherwise pad appears uninitialized and valgrind grumbles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise we end up with X hitting a fail-loop as the embedded libGL
stacks asserts whilst initialising.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
CCLD test_decode
./.libs/libdrm_intel.so: undefined reference to `drmPrimeHandleToFD'
./.libs/libdrm_intel.so: undefined reference to `drmPrimeFDToHandle'
collect2: ld returned 1 exit status
From Adam Jackson's explaination:
most distros have changed it so ld defaults to --no-copy-dt-needed-entries,
so if you use something from libdrm you can't just assume libdrm_intel
will bring it in for you, you have to be explicit
Signed-off-by: Rob Clark <rob@ti.com>
This adds interfaces for the X driver to use to create a
prime handle from a buffer, and create a bo from a handle.
v2: use Chris's suggested naming (well from at least for consistency)
v3: git commit --amend fail
v4: fix as per Chris's suggestions, group assignments, add get tiling
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since there is no getparam for hardware context support, Mesa always
tries to obtain a context by calling drm_intel_gem_context_create and
NULL-checking the result. On an older kernel without context support,
this caused libdrm to print an unwanted message to stderr:
DRM_IOCTL_I915_GEM_CONTEXT_CREATE failed: Invalid argument
In fact, this caused every Piglit test to fail with a "warn" status due
to the unrecognized error message.
Change the message to use DBG() rather than fprintf(), so people can
still get the debug message, but it won't spam normally.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Hi list
The recently released libdrm 2.4.37 does not compile the Intel part:
test_decode.c: In function 'compare_batch':
test_decode.c:107: error: implicit declaration of function 'open_memstream'
PS: Please CC me.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Add relevant code to set up minimal state and call the appropriate
kernel IOCTLs.
This was missed in the previous cherry-picking for 2.3.36.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
I mistakenly "fixed" a bad decode with
commit 7d0a1d5ebb
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Sun Jun 24 20:35:57 2012 -0700
intel/decode: VERTEX_ELEMENT_STATE, 1 means valid
However the actual fix is just to update the reference file, and
include GEN7 in the decode.
Props to Eric Anholt for putting the test in distcheck, or else I
wouldn't have caught this.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
To support this we extract the common execbuf2 functionality to be
called with, or without contexts.
The context'd execbuf does not support some of the dri1 stuff.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
int drm_intel_gem_bo_wait(drm_intel_bo *bo, uint64_t timeout_ns)
This should bump the libdrm version. We're waiting for context support
so we can do both features in one bump.
v2: don't return remaining timeout amount
use get param and fallback for older kernels
v3: only doing getparam at init
prototypes now have a signed input value
v4: update comments
fall back to correct polling behavior with new userspace and old kernel
v5: since the drmIoctl patch was not well received, return appropriate
values in this function instead. As Daniel pointed out, the polling
case (timeout == 0) should also return -ETIME.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch adds a new function,
drm_intel_bufmgr_gem_set_aub_annotations(), which can be used to
annotate the type and subtype of data stored in various sections of
each buffer. This data is used to populate type and subtype fields
when generating the .aub file, which improves the ability of later
debugging tools to analyze the contents of the .aub file.
If drm_intel_bufmgr_gem_set_aub_annotations() is not called, then we
fall back to the old set of annotations (annotate the portion of the
batchbuffer that is executed as AUB_TRACE_TYPE_BATCH, and everything
else as AUB_TRACE_TYPE_NOTYPE).
Reviewed-by: Eric Anholt <eric@anholt.net>
... and add support to decode MI instructions with functions.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This improves the performance of Mesa's GL_MAP_UNSYNCHRONIZED_BIT path
in GL_ARB_map_buffer_range. Improves Unigine Tropics performance at
1024x768 by 2.30482% +/- 0.0492146% (n=61)
v2: Fix comment grammar.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
drmIoctl returns -1 on error with errno set to the error value. Other
users of it in this file just check for != 0, and only use errno when
they need to send an error value on to the caller of the API.
This will allow the driver to capture all of its execution state to a
file for later debugging. intel_gpu_dump is limited in that it only
captures batchbuffers, and Mesa's captures, while more complete, still
capture only a portion of the state involved in execution.
This is a squash commit of a long series of hacking as we tried to get
the resulting traces to work in the internal simulator. It contains
contributions by Yuanhan Liu and Kenneth Graunke.
v2: Drop the MI_FLUSH_ENABLE setup.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
For example:
export INTEL_DEVID_OVERRIDE=0x162
If this variable is set, don't actually submit the batchbuffer to the
GPU, it probably contains commands for the wrong generation of hardware.
v2: Introduce a getter for the overridden devid, and avoid getenv per exec.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Note that the regression test complains here: The batch that was
captured included a bug in its packet output, which was later fixed in
Mesa.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This requires pulling the gen6 3DSTATE_WM out to a function so it
doesn't override gen7's handler.
v2: Fix pasteo in interpreting ZW interpolation (thanks danvet!).
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Every access to either the GTT or CPU pointer is supposed to be
proceeded by a set_domain ioctl so that GEM is able to manage the cache
domains correctly and for the following access to be coherent. Of
course, some people explicitly want incoherent, non-blocking access
which is going to trigger warnings by this patch but are probably better
served by explicit suppression.
v2: Also mark the pointers as inaccessible following the explicit unmap
and implicit unmap upon return to the cache.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In particular, declare the hidden CPU mmaps to valgrind so that it knows
about those memory regions.
v2: Add an additional VG_CLEAR for the getparam
References: https://bugs.freedesktop.org/show_bug.cgi?id=35071
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
[anholt: Ideally valgrind should just learn about the ioctls, and
removing the clear for the non-valgrindified code feels risky.]
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds support for querying the kernel about the LLC support in the
hardware.
In case the ioctl fails, we assume that it is present on GEN6 and GEN7.
v2: fix the return code checking
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
If the pci_device's actual gen was > 4, then we stupidly set
bufmgr_gem->gen = 6. Luckily this caused no bugs, and this fix shouldn't
change any behavior, because all checks against the gen currently have one
of the forms below:
gen == 2
gen == 3
gen >= 4
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This just gets packet name and length in place, with the remainder
unfinished. I've long since finished the work that got me started
fixing up the decode.
Since CC_STATE_POINTERS for gen6 and 7 are quite different but use the
same opcode, move gen6 out to a helper function too, so we can use a
helper function for gen7.
This puts the error message in a consistent location relative to the
packet, and while I'm here I made the error message a bit more
informative.
Now, most static length packets need to just declare their length in
the table and not worry.
The overflow checks were all thoroughly untested, and a bunch of the
ones I'm deleting were pretty broken. Now, in the case of overflow,
you just decode data of 0xd0d0d0d0, and instr_out prints the warning
message instead. Note that this still has the same issue of being
under-tested, but at least it's one place instead of per-packet.
A couple of BUFFER_FAIL uses are left where the length to be decoded
could be (significantly) larger than a page, and the decode didn't
just call instr_out (which doesn't dereference data itself unless it's
safe).
The .batch was generated using the dump-a-batch branch of
git://people.freedesktop.org/~anholt/mesa
using glxgears on gen7 hardware, using INTEL_DEVID_OVERRIDE for
non-gen7 (this means that offsets in the buffers for non-gen7 are 0!).
The .ref was generated by:
./test_decode tests/gen7-3d.batch -dump.
The .sh exists because you can't supply arguments to tests using the
simple automake tests driver. Something reasonable could be done
using automake's parallel-tests driver (in fact, a previous version of
the patch did that), but I was concerned that:
1) The parallel-tests driver is documented to be unstable -- they may
change interfaces on us later.
2) The parallel-tests driver hides the output of tests in .log files
scattered all over the tree, which was ugly and more painful to
work with.
v2: Actually add the batch files, add a .gitignore for the *-new.txt
files added after failures, and fix failure mode for undetected
chipset name.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v1)
Consumers often want to choose stdout vs stderr, and for testing I
want to output to an open_memstream file.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It was producing an unused code warning. I'm tempted to just remove
it, since it's unused, but I *might* use it soon.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eugeni Dodonov <eugeni@dodonov.net>
I'd rather be able to use c99 variable declarations (there's a lot of
awful code layout due to being c90ish), but I'll leave that for later.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eugeni Dodonov <eugeni@dodonov.net>
There was plenty of dropped useful data, and some horribly
mis-formatted data.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eugeni Dodonov <eugeni@dodonov.net>
We've got a different (better) set of warning flags in place in this
tree.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eugeni Dodonov <eugeni@dodonov.net>
My plan is to use this drm_intel_dump_batchbuffer() interface for the
current GPU tools, and the current Mesa batch dumping usage, while
eventually building more interesting interfaces for other uses.
Warnings are currently suppressed by using a helper lib with CFLAGS
set manually, because the code is totally not ready for libdrm's warnings
setup.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eugeni Dodonov <eugeni@dodonov.net>
Some comments weren't wrapped, and for some reason uint32_t *data got
an extra space (while other instances of "type *identifier" didn't),
and the indentation of the opcode-list structs got trashed.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eugeni Dodonov <eugeni@dodonov.net>
We generally go for kernel style in this tree, and this 4-space indent
stuff was bothering me. The new results have some ugly bits, but
they're in places where we desperately want to be using helper
functions anyway.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eugeni Dodonov <eugeni@dodonov.net>
These will be used by intel_decode.c, and were taken from intel-gpu-tools.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eugeni Dodonov <eugeni@dodonov.net>
This will make these macros reusable from intel_decode.c, which
doesn't have a bufmgr_gem context, without faking the struct. We
should generally only be using these macros from bufmgr_gem context
setup anyway.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eugeni Dodonov <eugeni@dodonov.net>
This is from commit dd9a5b4f7f.
We've been sharing this file between that repo and Mesa, and it's time
to build a real interface using it. I'm also hoping to apply some of
its packet-walking logic for AUB dumping and batch validation
purposes.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eugeni Dodonov <eugeni@dodonov.net>
During free we unconditionally delete the bo from the vma cache. This
relies on the its list member being kept in a sane state. This fails
after the object is purged, as the purge operation performs a pure
deletion and doesn't reset the list member, leaving a pair of dangling
pointers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Hopefully all the bugs in the callers have been found, so time to
handle the failures "gracefully" again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the max number of VMA mappings is a hard per-process limit, we need
to include the number of currently active mappings when evicting in
order to make room for a new mmap.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
There is a per-process limit on the number of vma that the process can
keep open, so we cannot keep an unlimited cache of unused vma's (besides
keeping track of all those vma in the kernel adds considerable overhead).
However, in order to work around inefficiencies in the kernel it is
beneficial to reuse the vma, so keep a MRU cache of vma.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As a precautionary measure munmap on buffer free so that we never leak
the vma. Also include a warning during debugging.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Otherwise we blow up on heavy tiled blitter loads (with giant
pixmaps).
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Before this, consumers of the libdrm API that might map a buffer
either way had to track which way was chosen at map time to call the
appropriate unmap. This relaxes that requirement by making
drm_intel_bo_unmap() always appropriate.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This lets us replace the current inner drawing loop of mesa:
for each prim {
compute bo list
if (check_aperture_space(bo list)) {
batch_flush()
compute bo list
if (check_aperture_space(bo list)) {
whine_about_batch_size()
fall back;
}
}
upload state to BOs
}
with this inner loop:
for each prim {
retry:
upload state to BOs
if (check_aperture_space(batch)) {
if (!retried) {
reset_to_last_prim()
batch_flush()
} else {
if (batch_flush())
whine_about_batch_size()
goto retry;
}
}
}
This avoids having to implement code to walk over certain sets of GL
state twice (the "compute bo list" step). While it's not a
performance improvement, it's a significant win in code complexity:
about -200 lines, and one place to make mistakes related to aperture
space instead of N places to forget some BO we should have included.
Note how if we do a reset in the new loop , we immediately flush. We
don't need to check aperture space -- the kernel will tell us if we
actually ran out of aperture or not. And if we did run out of
aperture, it's because either the single prim was too big, or because
check_aperture was wrong at the point of setting up the last
primitive.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A few of the bitfield-based booleans are left in place. Changing them
to "bool" results in the same code size, so I'm erring on the side of
not changing things.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is Fail.
First patch to libdrm, and I've borked it up.
Noticed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and if asked to open a bo by the same global name, return a fresh
reference to the previously allocated buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
gen4+ hardware doesn't use fences for GPU access and the older kernel
doesn't expect userspace to make such a mistake. So don't.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32190
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
For relaxed fencing the object may only consume the small set of active
pages, but still requires a fence region once bound into the aperture.
This is the size we need to use when computing the maximum possible
aperture space that could be used by a single batchbuffer and so avoid
hitting ENOSPC.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Both the consumers of this API (sync objects and client throttling)
were expecting this behavior. The kernel used to actually behave the
desired (but incorrect) way for us anyway, but that got fixed a while
back.
If bufmgr.bo_mrb_exec is not set, drm_intel_bo_mrb_exec returns ENODEV
even though drm_intel_gem_bo_mrb_exec2 will work fine for the RENDER ring.
Fixes xf86-video-intel after commit 'add BLT ring support' (5bed685f76)
with kernels without BSD or BLT ring support (2.6.34 and before).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31443
Signed-off-by: Albert Damen <albrt@gmx.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The intent of these was to catch mismatched map/unmap. What it
actually did was check whether there was ever a mapping of that type
(including in a previous life of the buffer through the userland BO
cache), not whether they were mismatched. We don't even actually want
to catch mismatched map/unmap, unless we also do refcounting, since at
one point Mesa would do map/map/use/unmap/unmap. Just remove this
code instead.
The kernel has always allowed userspace to underallocate objects
supplied for fencing. However, the kernel only allocated the object size
for the fence in the GTT and so caused tiling corruption. More recently
the kernel does allocate the full fence region in the GTT for an
under-sized object and so advertises that clients may finally make use
of this feature. The biggest benefit is for texture-heavy GL games on
i945 such as World of Padman which go from needing over 1GiB of RAM to
play to fitting in the GTT!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the higher layers check the error return from libdrm-intel and
are supposed to handle the error (and print their own warning in
extremis) the voluminous output on stderr is just noise and a hazard in
its own right.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the mapping succeeds we have a valid pointer. If setting the domain
failures we may incur cache corruption. However the usual failure mode
is because of a hung GPU, in which case it is preferable to ignore the
minor error from setting the domain and continue on oblivious. If
these errors persist, we should rate limit the warning [or even just
remove it].
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Mesa uses the returned pitch from alloc_tiled, so make sure that we set
it correctly before modifying the stride used for the SET_TILING call.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Ensure that the user doesn't attempt to specify a stride to use with a
linear buffer by forcing such to be zero.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
execbuffer() returns ENOSPC if it cannot fit the batch buffer into the
aperture which is the error we want to diagnose here.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rearrange the cache cleanup so that we always scan following a final
unreference, and guard against multiple scans in a single second.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When allocating a tiled buffer, if we remove the desired tiling mode due
to it being beyond hardware limits, also remove the stride. This ensures
that we only ever use stride 0 with I915_TILING_NONE.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we now expose a method to allocate tiled buffers, it makes more sense
to defer the SET_TILING until required. Besides the slim chance that it
will be a no-op, by delaying the change we are less likely to stall on
waiting for a bound buffer to release a fence register.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to inform the kernel if the tiling stride changes and not only
for changes of the tiling mode.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We had two cases recently where the rounding to powers of two hurt
badly: 4:2:0 YUV HD video frames would round up from 2.2MB to 4MB, and
Urban Terror was hitting aperture size limitations. For UT, this is
because mipmap trees for power of two texture sizes will land right in
the middle between two cache buckets.
By giving a few more sizes between powers of two, Urban Terror on my
945 ends up consuming 207MB of GEM objects instead of 272MB, and HD
video decode on Ironlake goes from 99MB to 75MB.
cairo-perf-diff of the benchmarks for gl and xlib shows a 1.09x and
1.06x speedup and a 1.07x, 1.08x, and 1.11x slowdown. From this, I
think this patch was really a no-op in terms of performance for these
CPU-bound workloads.
If the pitch is too large for the hardware to tile, recompute the
required surface size based on the untiled pitch and alignments. For the
older hardware, which has smaller limits and greater restrictions, this
may be a considerable saving in allocation size.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This introduces a new API to exec on BSD ring buffer, for H.264 VLD
decoding.
Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com>
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Fixes:
Bug 26686 - Some textures are distorted with libdrm 2.4.18 in GTAVC>A3
http://bugs.freedesktop.org/show_bug.cgi?id=26686
This bug continues to haunt me. The kernel SET_TILING ioctl is
inconsistent in its return values when reporting an error. If one of its
sanity checks fail, then the input values are left unchanged. If the
kernel later fails to change the tiling mode, then the input values are
modified to match the current tiling on the object. In short, userspace
cannot trust the return values upon error and so we must assume that
upon error our current tiling mode matches reality and not update.
This reverts commit 7ca558494d.
This was pushed ahead of an essential review of bo level locking in
mesa, without which we cannot know whether removing this lock is safe.
Thomas tracked down this error with kdm and commit b509640:
==4320== Invalid write of size 8
==4320== at 0x9A97998: do_bo_emit_reloc (in /usr/lib/libdrm_intel.so.1.0.0)
==4320== by 0x9A97B9C: drm_intel_gem_bo_emit_reloc (in /usr/lib/libdrm_intel.so.1.0.0)
==4320== by 0xAED3234: intel_batchbuffer_emit_reloc (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF13827: brw_emit_vertices (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF1F14D: brw_upload_state (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF12122: brw_draw_prims (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xB256824: vbo_exec_vtx_flush (in /usr/lib/xorg/modules/dri/libdricore.so)
==4320== by 0xB2523BB: vbo_exec_FlushVertices_internal (in /usr/lib/xorg/modules/dri/libdricore.so)
==4320== by 0xB252411: vbo_exec_FlushVertices (in /usr/lib/xorg/modules/dri/libdricore.so)
==4320== by 0xB195A3D: _mesa_PopAttrib (in /usr/lib/xorg/modules/dri/libdricore.so)
==4320== by 0x8DF0F02: __glXDisp_Render (in /usr/lib/xorg/modules/extensions/libglx.xorg)
==4320== by 0x8DF517F: __glXDispatch (in /usr/lib/xorg/modules/extensions/libglx.xorg)
==4320== Address 0x126a8b80 is 0 bytes after a block of size 16,368 alloc'd
==4320== at 0x4C23E03: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4320== by 0x9A97A64: do_bo_emit_reloc (in /usr/lib/libdrm_intel.so.1.0.0)
==4320== by 0x9A97B9C: drm_intel_gem_bo_emit_reloc (in /usr/lib/libdrm_intel.so.1.0.0)
==4320== by 0xAED3234: intel_batchbuffer_emit_reloc (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF191DB: upload_binding_table_pointers (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF1F14D: brw_upload_state (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF12122: brw_draw_prims (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xB255EF6: vbo_exec_DrawArrays (in /usr/lib/xorg/modules/dri/libdricore.so)
==4320== by 0x8DF67A3: __glXDisp_DrawArrays (in /usr/lib/xorg/modules/extensions/libglx.xorg)
==4320== by 0x8DF0F02: __glXDisp_Render (in /usr/lib/xorg/modules/extensions/libglx.xorg)
==4320== by 0x8DF517F: __glXDispatch (in /usr/lib/xorg/modules/extensions/libglx.xorg)
==4320== by 0x446293: ??? (in /usr/bin/Xorg)
which is simply due to only allocating space for the pointers and not
the structs themselves. D'oh.
Reported-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
intel_bufmgr.h is installed in ${includedir} directly, and the other
headers are taken care of by libdrm.pc's Cflags.
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
This is the largest untiled pitch requirement from gen2 through gen4.
It's only the case for gen3 rendering to color regions with depth, but
it's rare for this to be a significant factor in memory usage -- for
example, gen4 requires 1 or 2 times the element size, or up to 64
bytes depending on the size of the elements. This is easier than
encoding all the various little quirks for untiled pitch alignment,
since we rarely do untiled now.
intel_atomic.h includes very usefull atomic operations for
lock free parrallel access of variables. Moving these to
core libdrm for code sharing with radeon.
Signed-off-by: Pauli Nieminen <suokkos@gmail.com>
Ensure that errors from the kernel are propagated back to the caller,
and not masked with return 0;
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This allows Mesa to use drm_intel_bo_alloc_tiled() for its tiled
buffers, since it makes its decision about pitch before telling
libdrm. They happen to be the same choices for the tiled case.
This patch to libdrm adds support for the new execbuf2 ioctl. If
detected, it will be used instead of the old ioctl. By using the new
drm_intel_bufmgr_gem_enable_fenced_relocs(), you can indicate that any
time a fence register is actually required for a relocation target you
will call drm_intel_bo_emit_reloc_fence instead of
drm_intel_bo_emit_reloc, which will reduce fence register pressure.
Signed-off-by: Eric Anholt <eric@anholt.net>
The SET_TILING is pernicious in that it overwrites the input arguments
following an error in order to report the current tiling state of the
buffer. This caught us by surprise as we then fed those arguments back
into to the ioctl unmodified following an EINTR and so the kernel then
reported success for the no-op. We interpreted this success as meaning
that the tiling on the buffer had changed so updated our state and
started using the buffer incorrectly in the new tiled/untiled manner.
This lead to all sorts of random corruption and GPU hangs, even though
the batch buffers would look sane (when the GPU had not wandered off
into forbidden territory).
References:
Bug 25475 - [i915] Xorg crash / Execbuf while wedged
http://bugs.freedesktop.org/show_bug.cgi?id=25475
Bug 25554 - i830_uxa_prepare_access: gtt bo map failed: Input/output error
http://bugs.freedesktop.org/show_bug.cgi?id=25554
(And probably every other weird bug in the last few months.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the kernel reports the total number of fences, we must guess how many
fences are likely to be pinned. In the typical system these will be only
used by the scanout buffers, of which there may be one per pipe, and any
number of manually pinned fenced buffers. So take a conservative guess
and reserve two fences for use by the system.
Note this reduces the number of fences to 3 for i915 and prior.
Reference:
http://bugs.freedesktop.org/show_bug.cgi?id=25911
The latest intel driver 2.10.0 causes kernel oops and system hangs
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Don't store the error return in bo_gem->gtt_virtual or else we will
attempt to use that as a valid pointer in future mappings.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This removes the foremost prolific user of mutexes in libdrm_intel.so.
The other uses of the bufmgr_gem->mutex to serial access to individual
bos are currently required by Mesa, and are far less frequent.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[anholt: This chunk looks good...]
Acked-by: Eric Anholt <eric@anholt.net>
This has the unfortunate behaviour of releasing our malloc cache, but
the alternative is for X to consume a couple of gigabytes of ram and
die during testing. Fortunately the extra mallocs have little impact on
performance whereas avoiding swap and death, lots.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Instead of forcing the caller to check after every emit_reloc(), we can
flag the object as being in error, propagating that error upwards through
the relocation tree, and failing the eventual batch buffer execution.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
EAGAIN cannot be raised by the current code, but the system call maybe
interrupted and so return EINTR.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Hitting this error lead to a segfault:
intel_bufmgr_gem.c:919: Error mapping buffer 48607 (pixmap):
Cannot allocate memory.
because the errno was reused as the function return value after being
reset by the fprintf(), so caller thought the mapping had succeeded. The
convention established by libdrm is that the return value is the
negative errno and that uses of libdrm cannot trust the value of errno
afterwards, but must use the return code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Buffers on the relocation tree are guarded by the reference to the batch
object and so do not need an extra reference whilst constructing the
list of execution buffer objects.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Having been bitten by a missing EINTR check during mmap_gtt(), I thought
it prudent to add some more protection around the ioctls.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This allows us to keep the assert added in the previous commit that we do
not modify the tree_reloc_size after inserting the buffer into a relocation
tree, which was being hit here:
#0 0xb78c2424 in __kernel_vsyscall ()
#1 0xb74f6401 in raise () from /lib/libc.so.6
#2 0xb74f7b42 in abort () from /lib/libc.so.6
#3 0xb74ef5a8 in __assert_fail () from /lib/libc.so.6
#4 0xb737e78b in drm_intel_bo_gem_set_in_aperture_size (bufmgr_gem=<value optimized out>, bo_gem=0x6) at intel_bufmgr_gem.c:373
#5 0xb737f519 in drm_intel_gem_bo_set_tiling (bo=0xa1030a0, tiling_mode=0xbff6c85c, stride=0) at intel_bufmgr_gem.c:1386
#6 0xb737f67f in drm_intel_gem_bo_unreference_final (bo=0xa1030a0, time=<value optimized out>) at intel_bufmgr_gem.c:768
#7 0xb737f5e3 in drm_intel_gem_bo_unreference_locked_timed (bo=0xa1e50d0, time=<value optimized out>) at intel_bufmgr_gem.c:805
#8 drm_intel_gem_bo_unreference_final (bo=0xa1e50d0, time=<value optimized out>) at intel_bufmgr_gem.c:756
#9 0xb737fcbb in drm_intel_gem_bo_unreference (bo=0xa1e50d0) at intel_bufmgr_gem.c:821
#10 0xb737b4e6 in drm_intel_bo_unreference (bo=0x0) at intel_bufmgr.c:80
#11 0xb7325625 in intel_batch_flush (scrn=0x9d91f78, flush=1) at i830_batchbuffer.c:200
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
For the older chipsets, i.e. pre-i965, which have severe alignment
restrictions for tiled buffers we need to pessimistically assume that we
will waste the size of buffer to meet those alignment constraints.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the kernel immediately frees the backing store for a buffer when
marking it purgeable, then there is not point adding to the cache. Free
it immediately, instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>