Illegal access will cause CP hang followed by job timeout and
recovery kicking in.
Also, disable the suite for all APU ASICs until GPU
reset issues for them will be resolved and GPU reset recovery
will be enabled by default.
v2:
Add KV to deasbled APUs list and add comments regarding
necessary kernel amdgpu paramteres to run the tests.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Seems like AI and RV requires uncashed memory mapping to be able
to pickup value written to memory by CPU after the WAIT_REG_MEM
command was already launched.
.
Enable the test for AI and RV.
v2:
Update commit description.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
disable deadlock test suite for RV
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This will prevent any more missing `#include "config.h"` bug, at the
cost of having to recompile some files that didn't need to be when
changing build options.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
If amdgpu_cs_query_fence_status terminates prematurely the BO
sometimes is unmapped before helper thread writes a vlaue
into it causing a segfault.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Kernel will abort jobs for guilty (causing GPU hang) context
with -ECANCELED don't assert if that the case.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Fixes use after free:
==2537== Invalid read of size 4
==2537== at 0x1162C9: suite_deadlock_tests_enable (deadlock_tests.c:101)
==2537== by 0x10B157: amdgpu_disable_suits (amdgpu_test.c:421)
==2537== by 0x10B157: main (amdgpu_test.c:560)
==2537== Address 0x5e44f24 is 452 bytes inside a block of size 1,016 free'd
==2537== at 0x4C2BE1B: free (vg_replace_malloc.c:530)
==2537== by 0x504CD8B: amdgpu_device_reference (amdgpu_device.c:164)
==2537== by 0x504CD8B: amdgpu_device_deinitialize (amdgpu_device.c:307)
==2537== by 0x1162BB: suite_deadlock_tests_enable (deadlock_tests.c:97)
==2537== by 0x10B157: amdgpu_disable_suits (amdgpu_test.c:421)
==2537== by 0x10B157: main (amdgpu_test.c:560)
==2537== Block was alloc'd at
==2537== at 0x4C2CC05: calloc (vg_replace_malloc.c:711)
==2537== by 0x504CA5E: amdgpu_device_initialize (amdgpu_device.c:212)
==2537== by 0x116298: suite_deadlock_tests_enable (deadlock_tests.c:93)
==2537== by 0x10B157: amdgpu_disable_suits (amdgpu_test.c:421)
==2537== by 0x10B157: main (amdgpu_test.c:560)
Reviewed-by: Christian König <christian.koenig@amd.com>
The test stalls the CP, until RCA is done the test is
disabled to not disrupt regression testing.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Switch from disabling tests during run to using the new disable
API.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Adding initial tests for locks detection when SW
scheduler FIFO is full.
The test works by submitting a batch of identical commands which make the CP
stall waiting for condition to become true. The condition is later satisfied
form a helper thread. Other events that happen during this time
might create deadlock situations. One such example is GPU reset
triggered by this stall when amdgpu_lockup_timeout != 0.
v2:
Increase the delay from 2 to 100 ms.
Comment out the compute test until it's working.
Typos fix.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>