Switch the logic to only disable the tests for asics which don't
have GPU reset support. This way we don't need to update it
every time we add a new asic which does support it.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Combined with -Wundef (added in 75758d2ccf & enforced in ba17673eed),
this provides absolute safety against #ifdef typos.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
deadlock test for sdma will cause gpu recoverty.
disable the test for now until GPU reset recovery could survive at least
1000 times test.
v2: add modprobe parameter
Reviewed-and-tested-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since only for those ASICs gpu reset is enabled by deafult.
Also update disable message and fix identation .
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Illegal access will cause CP hang followed by job timeout and
recovery kicking in.
Also, disable the suite for all APU ASICs until GPU
reset issues for them will be resolved and GPU reset recovery
will be enabled by default.
v2:
Add KV to deasbled APUs list and add comments regarding
necessary kernel amdgpu paramteres to run the tests.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Seems like AI and RV requires uncashed memory mapping to be able
to pickup value written to memory by CPU after the WAIT_REG_MEM
command was already launched.
.
Enable the test for AI and RV.
v2:
Update commit description.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
disable deadlock test suite for RV
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This will prevent any more missing `#include "config.h"` bug, at the
cost of having to recompile some files that didn't need to be when
changing build options.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
If amdgpu_cs_query_fence_status terminates prematurely the BO
sometimes is unmapped before helper thread writes a vlaue
into it causing a segfault.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Kernel will abort jobs for guilty (causing GPU hang) context
with -ECANCELED don't assert if that the case.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Fixes use after free:
==2537== Invalid read of size 4
==2537== at 0x1162C9: suite_deadlock_tests_enable (deadlock_tests.c:101)
==2537== by 0x10B157: amdgpu_disable_suits (amdgpu_test.c:421)
==2537== by 0x10B157: main (amdgpu_test.c:560)
==2537== Address 0x5e44f24 is 452 bytes inside a block of size 1,016 free'd
==2537== at 0x4C2BE1B: free (vg_replace_malloc.c:530)
==2537== by 0x504CD8B: amdgpu_device_reference (amdgpu_device.c:164)
==2537== by 0x504CD8B: amdgpu_device_deinitialize (amdgpu_device.c:307)
==2537== by 0x1162BB: suite_deadlock_tests_enable (deadlock_tests.c:97)
==2537== by 0x10B157: amdgpu_disable_suits (amdgpu_test.c:421)
==2537== by 0x10B157: main (amdgpu_test.c:560)
==2537== Block was alloc'd at
==2537== at 0x4C2CC05: calloc (vg_replace_malloc.c:711)
==2537== by 0x504CA5E: amdgpu_device_initialize (amdgpu_device.c:212)
==2537== by 0x116298: suite_deadlock_tests_enable (deadlock_tests.c:93)
==2537== by 0x10B157: amdgpu_disable_suits (amdgpu_test.c:421)
==2537== by 0x10B157: main (amdgpu_test.c:560)
Reviewed-by: Christian König <christian.koenig@amd.com>
The test stalls the CP, until RCA is done the test is
disabled to not disrupt regression testing.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Switch from disabling tests during run to using the new disable
API.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Adding initial tests for locks detection when SW
scheduler FIFO is full.
The test works by submitting a batch of identical commands which make the CP
stall waiting for condition to become true. The condition is later satisfied
form a helper thread. Other events that happen during this time
might create deadlock situations. One such example is GPU reset
triggered by this stall when amdgpu_lockup_timeout != 0.
v2:
Increase the delay from 2 to 100 ms.
Comment out the compute test until it's working.
Typos fix.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>