Commit Graph

15 Commits (0247b19dc0d61408fe3a5d2954468f28e6bfe200)

Author SHA1 Message Date
Cui, Flora 8c6dbd7938 tests/amdgpu: add deadlock test for sdma
deadlock test for sdma will cause gpu recoverty.
disable the test for now until GPU reset recovery could survive at least
1000 times test.

v2: add modprobe parameter

Reviewed-and-tested-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-18 10:49:52 -05:00
Andrey Grodzovsky ba45adb2a1 amdgpu/test: Enable deadlock test for CI family (gfx7)
I retested GPU recovery with  Bonaire ASIC and it works.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
2018-12-11 15:41:06 -05:00
Andrey Grodzovsky 0be850441f amdgpu/test: Disable deadlock tests for all non gfx8/9 ASICs.
Since only for those ASICs gpu reset is enabled by deafult.
Also update disable message and fix identation .

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-02 16:22:49 -04:00
Andrey Grodzovsky b3dec018df amdgpu/test: Add illegal register and memory access test v2
Illegal access will cause CP hang followed by job timeout and
recovery kicking in.
Also, disable the suite for all APU ASICs until GPU
reset issues for them will be resolved and GPU reset recovery
will be enabled by default.

v2:
Add KV to deasbled APUs list and add comments regarding
necessary kernel amdgpu paramteres to run the tests.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-01 14:57:49 -04:00
Andrey Grodzovsky 21f1176458 amdgpu/test: Fix deadlock tests for AI and RV v2
Seems like AI and RV requires uncashed memory mapping to be able
to pickup value written to memory by CPU after the WAIT_REG_MEM
command was already launched.
.
Enable the test for AI and RV.

v2:
Update commit description.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2018-10-10 14:38:19 -04:00
Likun Gao cc472c5bb3 amdgpu: Disable deadlock test suite for RV
disable deadlock test suite for RV

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-16 13:45:20 -05:00
Eric Engestrom 0926f0af54 meson,configure: include config.h automatically
This will prevent any more missing `#include "config.h"` bug, at the
cost of having to recompile some files that didn't need to be when
changing build options.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-03-20 18:19:26 +00:00
Eric Engestrom 80f33f4529 tests/amdgpu: drop unused variables
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-01-29 15:41:52 +00:00
Michel Dänzer 8e75f5a145 amdgpu: Disable deadlock test suite by default for SI ASICs
Hangs my Cape Verde.

Acked-by: Christian König <christian.koenig@amd.com>
2018-01-26 15:25:17 +01:00
Andrey Grodzovsky 429bb5820d amdgpu: Fix segfault in deadlock test.
If amdgpu_cs_query_fence_status terminates prematurely the BO
sometimes is unmapped before helper thread writes a vlaue
into it causing a segfault.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
2018-01-26 07:45:48 -05:00
Andrey Grodzovsky 5e239f3e3d amdgpu: Update deadlock test to not assert on ECANCELED
Kernel will abort jobs for guilty (causing GPU hang) context
with -ECANCELED don't assert if that the case.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2018-01-26 07:45:34 -05:00
Michel Dänzer 6fe93b8000 amdgpu: Don't dereference device_handle after amdgpu_device_deinitialize
Fixes use after free:

==2537== Invalid read of size 4
==2537==    at 0x1162C9: suite_deadlock_tests_enable (deadlock_tests.c:101)
==2537==    by 0x10B157: amdgpu_disable_suits (amdgpu_test.c:421)
==2537==    by 0x10B157: main (amdgpu_test.c:560)
==2537==  Address 0x5e44f24 is 452 bytes inside a block of size 1,016 free'd
==2537==    at 0x4C2BE1B: free (vg_replace_malloc.c:530)
==2537==    by 0x504CD8B: amdgpu_device_reference (amdgpu_device.c:164)
==2537==    by 0x504CD8B: amdgpu_device_deinitialize (amdgpu_device.c:307)
==2537==    by 0x1162BB: suite_deadlock_tests_enable (deadlock_tests.c:97)
==2537==    by 0x10B157: amdgpu_disable_suits (amdgpu_test.c:421)
==2537==    by 0x10B157: main (amdgpu_test.c:560)
==2537==  Block was alloc'd at
==2537==    at 0x4C2CC05: calloc (vg_replace_malloc.c:711)
==2537==    by 0x504CA5E: amdgpu_device_initialize (amdgpu_device.c:212)
==2537==    by 0x116298: suite_deadlock_tests_enable (deadlock_tests.c:93)
==2537==    by 0x10B157: amdgpu_disable_suits (amdgpu_test.c:421)
==2537==    by 0x10B157: main (amdgpu_test.c:560)

Reviewed-by: Christian König <christian.koenig@amd.com>
2018-01-16 16:57:57 +01:00
Andrey Grodzovsky 18ffe485cd amdgpu: Disable deadlock test suite for Vega 10
The test stalls the CP, until RCA is done the test is
disabled to not disrupt regression testing.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
2017-11-15 23:28:45 -05:00
Andrey Grodzovsky 806d080360 amdgpu: Use new suite/test disabling functionality.
Switch from disabling tests during run to using the new disable
API.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-11-13 18:20:22 +01:00
Andrey Grodzovsky 670db97dc3 amdgpu: Add deadlock detection test suit.
Adding initial tests for locks detection when SW
scheduler FIFO is full.

The test works by submitting a batch of identical commands which make the CP
stall waiting for condition to become true. The condition is later satisfied
form a helper thread. Other events that happen during this time
might create deadlock situations. One such example is GPU reset
triggered by this stall when  amdgpu_lockup_timeout != 0.

v2:
Increase the delay from 2 to 100 ms.
Comment out the compute test until it's working.
Typos fix.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-10-04 10:50:02 +02:00