Radxa Dragon Q6A Ubuntu 24.04: QNN HTP backend fails with qnn_open 0x80000600

Board: Radxa Dragon Q6A (QCS6490)
OS: Ubuntu 24.04 / Armbian
Kernel: 6.18.2-current-qcs6490

I was able to get QNNExecutionProvider and QAI AppBuilder working on Ubuntu 24.04 / kernel 6.18 on the Dragon Q6A.

The following components appear functional:

  • QNNExecutionProvider loads correctly

  • QAI AppBuilder imports successfully

The following device nodes are present and accessible:

  • /dev/fastrpc-adsp

  • /dev/fastrpc-cdsp

  • /dev/fastrpc-cdsp-secure
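Before digging into HTP errors, a quick pre-flight check can confirm the "present and accessible" part. The sketch below is a minimal POSIX sh example. The function name check_fastrpc_nodes is hypothetical. It only checks that the three nodes listed above exist and are readable/writable by the current user. It does not check whether the DSP behind them is healthy.

```sh
# Hypothetical pre-flight helper (not part of any Radxa/Qualcomm package).
# Checks that the FastRPC device nodes listed above exist and are
# readable/writable by the current user. It does NOT verify that they are
# character devices or that the DSP behind them is healthy.
# $1 = device directory, defaults to /dev (overridable, e.g. for testing).
# Returns non-zero if any node is missing or not accessible.
check_fastrpc_nodes() {
    cfn_dir=${1:-/dev}
    cfn_bad=0
    for cfn_node in fastrpc-adsp fastrpc-cdsp fastrpc-cdsp-secure; do
        cfn_path="$cfn_dir/$cfn_node"
        if [ ! -e "$cfn_path" ]; then
            echo "MISSING    $cfn_path"
            cfn_bad=$((cfn_bad + 1))
        elif [ -r "$cfn_path" ] && [ -w "$cfn_path" ]; then
            echo "OK         $cfn_path"
        else
            echo "NO-ACCESS  $cfn_path"
            cfn_bad=$((cfn_bad + 1))
        fi
    done
    [ "$cfn_bad" -eq 0 ]
}
```

Running check_fastrpc_nodes with no argument checks /dev. It prints one status line per node and returns non-zero if any node is missing or inaccessible.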

HTP initialization starts successfully, but DSP session attach fails with:

DspTransport.openSession qnn_open failed, 0x80000600
Failed to load skel
QNN_DEVICE_ERROR_INVALID_CONFIG

Current runtime reports:

AISW_VERSION: 2.40.0

Radxa team:
Could you please clarify:

  1. Which exact QAIRT/QNN SDK version is validated for the official Ubuntu image?

  2. Is there a required userspace/runtime version matching the shipped DSP firmware?

  3. Are there additional fastrpc / adsprpc packages or init services required on Ubuntu?

  4. Is the HTP/NPU currently validated on kernel 6.18, or only on the official Radxa kernel branch?

At this point, provider loading, AppBuilder import, the FastRPC device nodes, and firmware detection all work correctly, so this appears to be a DSP runtime compatibility issue rather than a missing-driver problem.

I tried several other approaches, but they ran into different kernel issues ("DMA not found" errors, etc.).

Update after a lot of tinkering:

HTP/NPU inference on the Radxa Dragon Q6A (QCS6490) now appears to work under Armbian (Ubuntu 24.04) without DTB, kernel, or CMA modifications, though the setup is very fragile and still breaks occasionally. I would appreciate more help and clarity from the Radxa and Qualcomm teams.

Environment:

- Board: Radxa Dragon Q6A (QCS6490)

- OS: Armbian 26.2.4 Noble

- Base distro: Ubuntu 24.04 LTS (Noble Numbat)

- Kernel: Linux 6.18.2-current-qcs6490

- Architecture: aarch64

- QAIRT/QNN SDK: 2.46.0

- Runtime: onnxruntime + QNNExecutionProvider

Problem:

HTP execution repeatedly failed with:

qnn_open failed, 0x80000406

Initial investigation incorrectly suggested:

- unsupported SoC

- missing DMA reservation

- CMA/DTB issues

Actual root cause:

1. Missing libc++.so.1 dependency on the DSP side

2. Version mismatch between host QNN runtime and DSP skeletons

The DSP rejected libQnnHtpV68Skel.so during FASTRPC_IOCTL_INVOKE because:

- libc++.so.1 was not available in the DSP search path

- host runtime (2.46) and DSP skeleton (2.42) did not match
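The mismatch came from a skeleton left over from an older release, so one way to catch it is to compare each installed skeleton byte-for-byte against the copy shipped with the SDK you actually run. The sketch below uses a hypothetical helper name, check_skel_matches_sdk. It assumes you pass the path to the SDK's libQnnHtpV68Skel.so explicitly, because the exact SDK layout is not given in this thread.

```sh
# Hypothetical helper: report whether each installed HTP skeleton is
# byte-identical to the skeleton shipped with the host-side SDK, i.e. whether
# host runtime and DSP skeleton come from the same release.
# $1      = libQnnHtpV68Skel.so from the SDK you run (path is an assumption)
# $2...   = installed copies to compare against it
# Returns 0 if every present copy matches, 1 on any mismatch, 2 if the SDK
# skeleton itself is missing. Absent copies are reported but not counted.
check_skel_matches_sdk() {
    sdk_skel=$1
    shift
    if [ ! -f "$sdk_skel" ]; then
        echo "SDK skeleton not found: $sdk_skel" >&2
        return 2
    fi
    rc=0
    for installed in "$@"; do
        if [ ! -f "$installed" ]; then
            echo "ABSENT    $installed"
        elif cmp -s "$sdk_skel" "$installed"; then
            echo "MATCH     $installed"
        else
            echo "MISMATCH  $installed"
            rc=1
        fi
    done
    return "$rc"
}
```

Here is an example, assuming envsetup.sh exported QNN_SDK_ROOT and the SDK keeps the skeleton under lib/hexagon-v68/unsigned/ (verify both in your 2.46.0 install): check_skel_matches_sdk "$QNN_SDK_ROOT/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so" /usr/lib/rfsa/adsp/cdsp/libQnnHtpV68Skel.so /usr/lib/dsp/libQnnHtpV68Skel.so. The /usr/lib/dsp path is included because the listing later in this thread shows a copy there too, so it may be worth checking.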

Fix:

1. Copy the matching v2.46 libQnnHtpV68Skel.so into:

/usr/lib/rfsa/adsp/cdsp/

2. Create symlink:

libc++.so.1 → /usr/share/qcom/qcs6490/radxa/dragon-q6a/dsp/cdsp/libc++.so.1
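The two fix steps above can be sketched as a single helper. This is a minimal sketch under one stated assumption: the libc++.so.1 link is created in the same directory the skeleton is copied to, since the post does not say where the link lives. The function name install_htp_skel is hypothetical, and all paths are passed in rather than hard-coded.

```sh
# Hypothetical helper sketching the fix above.
# ASSUMPTION: the libc++.so.1 symlink goes into the same directory as the
# skeleton (the original post does not state the link's location).
#   $1  libQnnHtpV68Skel.so from the SDK whose host runtime you use
#   $2  destination directory (e.g. /usr/lib/rfsa/adsp/cdsp)
#   $3  directory holding the DSP-side libc++.so.1
install_htp_skel() {
    src_skel=$1
    dest_dir=$2
    libcxx_dir=$3
    [ -f "$src_skel" ] || { echo "no skeleton at $src_skel" >&2; return 1; }
    [ -f "$libcxx_dir/libc++.so.1" ] || { echo "no libc++.so.1 in $libcxx_dir" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    # Keep a copy of any existing (possibly mismatched) skeleton.
    if [ -f "$dest_dir/libQnnHtpV68Skel.so" ]; then
        cp -p "$dest_dir/libQnnHtpV68Skel.so" "$dest_dir/libQnnHtpV68Skel.so.bak" || return 1
    fi
    cp "$src_skel" "$dest_dir/libQnnHtpV68Skel.so" || return 1
    # -f replaces a previous link so the helper can be re-run safely.
    ln -sf "$libcxx_dir/libc++.so.1" "$dest_dir/libc++.so.1"
}
```

On the board this would be run as root, for example: install_htp_skel "$QNN_SDK_ROOT/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so" /usr/lib/rfsa/adsp/cdsp /usr/share/qcom/qcs6490/radxa/dragon-q6a/dsp/cdsp. The SDK-side path is an assumption about the QAIRT layout.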

Result:

- qnn-net-run successfully executes graphs on HTP

- VTCM allocation succeeds

- Python onnxruntime QNNExecutionProvider works natively on ARM64

- Verified DSP offload achieved

Important:

No DTB changes, CMA tuning, kernel recompilation, or signed skeleton/testsig workflow were required.


Do you have the full command set for us to reproduce the issue on our system?

Hi Yuntian,

I reconstructed the approximate working sequence from my logs/history. The issue appeared to be related to DSP firmware/runtime consistency and DSP-side library resolution.

Environment:

  • Radxa Dragon Q6A (QCS6490)

  • Armbian 26.2.4 Noble

  • Ubuntu 24.04

  • Kernel: 6.18.2-current-qcs6490

  • QNN SDK: 2.46.0

Steps that resolved the issue on my setup:

  1. Install/extract QAIRT SDK 2.46.0 and source envsetup.sh

  2. Replace the DSP firmware manually:

      • Back up the existing files from:
        /lib/firmware/qcom/qcs6490/radxa/dragon-q6a/

      • Replace adsp.mbn and cdsp.mbn with matching versions

  3. Copy the matching libQnnHtpV68Skel.so into:
    /usr/lib/rfsa/adsp/cdsp/

  4. Create DSP-side symlinks:

      • libc++.so.1

      • libc++abi.so.1

    pointing to:
    /usr/share/qcom/qcs6490/radxa/dragon-q6a/dsp/cdsp/

  5. Set the environment variables:

      • ADSP_LIBRARY_PATH

      • LD_LIBRARY_PATH

    to include:
    /usr/lib/rfsa/adsp/cdsp/

  6. Restart the CDSP remoteproc and rerun inference
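The environment-variable and remoteproc-restart steps above can be sketched as a few small helpers. All function names are hypothetical, and the sketch rests on these assumptions:

- ADSP_LIBRARY_PATH is a semicolon-separated list (the FastRPC convention), while LD_LIBRARY_PATH is colon-separated.
- The CDSP is the entry under /sys/class/remoteproc whose name file reads cdsp. Check with cat /sys/class/remoteproc/*/name on your board.
- The restart goes through the standard remoteproc sysfs state file, which needs root.

```sh
# prepend_path VAR SEP DIR (hypothetical helper): put DIR at the front of the
# list stored in VAR, using separator SEP, unless it is already listed; then
# export VAR.
# ASSUMPTION: ADSP_LIBRARY_PATH uses ';' as separator, LD_LIBRARY_PATH uses ':'.
prepend_path() {
    pp_var=$1 pp_sep=$2 pp_dir=$3
    eval "pp_cur=\${$pp_var-}"
    case "$pp_sep$pp_cur$pp_sep" in
        *"$pp_sep$pp_dir$pp_sep"*)
            ;;                                   # already present, keep as-is
        *)
            if [ -n "$pp_cur" ]; then
                pp_cur="$pp_dir$pp_sep$pp_cur"
            else
                pp_cur=$pp_dir
            fi
            ;;
    esac
    eval "$pp_var=\$pp_cur"
    export "$pp_var"
}

# find_rproc NAME [ROOT] (hypothetical helper): print the sysfs directory of
# the remoteproc whose "name" file equals NAME.
# ASSUMPTION: the CDSP instance reports its name as "cdsp".
find_rproc() {
    fr_name=$1 fr_root=${2:-/sys/class/remoteproc}
    for fr_dir in "$fr_root"/remoteproc*; do
        [ -f "$fr_dir/name" ] || continue
        if [ "$(cat "$fr_dir/name")" = "$fr_name" ]; then
            echo "$fr_dir"
            return 0
        fi
    done
    return 1
}

# restart_rproc NAME [ROOT]: stop the remoteproc if it is running, then start
# it again via its sysfs "state" file (requires root on a real system).
restart_rproc() {
    rr_dir=$(find_rproc "$1" "${2-}") || { echo "remoteproc '$1' not found" >&2; return 1; }
    if [ "$(cat "$rr_dir/state")" = running ]; then
        echo stop > "$rr_dir/state" || return 1
    fi
    echo start > "$rr_dir/state"
}
```

A typical sequence would be:

- prepend_path ADSP_LIBRARY_PATH ';' /usr/lib/rfsa/adsp/cdsp and prepend_path LD_LIBRARY_PATH ':' /usr/lib/rfsa/adsp/cdsp in the shell that runs inference.
- restart_rproc cdsp from a root shell.

Exporting only affects the current shell and its children, so both variables must be set in the same session that launches qnn-net-run or Python.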

After this:

  • qnn-net-run succeeded on HTP

  • VTCM allocation succeeded

  • ONNX Runtime QNNExecutionProvider initialized correctly

  • DSP offload was verified

No DTB changes, CMA tuning, or kernel recompilation were required in my case.

One remaining observation:

Some ONNX Runtime/QNNExecutionProvider workloads still appear to partially fall back to CPU depending on model/operator support.

Could you please verify this, or pass it on so it can potentially be improved in future updates?

@Morgan, can you check this issue?

Hi, @Rahul007

It seems FastRPC is broken. Please send the output of the following commands so we can check:

ls /dev/fast*
ls /usr/lib/dsp
sudo apt install fastrpc-test
fastrpc_test

best,
Morgan

Hi @Morgan

The FastRPC/CDSP stack now appears healthy on my setup.

Verified (please refer to the raw terminal output below):

  • /dev/fastrpc-* present
  • fastrpc_test passes completely (3/3 PASSED)
  • DSP execution on domain 3 verified
  • shared memory allocation works
  • calculator + multithreading DSP tests pass
  • qnn-net-run on HTP succeeds

So the DSP/HTP infrastructure itself seems functional.

The remaining issue is specifically transformer LLM inference.
Vision models, which I previously could not get working, now also appear to work fine.

Now, my main question is:

Does Qualcomm’s Linux software stack on QCS6490 currently support practical transformer LLM inference on the HTP/NPU?

For example:

  • Qwen
  • Llama
  • Gemma, etc.

So far I have not been able to achieve meaningful LLM inference acceleration through the HTP/NPU backend on Linux, despite the lower-level DSP/HTP components working correctly.
Everything always falls back to the CPU.

Are there any:

  • Recommended runtimes/workflows for LLM inference (preferably for 3B-7B models)
  • Supported model formats or conversion paths
  • Validated examples for LLM inference on QCS6490 Linux?

Thanks,
Rahul

Here is the raw output from the terminal for the tests:

radxa@radxa-dragon-q6a:/$ ls /dev/fast*
ls /usr/lib/dsp
sudo apt install fastrpc-test
fastrpc_test

/dev/fastrpc-adsp  /dev/fastrpc-cdsp  /dev/fastrpc-cdsp-secure
adsp  cdsp  fastrpc_shell_unsigned_3  libcalculator.so  libhap_example.so  libmultithreading.so  libQnnHtpV68Skel.so

Reading package lists… Done
Building dependency tree… Done
Reading state information… Done
fastrpc-test is already the newest version (1.0.4-1).
The following packages were automatically installed and are no longer required:
snapd squashfs-tools
Use ‘sudo apt autoremove’ to remove them.
0 upgraded, 0 newly installed, 0 to remove and 22 not upgraded.

Demonstrating FARF run-time logging

hap_example function PASSED
Please look at the mini-dm logs or the adb logcat logs for DSP output

Demonstrating HAP_mem.h APIs

hap_example function PASSED
Please look at the mini-dm logs or the adb logcat logs for DSP output

Demonstrating HAP_perf.h APIs

hap_example function PASSED
Please look at the mini-dm logs or the adb logcat logs for DSP output
[PASS] libhap_example.so

Allocate 4000 bytes from ION heap
Creating sequence of numbers from 0 to 999
Compute sum on domain 3

Call calculator_sum on the DSP
Sum = 499500

Call calculator_max on the DSP
Max value = 999
[PASS] libcalculator.so

Test PASSED
Please look at the mini-dm logs or the adb logcat logs for DSP output
[PASS] libmultithreading.so

========================================
Test Summary:
Total tests run:    3
Passed:             3
Failed:             0
Skipped:            0

RESULT: All applicable tests PASSED
radxa@radxa-dragon-q6a:/$
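When reporting results like these, a small filter can check the summary block automatically. This sketch relies only on the "Total tests run:" and "Failed:" lines shown in the output above. The function name fastrpc_summary_ok is hypothetical, and other fastrpc-test versions may format the summary differently.

```sh
# Hypothetical filter: read fastrpc_test output on stdin, print the run/failed
# counts, and succeed only if at least one test ran and none failed.
# Exit status: 0 = all passed, 1 = failures or nothing run, 2 = no summary found.
# Relies only on the "Total tests run:" and "Failed:" summary lines.
fastrpc_summary_ok() {
    awk '
        /^Total tests run:/ { total = $NF + 0; have_total = 1 }
        /^Failed:/          { failed = $NF + 0; have_failed = 1 }
        END {
            if (!have_total || !have_failed) exit 2   # no summary block found
            printf "run=%d failed=%d\n", total, failed
            if (total > 0 && failed == 0) exit 0
            exit 1
        }'
}
```

For example, fastrpc_test | tee fastrpc_test.log | fastrpc_summary_ok keeps the full log in a file. For the output above, it prints run=3 failed=0 and exits with status 0.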