OpenCL not functioning properly on MALI-T860

SUMMARY: cl.h causes John the Ripper to choke and Hashcat to hang when executing clCreateKernel(program[gpu_id], "wpapsk_final_sha256", &ret_code) for wpapsk. How can I fix this problem, at the very least for Hashcat?

NOTE: This post is a clarification and elaboration of my earlier post (which incorrectly and insufficiently identified the problem with OpenCL for Mali):
Rockchip-mali-midgard: OpenCL broken; please add earlier compilation (version) to repository and fix the current one

To enable OpenCL (initially just for use with Hashcat), I installed the OpenCL ICD Loader plus clinfo. I also installed three additional packages for John the Ripper, which I use for purposes of comparison.

apt-get install ocl-icd-libopencl1 clinfo libopenmpi2 openmpi-bin argon2
The following additional packages will be installed:
libgfortran3 libhwloc-plugins libhwloc5 libibverbs1 openmpi-common
apt-get install mesa-opencl-icd

When I ran JtR, I had an auspicious start (4266 c/s for the whole GPU) with Ubuntu Bionic's CL headers, from opencl-c-headers.
NOTE: any performance is acceptable, as this exercise is one of education and understanding on a tight budget.

john-1.9.0-jumbo-1/src$ …/run/john --test --format=opencl --force-scalar
Device 1@wifi: Mali-T860
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]… DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw: 38.6 c/s real, 4266 c/s virtual
Later, I was able to achieve better results.
With Bionic's CL headers, from opencl-c-headers, the test works in ~3-5 min.
john-1.9.0-jumbo-1-compiled_with_1bionic_headers/src$ …/run/john --test --format=opencl --force-scalar
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]… DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw: 53.7 c/s real, 5120 c/s virtual
NOTE: The speed doubled when I (accidentally) ran the test again:
john-1.9.0-jumbo-1-compiled_with_1bionic_headers/src$ …/run/john --test --format=opencl --force-scalar
Device 1@wifi: Mali-T860
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]… DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw: 113 c/s real, 6400 c/s virtual

But I noticed that JtR choked on /at least/ one format: the one I need to use.

…/run/john --test --format=wpapsk-opencl
Device 1@wifi: Mali-T860
Benchmarking: wpapsk-opencl, WPA/WPA2/PMF/PMKID PSK [PBKDF2-SHA1 OpenCL]… 0: OpenCL CL_INVALID_PROGRAM_EXECUTABLE (-45) error in opencl_wpapsk_fmt_plug.c:270 - Error creating kernel
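
A note on the error code itself: according to the OpenCL specification, clCreateKernel returns CL_INVALID_PROGRAM_EXECUTABLE when there is no successfully built executable for the program, so the kernel build for this device may be worth examining in addition to the header. As a small aid, here is a sketch of a shell helper that looks up the symbolic name of a status code in a cl.h header. The function name cl_error_name is made up for this example, and the default path is simply the header location used in this post.

```shell
#!/bin/sh
# Look up the symbolic name of an OpenCL status code in a cl.h header.
# Usage: cl_error_name CODE [HEADER]
# HEADER defaults to /usr/include/CL/cl.h (the location used in this post).
# cl_error_name is a hypothetical helper, not part of JtR, Hashcat or OpenCL.
cl_error_name() {
    code=$1
    header=${2:-/usr/include/CL/cl.h}
    # Match lines such as "#define CL_INVALID_PROGRAM_EXECUTABLE   -45"
    # and print the macro name; exit non-zero if nothing matched.
    awk -v code="$code" '
        $1 == "#define" && $2 ~ /^CL_/ && $3 == code { print $2; found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$header"
}
```

Calling cl_error_name -45 looks the code up in /usr/include/CL/cl.h. The lookup is only meaningful for negative codes, because many unrelated constants in the header share small positive values.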

The problematic line (270) is the sixth one shown below, the one that begins "wpapsk_final_sha256 = clCreateKernel".

~/JtR/john-1.9.0-jumbo-1/src$ grep clCreateKernel opencl_wpapsk_fmt_plug.c
crypt_kernel = wpapsk_init = clCreateKernel(program[gpu_id], "wpapsk_init", &ret_code);
wpapsk_loop = clCreateKernel(program[gpu_id], "wpapsk_loop", &ret_code);
wpapsk_pass2 = clCreateKernel(program[gpu_id], "wpapsk_pass2", &ret_code);
wpapsk_final_md5 = clCreateKernel(program[gpu_id], "wpapsk_final_md5", &ret_code);
wpapsk_final_sha1 = clCreateKernel(program[gpu_id], "wpapsk_final_sha1", &ret_code);
wpapsk_final_sha256 = clCreateKernel(program[gpu_id], "wpapsk_final_sha256", &ret_code); /* JtR chokes here. */
wpapsk_final_pmkid = clCreateKernel(program[gpu_id], "wpapsk_final_pmkid", &ret_code);
The function clCreateKernel is declared in cl.h, as shown below:
grep clCreateKernel /usr/include/CL/cl.h
clCreateKernel(cl_program /* program */,
clCreateKernelsInProgram(cl_program /* program */,

I replaced the directory /usr/include/CL with the one from ComputeLibrary-19.05/include/CL:

mv /usr/include/CL /usr/include/CL.orig
cp -a ComputeLibrary-19.05/include/CL /usr/include/
This change gave me the same results.
The first test (sha1crypt-opencl) worked, albeit significantly more slowly than with Bionic's CL headers.
…/run/john --test --format=opencl --force-scalar
Device 1@wifi: Mali-T860
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]… DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw: 33.7 c/s real, 2133 c/s virtual

wpapsk-opencl gave the error as before.
I posted my results to the JtR forum, but was unable to get a resolution.

Hashcat, like JtR (results omitted), recognized the GPU.

hashcat -I
hashcat (v5.1.0) starting…

OpenCL Info:

Platform ID #1
Vendor : ARM
Name : ARM Platform
Version : OpenCL 1.2 v1.r14p0-01rel0-git(966ed26).f44c85cb3d2ceb87e8be88e7592755c3

Device ID #1
Type : GPU
Vendor ID : 2147483648
Vendor : ARM
Name : Mali-T860
Version : OpenCL 1.2 v1.r14p0-01rel0-git(966ed26).f44c85cb3d2ceb87e8be88e7592755c3
Processor(s) : 4
Clock : 200
Memory : 229/919 MB allocatable
OpenCL Version : OpenCL C 1.2 v1.r14p0-01rel0-git(966ed26).f44c85cb3d2ceb87e8be88e7592755c3
Driver Version : 1.2

Upon issuing the benchmarking command, the screen (but not the system) hung:

hashcat -b -m 2500

OpenCL Platform #1: ARM

* Device #1: Mali-T860, 229/919 MB allocatable, 4MCU

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4096)

The first time, it hung for half an hour to an hour without results, until I interrupted the process; the second time, it hung for 4 hours and ended in a spontaneous reboot.

Please let me know how I can fix cl.h (or anything else) so that clCreateKernel(program[gpu_id], "wpapsk_final_sha256", &ret_code) does not choke when processing wpapsk, at the very least for Hashcat.

We have updated the wiki page and the mali package. OpenCL should be working now.

https://wiki.radxa.com/Rockpi4/dev/install-opencl

Thank you for your attention to the matter, but either you forgot to add the packages to the repository, or I am doing something incorrectly.

apt-get install rockchip-mali-midgard14

E: Unable to locate package rockchip-mali-midgard14

apt-cache search rockchip-mali-midgard14
returns nothing.
apt-cache search rockchip-mali-midgard-dev
likewise, even after
apt-get update && apt-get upgrade
which returns,
Hit:1 http://repo.linaro.org/ubuntu/linaro-overlay stretch InRelease
Hit:2 http://ports.ubuntu.com/ubuntu-ports bionic InRelease
Hit:3 http://apt.radxa.com/bionic bionic InRelease
Hit:4 http://ports.ubuntu.com/ubuntu-ports bionic-updates InRelease
Hit:5 http://ports.ubuntu.com/ubuntu-ports bionic-backports InRelease
Hit:6 http://ports.ubuntu.com/ubuntu-ports bionic-security InRelease
Reading package lists… Done
Reading package lists… Done
Building dependency tree
Reading state information… Done
Calculating upgrade… Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Addendum:
export DISTRO=bionic-testing
had no effect.

Please, be kind enough to add the new package to the repository.

Hit:3 http://apt.radxa.com/bionic bionic InRelease

The rockchip-mali-midgard14 package was newly added to the testing repo. You also need to add the testing repo to your sources list:

deb http://apt.radxa.com/bionic-testing/ bionic main
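
For anyone following along, one way to add that line is sketched below. The helper name add_apt_source and the file name radxa-testing.list are assumptions made for this example; only the deb line itself comes from this thread. The helper appends the line only if it is not already present, so running it twice does no harm.

```shell
#!/bin/sh
# Append an apt source line to a list file unless it is already there.
# Usage: add_apt_source LIST_FILE DEB_LINE
# add_apt_source is a hypothetical helper written for this example.
add_apt_source() {
    list_file=$1
    deb_line=$2
    # -x: match the whole line, -F: fixed string, -q: no output.
    if [ -f "$list_file" ] && grep -qxF -- "$deb_line" "$list_file"; then
        return 0
    fi
    printf '%s\n' "$deb_line" >> "$list_file"
}

# Example, to be run as root (the file name is an assumption):
# add_apt_source /etc/apt/sources.list.d/radxa-testing.list \
#     "deb http://apt.radxa.com/bionic-testing/ bionic main"
# apt-get update
# apt-get install rockchip-mali-midgard14 rockchip-mali-midgard-dev
```

The commented-out lines at the end show the intended use; they are left as comments so that sourcing the snippet does not modify the system.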

I was able to install the libmali package by running the following commands:
$ sudo apt-get install rockchip-mali-midgard14
$ sudo apt-get install rockchip-mali-midgard-dev
However, when I checked it with clpeak by running "cmake . -DCMAKE_CXX_COMPILER=g++", the following message appeared: "-- Looking for CL_VERSION_2_0 - not found".

rock@rockpi:~/clpeak$ cmake . -DCMAKE_CXX_COMPILER=g++
-- Setting build type to Release
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - not found
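
Since even CL_VERSION_1_2 is reported as not found, while the device itself reports OpenCL 1.2, one thing that may be worth checking is which CL_VERSION macros the installed header defines at all. The sketch below lists them; the function name cl_versions_in_header is made up for this example, and I am assuming (not certain) that the cmake check looks for these macros in CL/cl.h.

```shell
#!/bin/sh
# List the CL_VERSION_x_y macros that a header file defines.
# Usage: cl_versions_in_header HEADER
# cl_versions_in_header is a hypothetical helper written for this example.
# Note: this is a plain text search, so macros guarded by #if blocks are
# listed even if the preprocessor would skip them.
cl_versions_in_header() {
    grep -oE '^[[:space:]]*#[[:space:]]*define[[:space:]]+CL_VERSION_[0-9]+_[0-9]+' "$1" |
        grep -oE 'CL_VERSION_[0-9]+_[0-9]+'
}

# Example (the path is the header location used earlier in this thread):
# cl_versions_in_header /usr/include/CL/cl.h
```

If the header lists CL_VERSION_1_2 but cmake still does not find it, the header may simply not be on the include path that cmake uses.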

rock@rockpi4c:~/clpeak$ ./clpeak 

Platform: ARM Platform
Device: Mali-T860
Driver version : 1.2 (Linux ARM64)
Compute units : 4
Clock frequency : 200 MHz

Global memory bandwidth (GBPS)
  float   : 5.14
  float2  : 6.22
  float4  : 8.27
  float8  : 6.54
  float16 : 6.00

Single-precision compute (GFLOPS)
  float   : 25.39
  float2  : 50.32
  float4  : 50.30
  float8  : 49.80
  float16 : 31.99

Half-precision compute (GFLOPS)
  half   : 50.87
  half2  : 100.77
  half4  : 99.26
  half8  : 98.59
  half16 : 95.15

Double-precision compute (GFLOPS)
  double   : 12.54
  double2  : 24.59
  double4  : 24.45
  double8  : 16.12
  double16 : 7.00

Integer compute (GIOPS)
  int   : 25.38
  int2  : 50.32
  int4  : 50.03
  int8  : 49.25
  int16 : 31.26

Integer compute Fast 24bit (GIOPS)
  int   : 25.32
  int2  : 50.34
  int4  : 50.04
  int8  : 49.28
  int16 : 31.27

Transfer bandwidth (GBPS)
  enqueueWriteBuffer              : 1.30
  enqueueReadBuffer               : 1.30
  enqueueWriteBuffer non-blocking : 1.30
  enqueueReadBuffer non-blocking  : 1.30
  enqueueMapBuffer(for read)      : 4.95
    memcpy from mapped ptr        : 2.37
  enqueueUnmap(after write)       : 5.31
    memcpy to mapped ptr          : 1.97

Kernel launch latency : 263.81 us