OpenCL not functioning properly on MALI-T860

SUMMARY: cl.h causes John the Ripper to choke and Hashcat to hang when executing clCreateKernel(program[gpu_id], "wpapsk_final_sha256", &ret_code) for wpapsk. How may I fix this problem, especially (or at the very least) for Hashcat?

NOTE: This post is a clarification and elaboration of my earlier post (which incorrectly and insufficiently identified the problem with OpenCL for Mali):
Rockchip-mali-midgard: OpenCL broken; please add earlier compilation (version) to repository and fix the current one

To enable OpenCL (initially just for use with Hashcat), I installed the OpenCL ICD Loader plus clinfo. I also installed three additional packages for John the Ripper (JtR), which I use for purposes of comparison.

apt-get install ocl-icd-libopencl1 clinfo libopenmpi2 openmpi-bin argon2
The following additional packages will be installed:
libgfortran3 libhwloc-plugins libhwloc5 libibverbs1 openmpi-common
apt-get install mesa-opencl-icd

When I ran JtR, I got off to an auspicious start (4266 c/s for the whole GPU) with Ubuntu Bionic's CL headers, from opencl-c-headers.
NOTE: any performance is acceptable, as this exercise is one of education and understanding on a tight budget.

john-1.9.0-jumbo-1/src$ …/run/john --test --format=opencl --force-scalar
Device 1@wifi: Mali-T860
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]… DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw: 38.6 c/s real, 4266 c/s virtual
Later, I was able to achieve better results.
Bionic’s CL, from opencl-c-headers, works in ~3-5min.
john-1.9.0-jumbo-1-compiled_with_1bionic_headers/src$ …/run/john --test --format=opencl --force-scalar
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]… DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw: 53.7 c/s real, 5120 c/s virtual
NOTE: The speed doubled when I (accidentally) ran the test again.
john-1.9.0-jumbo-1-compiled_with_1bionic_headers/src$ …/run/john --test --format=opencl --force-scalar
Device 1@wifi: Mali-T860
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]… DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw: 113 c/s real, 6400 c/s virtual
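
As a quick check on the "speed doubled" note, the ratios between the "real" c/s figures of the three runs above can be worked out with a small helper. This is only a sketch; ratio is a hypothetical name, and the inputs are the figures copied from the logs above. It gives roughly 1.39 for the second run against the first and 2.10 for the third against the second.

```sh
# ratio A B: print A/B to two decimals (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 53.7 38.6   # second run vs. first run ("real" c/s)
ratio 113 53.7    # third run vs. second run ("real" c/s)
```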

But I noticed that JtR choked on /at least/ one format: the one I need to use.

…/run/john --test --format=wpapsk-opencl
Device 1@wifi: Mali-T860
Benchmarking: wpapsk-opencl, WPA/WPA2/PMF/PMKID PSK [PBKDF2-SHA1 OpenCL]… 0: OpenCL CL_INVALID_PROGRAM_EXECUTABLE (-45) error in opencl_wpapsk_fmt_plug.c:270 - Error creating kernel
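
For reference, the OpenCL specification says clCreateKernel returns CL_INVALID_PROGRAM_EXECUTABLE when there is no successfully built executable for the program, which may point at the kernel build for this device rather than at the header. A minimal sketch for decoding a few related codes is given here; the numeric values are those defined in the Khronos CL/cl.h header, and cl_error_name is a hypothetical helper, not part of JtR or OpenCL.

```sh
# Map a few OpenCL error codes to their constant names.
# Values as defined in the Khronos CL/cl.h header.
# cl_error_name is a hypothetical helper name.
cl_error_name() {
    case "$1" in
        0)   echo CL_SUCCESS ;;
        -11) echo CL_BUILD_PROGRAM_FAILURE ;;
        -44) echo CL_INVALID_PROGRAM ;;
        -45) echo CL_INVALID_PROGRAM_EXECUTABLE ;;
        -46) echo CL_INVALID_KERNEL_NAME ;;
        -47) echo CL_INVALID_KERNEL_DEFINITION ;;
        *)   echo "unknown ($1)" ;;
    esac
}

cl_error_name -45
```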

The problematic line (270) is the sixth one shown below (the one that begins "wpapsk_final_sha256 = clCreateKernel...").

~/JtR/john-1.9.0-jumbo-1/src$ grep clCreateKernel opencl_wpapsk_fmt_plug.c
crypt_kernel = wpapsk_init = clCreateKernel(program[gpu_id], "wpapsk_init", &ret_code);
wpapsk_loop = clCreateKernel(program[gpu_id], "wpapsk_loop", &ret_code);
wpapsk_pass2 = clCreateKernel(program[gpu_id], "wpapsk_pass2", &ret_code);
wpapsk_final_md5 = clCreateKernel(program[gpu_id], "wpapsk_final_md5", &ret_code);
wpapsk_final_sha1 = clCreateKernel(program[gpu_id], "wpapsk_final_sha1", &ret_code);
wpapsk_final_sha256 = clCreateKernel(program[gpu_id], "wpapsk_final_sha256", &ret_code); /* JtR chokes here. */
wpapsk_final_pmkid = clCreateKernel(program[gpu_id], "wpapsk_final_pmkid", &ret_code);
The function clCreateKernel is declared in cl.h, as shown below:
grep clCreateKernel /usr/include/CL/cl.h
clCreateKernel(cl_program /* program */,
clCreateKernelsInProgram(cl_program /* program */,

I replaced the CL directory with the one from ComputeLibrary-19.05/include/CL:

mv /usr/include/CL /usr/include/CL.orig
cp -a ComputeLibrary-19.05/include/CL /usr/include/
but this change gave me the same results.
The first test (--format=opencl) worked, albeit significantly more slowly than with Bionic's CL headers.
…/run/john --test --format=opencl --force-scalar
Device 1@wifi: Mali-T860
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]… DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw: 33.7 c/s real, 2133 c/s virtual

wpapsk-opencl gave the error as before.
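
One possible reason the header swap made no difference: cl.h only declares the API at compile time, while the code that actually runs behind clCreateKernel lives in the vendor library that the ICD loader picks up at run time. The sketch below lists which libraries are registered. It assumes the loader reads *.icd files from /etc/OpenCL/vendors (the usual location for ocl-icd on Debian/Ubuntu), and list_icd_libs is a hypothetical helper name.

```sh
# List the runtime libraries registered with the OpenCL ICD loader.
# Assumption: the loader reads *.icd files from /etc/OpenCL/vendors.
# list_icd_libs is a hypothetical helper name.
list_icd_libs() {
    dir="${1:-/etc/OpenCL/vendors}"
    for f in "$dir"/*.icd; do
        [ -f "$f" ] || continue          # no .icd files: print nothing
        # Each .icd file names (or gives the path of) one vendor library.
        printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
}

list_icd_libs
```

If more than one entry appears (for example, one from the Mali package and one from mesa-opencl-icd), it may be worth checking which platform the application actually selects.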
I posted my results to the JtR forum, but was unable to get a resolution.

Hashcat, like JtR (results omitted), recognized the GPU.

hashcat -I
hashcat (v5.1.0) starting…

OpenCL Info:

Platform ID #1
Vendor : ARM
Name : ARM Platform
Version : OpenCL 1.2 v1.r14p0-01rel0-git(966ed26).f44c85cb3d2ceb87e8be88e7592755c3

Device ID #1
Type : GPU
Vendor ID : 2147483648
Vendor : ARM
Name : Mali-T860
Version : OpenCL 1.2 v1.r14p0-01rel0-git(966ed26).f44c85cb3d2ceb87e8be88e7592755c3
Processor(s) : 4
Clock : 200
Memory : 229/919 MB allocatable
OpenCL Version : OpenCL C 1.2 v1.r14p0-01rel0-git(966ed26).f44c85cb3d2ceb87e8be88e7592755c3
Driver Version : 1.2

Upon issuing the benchmarking command, the screen (but not the system) hung:

hashcat -b -m 2500

OpenCL Platform #1: ARM

* Device #1: Mali-T860, 229/919 MB allocatable, 4MCU

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4096)

The first time, it ran for between half an hour and an hour without results before I interrupted the process; the second time, it ran for 4 hours and ended in a spontaneous reboot.

Please let me know how I can fix cl.h (or anything else) so that clCreateKernel(program[gpu_id], "wpapsk_final_sha256", &ret_code) does not choke when processing wpapsk, especially (or at the very least) for Hashcat.
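
While the cause is being tracked down, a stalled benchmark can at least be kept from occupying the terminal for hours by putting a time limit on it. The sketch below uses coreutils timeout, which exits with status 124 when the limit is reached; run_bounded is a hypothetical helper name. This only bounds the run; it does not fix the hang and would not necessarily prevent a reboot.

```sh
# Run a command, but give up after a time limit.
# Relies on coreutils `timeout` (exit status 124 on timeout).
# run_bounded is a hypothetical helper name.
run_bounded() {
    limit="$1"; shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "gave up after $limit: $*" >&2
    fi
    return "$status"
}
```

For example, run_bounded 10m hashcat -b -m 2500 would stop the benchmark above after ten minutes.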

We have updated the wiki page and the mali package. OpenCL should be working now.

Thank you for your attention to the matter, but either you forgot to add the packages to the repository, or I am doing something incorrectly.

apt-get install rockchip-mali-midgard14

E: Unable to locate package rockchip-mali-midgard14

apt-cache search rockchip-mali-midgard14
returns nothing.
apt-cache search rockchip-mali-midgard-dev
likewise, even after
apt-get update && apt-get upgrade
which returns,
Hit:1 stretch InRelease
Hit:2 bionic InRelease
Hit:3 bionic InRelease
Hit:4 bionic-updates InRelease
Hit:5 bionic-backports InRelease
Hit:6 bionic-security InRelease
Reading package lists… Done
Reading package lists… Done
Building dependency tree
Reading state information… Done
Calculating upgrade… Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

export DISTRO=bionic-testing
had no effect.

Please, be kind enough to add the new package to the repository.

Hit:3 bionic InRelease

The rockchip-mali-midgard14 package was newly added to the testing repo. You also need to add the testing repo to your apt sources list.

deb bionic main
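
After adding the repo line and running apt-get update, one way to confirm that the packages have become visible is to look at the "Candidate:" line of apt-cache policy, which reads "(none)" when no configured repository offers the package. A minimal sketch follows; has_candidate is a hypothetical helper name, and the package names are the ones mentioned in this thread.

```sh
# has_candidate: read `apt-cache policy` output on stdin and succeed
# only if it reports an installable candidate version.
# has_candidate is a hypothetical helper name.
has_candidate() {
    grep -q 'Candidate: [^(]'
}

for pkg in rockchip-mali-midgard14 rockchip-mali-midgard-dev; do
    # LC_ALL=C keeps the "Candidate:" label in English.
    if LC_ALL=C apt-cache policy "$pkg" 2>/dev/null | has_candidate; then
        echo "$pkg: available"
    else
        echo "$pkg: not visible in the configured repositories"
    fi
done
```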

I could install the libmali package by running the following commands:
$ sudo apt-get install rockchip-mali-midgard14
$ sudo apt-get install rockchip-mali-midgard-dev
However, when I check it with clpeak by running "cmake . -DCMAKE_CXX_COMPILER=g++", the following error occurs: "-- Looking for CL_VERSION_2_0 - not found".

rock@rockpi:~/clpeak$ cmake . -DCMAKE_CXX_COMPILER=g++
-- Setting build type to Release
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - not found
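
These messages look like checks for version macros rather than fatal errors. Assuming the checks test for macros named CL_VERSION_x_y in CL/cl.h (an assumption about clpeak's build scripts, not something confirmed here), the sketch below lists which of them the installed header defines. cl_versions is a hypothetical helper name, and the header path is the one used earlier in this thread.

```sh
# cl_versions HEADER: list the CL_VERSION_x_y macros a header defines.
# cl_versions is a hypothetical helper name.
cl_versions() {
    grep -Eo '^#define[[:space:]]+CL_VERSION_[0-9]+_[0-9]+' "$1" |
        awk '{ print $2 }'
}

header=/usr/include/CL/cl.h
if [ -r "$header" ]; then
    cl_versions "$header"
else
    echo "$header not found; cmake may be searching a different include path"
fi
```

If the header does define CL_VERSION_1_2 and cmake still reports it as not found, the include path used by the build may be worth checking.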

rock@rockpi4c:~/clpeak$ ./clpeak 

Platform: ARM Platform
Device: Mali-T860
Driver version : 1.2 (Linux ARM64)
Compute units : 4
Clock frequency : 200 MHz

Global memory bandwidth (GBPS)
  float   : 5.14
  float2  : 6.22
  float4  : 8.27
  float8  : 6.54
  float16 : 6.00

Single-precision compute (GFLOPS)
  float   : 25.39
  float2  : 50.32
  float4  : 50.30
  float8  : 49.80
  float16 : 31.99

Half-precision compute (GFLOPS)
  half   : 50.87
  half2  : 100.77
  half4  : 99.26
  half8  : 98.59
  half16 : 95.15

Double-precision compute (GFLOPS)
  double   : 12.54
  double2  : 24.59
  double4  : 24.45
  double8  : 16.12
  double16 : 7.00

Integer compute (GIOPS)
  int   : 25.38
  int2  : 50.32
  int4  : 50.03
  int8  : 49.25
  int16 : 31.26

Integer compute Fast 24bit (GIOPS)
  int   : 25.32
  int2  : 50.34
  int4  : 50.04
  int8  : 49.28
  int16 : 31.27

Transfer bandwidth (GBPS)
  enqueueWriteBuffer              : 1.30
  enqueueReadBuffer               : 1.30
  enqueueWriteBuffer non-blocking : 1.30
  enqueueReadBuffer non-blocking  : 1.30
  enqueueMapBuffer(for read)      : 4.95
    memcpy from mapped ptr        : 2.37
  enqueueUnmap(after write)       : 5.31
    memcpy to mapped ptr          : 1.97

Kernel launch latency : 263.81 us