OpenCL SHA-512

John can crack crypt SHA-512 hashes on OpenCL-enabled devices. To use it, type:
john --format=sha512crypt-opencl [other options]

John uses all available GPU power while it runs, so the computer can become less responsive, especially if the same GPU drives your monitor.
A hint: if your computer seems to hang and you have only one GPU, your X server is busy and you cannot do anything. Just wait 5 or 10 minutes. If nothing happens, reboot your computer. I have seen cases on a Radeon HD 6770 where everything returned to normal after a few minutes of waiting.

Kernel compilation on a Radeon HD 6770 takes almost a minute; see the “Elapsed time” line below. Slower hardware can take even longer.

OpenCL platform 0: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Juniper
Building the kernel, this could take a while
Compilation log: LOOP UNROLL: pragma unroll (line 117)
    Unrolled as requested!

Elapsed time: 402 seconds
Local work size (LWS) 32, global work size (GWS) 2560
Benchmarking: sha512crypt (rounds=5000) [OpenCL]... DONE
Raw:	1387 c/s real, 512000 c/s virtual

Despite that, you can do real cracking on your GPU even if it is also your graphics controller, although you might experience some slowness.

The maximum password length is 24 bytes.
The maximum salt length is 16 bytes.

To change these limits, edit the statements shown below in opencl_cryptsha512.h and recompile the project. This is not recommended: memory misalignment could lead to wrong computation results.

#define SALT_SIZE               16
#define PLAINTEXT_LENGTH        24

  1. This software was tested on the following (see hardware and software details below):
    • Device C-01, on S-01 and S-02 software.
    • Device C-03, on S-01 and S-02 software.
    • Devices C-02, C-04 and C-05, on Ubuntu x86_64.
    • It is not clear how it will behave on different hardware or software.
  2. The source code seems to be better suited to Southern Islands products. See section 4.16 in Optimization Guidelines for Southern Islands GPUs [1].

Set Up

John can try to find the best configuration for the hardware it is running on. In this set-up mode, John runs some benchmarks and takes measurements; when finished, it analyses the results and selects the best configuration. Although the process is not deterministic, the results give at least good hints about the best configuration.

Three environment variables can be set to configure John's behavior.

  • LWS: local work size. Defines the work-group/warp size, that is, how many work-items are grouped together.
  • GWS: global work size. Defines how many candidate keys are sent to the hardware at once. To hide latency, pick at least a few thousand.
  • STEP: optional, default is 512. Defines the step size used while searching for the best GWS. It is especially useful on low-end or very fast cards (where the default value might be sub-optimal), when set-up takes too long to finish, or while experimenting. This variable is only used while probing GWS.
  • The user can also set STEP=0. In this case, starting at 512, the GWS value is multiplied by 2 at each step. The search is faster but, on today's hardware, the best value could be missed.
  • It is a good idea to use only powers of two for LWS, GWS and STEP.
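
The difference between the two search modes can be sketched as follows. This is only an illustration, not John's actual probing code: the function name gws_candidates and its upper-bound argument are hypothetical, and it assumes that a search with a non-zero STEP starts at STEP and advances by STEP, which the text above does not confirm.

```shell
# gws_candidates STEP MAX
# Print the GWS values a probe might try, up to MAX (hypothetical helper).
# STEP=0 : start at 512 and double each time (as described above).
# STEP>0 : ASSUMPTION - start at STEP and add STEP each time.
gws_candidates() {
    step=$1
    max=$2
    out=""
    if [ "$step" -eq 0 ]; then
        gws=512
        while [ "$gws" -le "$max" ]; do
            out="$out $gws"
            gws=$((gws * 2))
        done
    else
        gws=$step
        while [ "$gws" -le "$max" ]; do
            out="$out $gws"
            gws=$((gws + step))
        done
    fi
    # Drop the leading space before printing.
    printf '%s\n' "${out# }"
}

gws_candidates 0 4096      # prints: 512 1024 2048 4096
gws_candidates 1024 4096   # prints: 1024 2048 3072 4096
```

In this example the doubling search never visits 3072, which is how the best value can be missed when STEP=0.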

To enter auto-configuration mode, set LWS and/or GWS to zero (LWS=0 and/or GWS=0). See the “One should” section below for more information.


LWS=0 GWS=5120 ./john -fo:sha512crypt-opencl -t
LWS=0 GWS=0 STEP=256 ./john -fo:sha512crypt-opencl -t
LWS=32 GWS=0 STEP=1024 ./john -fo:sha512crypt-opencl -t
LWS=32 GWS=0 STEP=0 ./john -fo:sha512crypt-opencl -t
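
When both values are set by hand, it may help to check how they relate before running John. In OpenCL 1.x the global work size has to be a multiple of the local work size, and the quotient is the number of work-groups. The sketch below only does that arithmetic; the function name workgroups is hypothetical, and whether John itself adjusts a mismatched GWS is not covered by this page.

```shell
# workgroups LWS GWS
# Print the number of work-groups, or fail if GWS is not a multiple of LWS.
# Hypothetical helper for illustration; it does not call John.
workgroups() {
    lws=$1
    gws=$2
    if [ $((gws % lws)) -ne 0 ]; then
        echo "GWS=$gws is not a multiple of LWS=$lws" >&2
        return 1
    fi
    echo $((gws / lws))
}

workgroups 32 2560   # prints: 80
```

The values 32 and 2560 are the ones shown in the Radeon HD 6770 log earlier on this page.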

One should:

  1. Pay special attention to how the work is divided and organized. Spend some time selecting the best work-group size and number of threads (LWS and GWS). A good LWS and GWS pair can improve performance by an order of magnitude.
  2. Be aware that LWS auto-configuration may give less than optimal results. To make sure this is not your case, we recommend trying at least LWS=32 and LWS=64.
  3. Use LWS=32 and GWS=512 as a safe starting point. Test it using the command below:
LWS=32 GWS=512 ./john -fo:sha512crypt-opencl -t

My default first try on new hardware is:

LWS=32 GWS=0 STEP=512 ./john -fo:sha512crypt-opencl -t
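
Trying several LWS values by hand, as recommended above, can be scripted. The sketch below is one possible way to do it, not part of John: the helper names extract_real_cs and bench_lws are hypothetical, and it assumes ./john is in the current directory and prints a "Raw: N c/s real" line like the logs on this page.

```shell
# extract_real_cs: read John benchmark output on stdin and print the
# "real" c/s figure from the "Raw:" line (hypothetical helper).
extract_real_cs() {
    sed -n 's|^Raw:[[:space:]]*\([0-9][0-9]*\) c/s real.*|\1|p'
}

# bench_lws LWS: run the benchmark with the given LWS and let John
# probe GWS (GWS=0). ASSUMPTION: ./john exists in the current directory.
bench_lws() {
    LWS=$1 GWS=0 ./john -fo:sha512crypt-opencl -t 2>&1 | extract_real_cs
}

# Example driver, left commented out because it needs a working ./john:
# for lws in 32 64; do
#     echo "LWS=$lws: $(bench_lws "$lws") c/s"
# done
```

Each run includes the GWS probing step, so the loop can take a long time on slow hardware.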

Configuration

Tested Hardware

ID: processor, memory

  • C-01: AMD Phenom™ II X6 1075T Processor × 6, 4GB DDR3 1600MHz
  • C-02: AMD FX™-8120 Eight-Core Processor
  • C-03: AMD Radeon HD 6770 (Juniper)
  • C-04: AMD Radeon HD 7970 (Tahiti)
  • C-05: GeForce GTX 570

Tested Software

  • S-01:
    • SDK 2.6 OpenCL 1.1 AMD-APP (898.1)
    • Catalyst 12.2
    • Driver 8.95-120214a-134397C-ATI
    • Ubuntu 11.10 x86_64
  • S-02:
    • SDK 2.6 OpenCL 1.2 AMD-APP (923.1)
    • Catalyst 12.2
    • Driver 8.95-120214a-134397C-ATI
    • Ubuntu 12.04 x86_64

Benchmarks

ID: configuration. Operating system.

  • C-01: Local work size (LWS) 1, global work size (GWS) 2560. S-02.
    • Raw: 2056 c/s real, 342 c/s virtual
  • C-02: Local work size (LWS) 1, global work size (GWS) 1024.
    • Raw: 1920 c/s real, 239 c/s virtual
  • C-03. S-02.
    • Local worksize (LWS) 64, global worksize (GWS) 10240
    • Benchmarking: sha512crypt (rounds=5000) [OpenCL]... DONE
    • Raw: 6301 c/s real, 256000 c/s virtual
  • C-04.
    • Local work size (LWS) 64, global work size (GWS) 81920
    • Benchmarking: sha512crypt (rounds=5000) [OpenCL]... DONE
    • Raw: 13451 c/s real, 1638K c/s virtual
  • C-05.
    • Local worksize (LWS) 512, global worksize (GWS) 15360
    • Benchmarking: sha512crypt (rounds=5000) [OpenCL]... DONE
    • Raw: 13128 c/s real, 13128 c/s virtual

Reference (March 2012)

⇒ John Jumbo on CPU C-01 (1 core)
Benchmarking: crypt SHA-512 (rounds=5000) [OpenSSL 64/64]... DONE
Raw: 440 c/s real, 440 c/s virtual
⇒ John Jumbo with OMP on CPU C-01 (6 cores)
Benchmarking: crypt SHA-512 (rounds=5000) [OpenSSL 64/64]... (6xOMP) DONE
Raw: 2254 c/s real, 378 c/s virtual
⇒ John Jumbo crypt(3) with OMP on CPU C-01 (6 cores)
Benchmarking: generic crypt(3) SHA-512 rounds=5000 [?/64]... (6xOMP) DONE
Many salts: 1617 c/s real, 273 c/s virtual
Only one salt: 1617 c/s real, 274 c/s virtual
⇒ John Jumbo on CPU C-02 (1 core)
Benchmarking: crypt SHA-512 (rounds=5000) [OpenSSL 64/64]... DONE
Raw: 367 c/s real, 365 c/s virtual
⇒ John Jumbo with OMP on CPU C-02 (8 cores)
Benchmarking: crypt SHA-512 (rounds=5000) [OpenSSL 64/64]... (8xOMP) DONE
Raw: 2055 c/s real, 257 c/s virtual
⇒ John Jumbo crypt(3) with OMP on CPU C-02 (8 cores)
Benchmarking: generic crypt(3) SHA-512 rounds=5000 [?/64]... (8xOMP) DONE
Many salts: 1520 c/s real, 190 c/s virtual
Only one salt: 1520 c/s real, 189 c/s virtual

Usage Example

Below is an example of real cracking on mid-range NVIDIA hardware.

$ ../run/john ../../pass.dat --incremental=All15 -fo:sha512crypt-opencl
Device 0: GeForce GTX 570
Local worksize (LWS) 512, global worksize (GWS) 15360
Loaded 4 password hashes with 4 different salts (sha512crypt [OpenCL])
odh (us_teste_10)
s98 (us_teste_11)
o9 (us_teste_08)
s% (us_teste_09)
guesses: 4 time: 0:00:01:21 DONE (Thu Feb 7 03:41:31 2013) c/s: 13251 trying: dl48d - iec7
Use the "--show" option to display all of the cracked passwords reliably

External Links

[1] AMD Accelerated Parallel Processing Programming Guide, visited on 04/19/2012

john/OpenCL-SHA-512.txt · Last modified: 2013/02/07 00:42 by claudio.andre
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported