Parallel and distributed processing with John the Ripper

One of the most common questions in the computing domain is “Can I use multiple processes or cores to increase speed?” In particular, problems that are time-sensitive or would take a relatively unreasonable amount of time to solve (NP-complete) receive a lot of attention; password cracking is no different.

Understanding the workload

John the Ripper has one primary workload: generating hashes of candidate passwords. Its two secondary loads (candidate password generation and comparison of computed hashes against those being cracked) are not always insignificant, but are not nearly as computationally intensive or complex as the hash calculations themselves. Most efforts to speed processing by parallelization focus on evenly dispersing the primary computational load over multiple cores, but there are simpler approaches for smaller problem sets (discussed below).

The three primary cracking modes of operation (single, wordlist, and incremental) differ mostly in the candidate passwords generated: the single and wordlist modes simply attempt presumably-higher-probability password candidates before they would normally be attempted in incremental mode. Hence, a normal JtR session executes the single and wordlist modes before falling through to incremental in an attempt to save time.

For example: as of this writing, an unmodified copy of john-1.7.2 compiled as linux-x86-mmx and running against a single FreeBSD MD5 hash on a 1\ GHz Pentium\ III typically completes –single mode in under one second and ”–wordlist=password.lst –rules” mode in under one minute. This performance is dependent on a number of factors including:

The number of relevant words (such as the user's full name) available in the password file
The number of different salts loaded for cracking
Whether any additions or modifications have been made to the default wordlist or mangling rules

Even so, it serves to illustrate that unless the crack attempt is to be interrupted shortly, the majority of time spent cracking will be in incremental mode.

Built-in parallelization with OpenMP

John the Ripper 1.7.6+ includes built-in parallelization for multi-CPU and/or multi-core systems by means of OpenMP directives. In 1.7.6, this was limited to bcrypt hashes (with JtR's own optimized code) and SHA-crypt and SunMD5 hashes on recent Linux and Solaris systems (with system-provided thread-safe crypto code). All of this requires GCC 4.2 or newer, or another OpenMP-capable C compiler (also tested with Sun Studio 12). To enable the OpenMP parallelization, the proper OMPFLAGS line in the Makefile needs to be uncommented.

John the Ripper 1.7.9+ adds built-in OpenMP parallelization for traditional DES-based crypt(3), BSDI-style DES-based crypt(3), FreeBSD-style MD5-based crypt(3), and for LM hashes. (For 1.7.6 through 1.7.8, similar parallelization of the DES-based hashes was available with patches.)

In John 1.7.9-jumbo-6, the following formats support OpenMP (after enabling it in Makefile as described above): BF, BSDI, CRC32, crypt, DES, Django, DragonFly3-32, DragonFly3-64, DragonFly4-32, DragonFly4-64, Drupal7, EPiServer, GOST, HDAA, IPB2, KeePass, keychain, LM, Lotus5, MD5, Mozilla, MSCash, MSCash2, MSCHAPv2, MSKrb5, MySQL (old), NETHALFLM, NETLM, NETLMv2, NETNTLM, NETNTLMv2, ODF, Office, PKZIP, pwsafe, RACF, RAR, raw-SHA224, raw-SHA256, raw-SHA384, raw-SHA512, SAPB, SAPG, sha256crypt, sha512crypt, SIP, SSH, SybaseASE, trip, VNC, WBB3, WPAPSK, XSHA, XSHA512, ZIP.

Built-in MPI parallelization support

Starting with 1.7.7-jumbo-5, John also supports MPI. No additional patch is required, just uncomment the last two lines in the MPI section in Makefile so it looks like this:

## Uncomment the TWO lines below for MPI (can be used together with OMP as well)
## If you experience problems with MPI_Barrier, remove -DJOHN_MPI_BARRIER
## If you experience problems with MPI_Abort, remove -DJOHN_MPI_ABORT
CC = mpicc -DHAVE_MPI -DJOHN_MPI_BARRIER -DJOHN_MPI_ABORT
MPIOBJ = john-mpi.o

More information about MPI is in doc/README.mpi.

Misuse(?) of OpenCL

John the Ripper 1.7.9-jumbo-6 and newer includes OpenCL code for some formats. This is primarily intended for use on GPUs, but it may also be used on CPUs, and typically (on both Intel's and AMD's OpenCL SDK) it will make use of all CPUs in a system. With few exceptions, the OpenCL code was not optimized for use on CPUs, though, and it may be inconvenient to use because of too large numbers of candidate passwords being buffered and tested per OpenCL kernel invocation, hence making the program less interactive. Yet this is an option, especially for formats where we do not have OpenMP parallelization, but do have OpenCL code. (When OpenMP is also available, it is typically more efficient, easier to use, and does not affect program interactivity as much.)

Simple approaches

See also: List.External:DumbForce, List.External:KnownForce

Instead of making code changes to JtR to attempt to evenly split the workload, some approaches simply use multiple instances of JtR configured to work on separate keyspaces. Although possibly not as efficient an approach, for certain use cases it is sufficient. In order to take advantage of multiple CPUs or CPU cores on a single machine it is necessary to run more than one instance of JtR per machine (to match the number of CPU cores that are to be used for the task). This generally does not require a separate directory or configuration file; multiple instances of JtR can be run from within the same directory, sharing the same john.conf, john.pot, and other files just fine - this is a feature.

Incremental:AllN

Patch

no

Compile

no

Configuration

yes

Even split

no

Overhead

no

One approach to splitting the keyspace by password length would be to create additional incremental modes in john.conf. This approach is limited to the maximum password length set at compile time, but is not difficult to set up or run. The following is an example based on Incremental:All:

[Incremental:All5]
File = $JOHN/all.chr
MinLen = 0
MaxLen = 5
CharCount = 95
 
[Incremental:All6]
File = $JOHN/all.chr
MinLen = 6
MaxLen = 6
CharCount = 95
 
[Incremental:All7]
File = $JOHN/all.chr
MinLen = 7
MaxLen = 7
CharCount = 95
 
[Incremental:All8]
File = $JOHN/all.chr
MinLen = 8
MaxLen = 8
CharCount = 95

To use, one would launch four separate instances of JtR, each with its own –incremental=AllN argument (replacing N with the maximum length). Although those instances with larger keyspaces will, in theory, take longer to complete, all of the instances will be accelerated due to not having to search other lengths. In practice, with slow enough hashes and a large enough number of different salts, none of the instances or only those for very short lengths will terminate in a reasonable amount of time, so this approach achieves a reasonable split of the workload.

It is advisable to split the lengths range by complexity to the extent possible (like it is done in the example above) rather than numerically (e.g., 0-2, 3-4, 5-6, 7-8 would be very inefficient).

External:Parallel

Patch

no

Compile

no

Configuration

yes

Even split

yes

Overhead

yes

The most obvious approach is to use the pre-defined Parallel –external mode. This has been shown to be sufficiently efficient for not-too-fast hashes and smaller numbers of nodes. To use this, you must modify the List.External:Parallel section to reflect how many nodes (total) are running and which instance (node) that particular configuration file represents. When running multiple instances on the same machine and in the same directory, just make multiple copies of the List.External:Parallel section, changing its name - then refer to these different instances of the sections (with different node numbers in them) on the command-line.

Please note that the Parallel external mode is meant to work along with wordlist and incremental modes, but will not in the single crack mode. If used in single crack mode, it will skip candidate password and target hash combinations that would otherwise be tested.

Markov mode

Patch

yes

Compile

yes

Configuration

no

Even split

no

Overhead

no

Although technically not a parallelization attempt, a patch was posted in Fall of 2007 that enables JtR to check passwords of a given Markov score. The patch has since been integrated into the jumbo patch. It is feasible to run multiple instances of this patched version against separate Markov ranges using non-overlapping start and end parameters for each instance. If MPI is enabled, this splitting is performed automatically.

Details are available at the Markov generator page.

Extended efforts

Many programmatic attempts have been made to split John's various processing modes across multiple computing resources, but none have been accepted as an official implementation by the Openwall team. Varying reasons have been given, but the overall indication has been that said patches do not split the workload as elegantly or efficiently as the development team would like, they introduce unneeded dependencies on external libraries, and their code quality is often inadequate. These are all available on the Openwall FTP site. The end of this post describes a more ideal method of programmatically splitting the workload, which:

Splits the incremental passes into as many ranges as there are slave nodes
When a slave completes its range, it is given half of another node's uncompleted range
If a slave goes offline, its entire range is reassigned to another idle slave
The master only advances to the next pass if all ranges for the current one have been completed

Most projects have focused on using existing parallel programming toolkits to communicate between the instances/nodes; the following is a table of those efforts:

Table of existing works

Name	Toolkit	Release Date	Author	JtR Version	Notes
None	MPI*	2001/05/20	stealth	1.6	Incremental-only
DJohn	None	2001/12/25	Luis Parravicini	1.6.30	Incremental-only, abandoned download
None	MPI*	2004/01/22	Ryan Lim	1.6.36	Academic, incremental-only (Academic)
CS240A	MPI*	2004/06/07	Pippin, Hall, Chen	1.6	Academic, paper notes focus on dictionary attack v. incremental
None	MPI*	2006/06/01	Timothy Shelling	1.7.2	abandoned download
None	None (script)	2006/06/05	Timothy Shelling	1.7.2	abandoned, download checksum addon
cpushare	CPUShare	2007/10/01	Andrea Arcangeli (cpushare.com)	1.7.2	Requires a CPUShare account (no longer available)
dnetj	None	2008/03/03	John Anderson (bindshell.net)	1.7.2	Allows a less homogeneous environment than MPI
John-MPI	MPI*	2008/04/19	John Anderson (bindshell.net)	1.7.2	Continuation of Ryan Lim's patchset
GI John	None	2009/02/23	Balázs Bucsay Rycon.hu	1.7.3.1	Grid Implemented John The Ripper (http://www.gijohn.info)
John-MPI	MPI*	2009/06/18	RB	1.7.3.1	Stripped bindshell.net implementation
FullMPI	MPI*	2010/06/22	magnum	1.7.6	Extended version of the above, supporting all cracking modes (included in 1.7.7 Jumbo-5)
-omp patch	OpenMP	2010/05/08	Solar Designer	1.7.5	Integrated into JtR 1.7.6+ (so no longer needed as a patch). Currently only parallelizes bcrypt hashes. Requires recent gcc or another OpenMP-capable C compiler (also tested with Sun Studio 12).
-omp-des patch	OpenMP	2010/06/27	Solar Designer	1.7.6	Parallelizes JtR's bitslice DES implementation, reasonably efficient for DES-based Unix crypt(3) hashes. Requires OpenMP-capable C compiler.
Clortho	Clortho	2012/06/24	ccdes	1.7.6 / any	Pre-divided distributed keyspace, designed for exhaustion of keyspace given enough compute

The list is mostly sorted by release date.

Downloadable files for the above and some other JtR parallelization projects are found in the Openwall file archive (with some of them in a directory specific to MPI).

* PLEASE NOTE: MPI is a full parallelization suite; as such it has extensive configuration options and daemons to run before using. If using one of the MPI-based works, please familiarize yourself with the MPI version you install - seldom can one simply install them and immediately begin using 'mpirun'.

Back to John the Ripper user community resources.

/* TODO: - Clean up 'Extended Efforts' section (bring in-line w/'Simple approaches' format)

Add notes on using individual patches/approaches
Make sure dates are date posted, not last-posted (pretty sure I screwed at least one up)
Add minimal benchmarks

- Cover whole doc for CCC (cohesiveness, clarity, & comprehensibility) */

Table of Contents

Parallel and distributed processing with John the Ripper

Understanding the workload

Built-in parallelization with OpenMP

Built-in MPI parallelization support

Misuse(?) of OpenCL

Simple approaches

Incremental:AllN

External:Parallel

Markov mode

Extended efforts

Table of existing works