Parallelization

One of the most common questions in the computing domain is “Can I use multiple processes or cores to increase speed?” In particular, problems that are time-sensitive or would take a relatively unreasonable amount of time to solve (NP-complete) receive a lot of attention; password cracking is no different.

Understanding the workload

John the Ripper has one primary workload: generating hashes of candidate passwords. Its two secondary loads (candidate password generation and comparison of computed hashes against those being cracked) are not always insignificant, but are not nearly as computationally intensive or complex as the hash calculations themselves. Most efforts to speed processing by parallelization focus on evenly dispersing the primary computational load over multiple cores, but there are simpler approaches for smaller problem sets (discussed below).

The three primary cracking modes of operation (single, wordlist, and incremental) differ mostly in the candidate passwords generated: the single and wordlist modes simply attempt presumably-higher-probability password candidates before they would normally be attempted in incremental mode. Hence, a normal JtR session executes the single and wordlist modes before falling through to incremental in an attempt to save time.

For example: as of this writing, an unmodified copy of john-1.7.2 compiled as linux-x86-mmx and running against a single FreeBSD MD5 hash on a 1 GHz Pentium III typically completes –single mode in under one second and ”–wordlist=password.lst –rules” mode in under one minute. This performance is dependent on a number of factors including:

  • The number of relevant words (such as the user's full name) available in the password file
  • The number of different salts loaded for cracking
  • Whether any additions or modifications have been made to the default wordlist or mangling rules

Even so, it serves to illustrate that unless the crack attempt is to be interrupted shortly, the majority of time spent cracking will be in incremental mode.

Simple approaches

See also: List.External:DumbForce, List.External:KnownForce

Instead of making code changes to JtR to attempt to evenly split the workload, some approaches simply use multiple instances of JtR configured to work on separate keyspaces. Although possibly not as efficient an approach, for certain use cases it is sufficient. Although these can be run on separate machines, in order to take advantage of multiple CPUs or CPU cores on a single machine it is necessary to run more than one instance of JtR per machine (to match the number of CPU cores that are to be used for the task). This generally does not require a separate directory or configuration file; multiple instances of JtR can be run from within the same directory, sharing the same john.conf, john.pot, and other files just fine - this is a feature.

Incremental:AllN

PatchnoCompilenoConfigurationyesEven splitnoOverheadno

One approach to splitting the keyspace by password length would be to create additional incremental modes in john.conf. This approach is limited to the maximum password length set at compile time, but is not difficult to set up or run. The following is an example based on Incremental:All:

[Incremental:All5]
File = $JOHN/all.chr
MinLen = 0
MaxLen = 5
CharCount = 95
 
[Incremental:All6]
File = $JOHN/all.chr
MinLen = 6
MaxLen = 6
CharCount = 95
 
[Incremental:All7]
File = $JOHN/all.chr
MinLen = 7
MaxLen = 7
CharCount = 95
 
[Incremental:All8]
File = $JOHN/all.chr
MinLen = 8
MaxLen = 8
CharCount = 95

To use, one would launch four separate instances of JtR, each with its own –incremental=AllN argument (replacing N with the maximum length). Although those instances with larger keyspaces will, in theory, take longer to complete, all of the instances will be accelerated due to not having to search other lengths. In practice, with slow enough hashes and a large enough number of different salts, none of the instances or only those for very short lengths will terminate in a reasonable amount of time, so this approach achieves a reasonable split of the workload.

It is advisable to split the lengths range by complexity to the extent possible (like it is done in the example above) rather than numerically (e.g., 0-2, 3-4, 5-6, 7-8 would be very inefficient).

External:Parallel

PatchnoCompilenoConfigurationyesEven splityesOverheadyes

The most obvious approach is to use the pre-defined Parallel –external mode. This has been shown to be sufficiently efficient for not-too-fast hashes and smaller numbers of nodes. To use this, you must modify the List.External:Parallel section to reflect how many nodes (total) are running and which instance (node) that particular configuration file represents. When running multiple instances on the same machine and in the same directory, just make multiple copies of the List.External:Parallel section, changing its name - then refer to these different instances of the sections (with different node numbers in them) on the command-line.

Please note that the Parallel external mode is meant to work along with wordlist and incremental modes, but not with single crack. If used with the latter, which is a wrong thing to do, it might completely miss some {candidate password, target hash} combinations that would otherwise be tested.

Markov mode

PatchyesCompileyesConfigurationnoEven splitnoOverheadno

Although technically not a parallelization attempt, a patch was posted in Fall of 2007 that enables JtR to check passwords of a given Markov score. It is feasible to run multiple instances of this patched version against separate Markov ranges using non-overlapping start and end parameters for each instance.

Extended efforts

Many programmatic attempts have been made to split John's various processing modes across multiple computing resources, but none have been accepted as an official implementation by the Openwall team. Varying reasons have been given, but the overall indication has been that said patches do not split the workload as elegantly or efficiently as the development team would like, they introduce unneeded dependencies on external libraries, and their code quality is often inadequate. These are all available on the Openwall FTP site.

Most projects have focused on using existing parallel programming toolkits to communicate between the instances/nodes; the following is a table of those efforts:

Table of existing works

Name Toolkit Release Date Author JtR Version Notes
cpushare CPUShare 2007/10/01 Andrea Arcangeli (cpushare.com) 1.7.2 Requires a CPUShare account
dnetj None 2008/03/03 John Anderson (bindshell.net) 1.7.2 Allows a less homogeneous environment than MPI
None MPI* 2001/05/20 stealth 1.6 Incremental-only
None MPI* 2004/01/22 Ryan Lim 1.6.36 Academic, incremental-only (Academic)
CS240A MPI* 2004/06/07 Pippin, Hall, Chen 1.6 Academic, paper notes focus on dictionary attack v. incremental
None MPI* 2006/06/01 Timothy Shelling 1.7.2 abandoned
John-MPI MPI* 2008/04/19 John Anderson (bindshell.net) 1.7.2 Continuation of Ryan Lim's patchset
John-MPI MPI* 2008/06/15 RB 1.7.2 Stripped bindshell.net implementation

This list is sorted alphabetically by toolkit name or patch/project name, and by date for the same name (MPI).

* PLEASE NOTE: MPI is a full parallelization suite; as such it has extensive configuration options and daemons to run before using. If using one of the MPI-based works, please familiarize yourself with the MPI version you install - seldom can one simply install them and immediately begin using 'mpirun'.

Back to John the Ripper user community resources.

 
john/parallelization.txt · Last modified: 2008/08/17 14:05 by RB
 
Recent changes RSS feed Creative Commons License Donate to DokuWiki Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki Powered by OpenVZ Powered by Openwall GNU/*/Linux