The assumption behind this theory is that people can remember their passwords because there is a hidden Markov model in the way they are generated. This paper was the source of inspiration for this patch. The basic hypothesis is that *the probability that a letter appears at position n is a function of the letter at position (n-1)* ^{1)}. This models the fact that it is more likely to find after a **c** the letter **h** than the letter **z**.

If P(x) is the probability that **x** is the first letter of a password, and P(a|b) the probability that **b** follows **a**, then for the word `bull`

P(bull) = P(b)*P(u|b)*P(l|u)*P(l|l).

When converting to integers by defining P'(x) = round(-10*log( P(x) )), this becomes P'(bull) = P(b)+P'(u|b)+P'(l|u)+P'(l|l).

This convertion allows for faster computations and several properties, explained in the aforementionned paper, such as:

- it is possible to efficiently count the quantity of passwords where P(password) is less than a given value
- it is possible to efficiently generate the Nth password where P(password) is less than a given value
- it is easy to compute the score P' of any word

The patch is already included in this tarball, as well as in the jumbo patch (1.7.3.1-all-4 and above). There is no standalone patch because of the general lack of interest in it. It adds a new option to `john`

and adds three programs.

This program is used to generate your statistic file. It is recommended to do this to tune the generator to your specific language by running it on a large sample of clear text passwords. Its usage is:

`./calc_stat [-p] dictionnary_file statfile`

This program takes a file with a password per line and a stats file as its input, and outputs statistics about them as a tab separated output. The columns are:

- the password
- the details of the P' calculation
- the value of P' (markov score)
- the password length
- the position of the password if it was brute forced blindly
- the markov score of the password without the first two characters

This program is of limited interest to most, but will let you

- compute the strength of your passwords
- generate nice graphs to include in reports (will be added soon)

This program could be used to output passwords on stdout, but is mainly useful for planning a cracking session. Its syntax is:

`./genmkvpwd statfile max_lvl [max_len] [start] [end]`

With a given statfile, it is mainly used in the following way : `./genmkvpwd statfile 0 len`

gives, for `level`

between 100 and 350, the quantity of passwords whose maximum length in `len`

and markov strength is or equal to `level`

. For example:

$ ./genmkvpwd stats 0 12 lvl=100 (2424 Kb for nbparts) 9846 possible passwords lvl=101 (2448 Kb for nbparts) 10 K possible passwords (10849) lvl=102 (2472 Kb for nbparts) 11 K possible passwords (11915) lvl=103 (2496 Kb for nbparts) 13 K possible passwords (13104) lvl=104 (2520 Kb for nbparts) 14 K possible passwords (14446) ... lvl=346 (8328 Kb for nbparts) 440023 G possible passwords (440023594138549) lvl=347 (8352 Kb for nbparts) 478482 G possible passwords (478482142956335) lvl=348 (8376 Kb for nbparts) 520144 G possible passwords (520144425218940) lvl=349 (8400 Kb for nbparts) 565264 G possible passwords (565264299077832) lvl=350 (8424 Kb for nbparts) 614114 G possible passwords (614114588006381)

The syntax of the new generator is:

`–markov[=level[:start:end[:maxlen]]]`

Here `level`

correspond to the maximum markov strength of candidate passwords, `start`

to the first password to be generated (default: 0), `end`

to the last (default: 0, 0 meaning “last password”), and `maxlen`

to the maximum length of candidate passwords.

In this example, two computers are available to the cracker:

- a laptop, with a 1.2Ghz Core 2 CPU, with 2 cores
- a server, with a 1.86Ghz Xeon CPU, with 4 cores

Password of length less than 12 will be tested against a single Raw-MD5 hash during 24 hours.

It is important to be able to benchmark against the passwords that are going to be tested:

$ time ./john -markov:200:10000000:50000000:12 -format:raw-md5 truc.md5 Loaded 1 password hash (Raw MD5 [raw-md5 SSE2]) MKV start (lvl=200 len=12 pwd=40000000) guesses: 0 time: 0:00:00:11 100% c/s: 3442K trying: sinounol - sinounot real 0m11.646s user 0m11.581s sys 0m0.008s

The `time`

command is more precise than `john`

's output. The actual speed is 40000000/11.646 = 3434655 passwords/s. Note that while this matches the c/s figure in this example, this is not always the case.

The benchmarks give:

- 3434655 passwords/s for the laptop
- 9912442 passwords/s for the server (not an actual benchmark, made up for this example)

There are two cores on the laptop, and four on the server, leading to a total speed of (3434655*2 + 9912442*4) = 46519078 passwords/s. During 24 hours, this is 46519078*3600*24 = 4019248339200 passwords, or 4019G passwords. The `genmkvpwd`

for length 12 passwords gives the following values:

lvl=294 (7080 Kb for nbparts) 3634 G possible passwords (3634988600673) lvl=295 (7104 Kb for nbparts) 4014 G possible passwords (4014261346547) lvl=296 (7128 Kb for nbparts) 4432 G possible passwords (4432325494466)

The quantity of passwords for markov level 295 matches the quantity of passwords that could be tested in 24 hours.

4014261346547 passwords are to be divided between 6 CPUs, capable of generating a total of 46519078 passwords/s.

host | CPU | speed | passwords to generate | start |
---|---|---|---|---|

laptop | 1 | 3434655 | 296385986094 | 0 |

laptop | 2 | 3434655 | 296385986094 | 296385986094 |

server | 1 | 9912442 | 855372343589 | 592771972188 |

server | 2 | 9912442 | 855372343589 | 1448144315777 |

server | 3 | 9912442 | 855372343589 | 2303516659366 |

server | 4 | 9912442 | the rest | 3158889002955 |

The following command lines are to be executed:

- on the laptop

$ ./john -markov:295:0:296385986093:12 -format:raw-md5 truc.md5 & $ ./john -markov:295:296385986094:592771972187:12 -format:raw-md5 truc.md5

- on the server

$ ./john -markov:295:592771972188:1448144315776:12 -format:raw-md5 truc.md5 & $ ./john -markov:295:1448144315777:2303516659365:12 -format:raw-md5 truc.md5 & $ ./john -markov:295:2303516659366:3158889002954:12 -format:raw-md5 truc.md5 & $ ./john -markov:295:3158889002955:0:12 -format:raw-md5 truc.md5

It is recommended to automate all these steps.

Matt Weir ran some curious tests of the two cracking modes. In his tests, the Markov mode with carefully chosen limit outperformed “incremental” in the end. On the other hand, “incremental” could continue running (or the session could be interrupted and continued at a later time), whereas the Markov mode terminated upon reaching the limit.

Please see Matt's first blog post on the topic and his follow-up blog post.