Markov generator

Markov generator

The assumption behind this theory is that people can remember their passwords because there is a hidden Markov model in the way they are generated. This paper was the source of inspiration for this patch. The basic hypothesis is that the probability that a letter appears at position n is a function of the letter at position (n-1) ¹⁾. This models the fact that it is more likely to find after a c the letter h than the letter z.

If P(x) is the probability that x is the first letter of a password, and P(a|b) the probability that b follows a, then for the word bull P(bull) = P(b)*P(u|b)*P(l|u)*P(l|l).

When converting to integers by defining P'(x) = round(-10*log( P(x) )), this becomes P'(bull) = P(b)+P'(u|b)+P'(l|u)+P'(l|l).

Properties

This convertion allows for faster computations and several properties, explained in the aforementionned paper, such as:

it is possible to efficiently count the quantity of passwords where P(password) is less than a given value
it is possible to efficiently generate the Nth password where P(password) is less than a given value
it is easy to compute the score P' of any word

Patch usage

The patch is already included in this tarball, as well as in the jumbo patch (1.7.3.1-all-4 and above). There is no standalone patch because of the general lack of interest in it. It adds a new option to john and adds three programs.

calc_stat

This program is used to generate your statistic file. It is recommended to do this to tune the generator to your specific language by running it on a large sample of clear text passwords. Its usage is:

./calc_stat [-p] dictionnary_file statfile

mkvcalcproba

This program takes a file with a password per line and a stats file as its input, and outputs statistics about them as a tab separated output. The columns are:

the password
the details of the P' calculation
the value of P' (markov score)
the password length
the position of the password if it was brute forced blindly
the markov score of the password without the first two characters

This program is of limited interest to most, but will let you

compute the strength of your passwords
generate nice graphs to include in reports (will be added soon)

genmkvpwd

This program could be used to output passwords on stdout, but is mainly useful for planning a cracking session. Its syntax is:

./genmkvpwd statfile max_lvl [max_len] [start] [end]

With a given statfile, it is mainly used in the following way : ./genmkvpwd statfile 0 len gives, for level between 100 and 350, the quantity of passwords whose maximum length in len and markov strength is or equal to level. For example:

$ ./genmkvpwd stats 0 12
lvl=100 (2424 Kb for nbparts) 9846 possible passwords
lvl=101 (2448 Kb for nbparts) 10 K possible passwords (10849)
lvl=102 (2472 Kb for nbparts) 11 K possible passwords (11915)
lvl=103 (2496 Kb for nbparts) 13 K possible passwords (13104)
lvl=104 (2520 Kb for nbparts) 14 K possible passwords (14446)
...
lvl=346 (8328 Kb for nbparts) 440023 G possible passwords (440023594138549)
lvl=347 (8352 Kb for nbparts) 478482 G possible passwords (478482142956335)
lvl=348 (8376 Kb for nbparts) 520144 G possible passwords (520144425218940)
lvl=349 (8400 Kb for nbparts) 565264 G possible passwords (565264299077832)
lvl=350 (8424 Kb for nbparts) 614114 G possible passwords (614114588006381)

john mode

The syntax of the new generator is:

–markov[=level[:start:end[:maxlen]]]

Here level correspond to the maximum markov strength of candidate passwords, start to the first password to be generated (default: 0), end to the last (default: 0, 0 meaning “last password”), and maxlen to the maximum length of candidate passwords.

Real world usage example

In this example, two computers are available to the cracker:

a laptop, with a 1.2Ghz Core 2 CPU, with 2 cores
a server, with a 1.86Ghz Xeon CPU, with 4 cores

Password of length less than 12 will be tested against a single Raw-MD5 hash during 24 hours.

Benchmarking against the target

It is important to be able to benchmark against the passwords that are going to be tested:

$ time ./john -markov:200:10000000:50000000:12 -format:raw-md5 truc.md5
Loaded 1 password hash (Raw MD5 [raw-md5 SSE2])
MKV start (lvl=200 len=12 pwd=40000000)
guesses: 0  time: 0:00:00:11 100%  c/s: 3442K  trying: sinounol - sinounot

real    0m11.646s
user    0m11.581s
sys     0m0.008s

The time command is more precise than john's output. The actual speed is 40000000/11.646 = 3434655 passwords/s. Note that while this matches the c/s figure in this example, this is not always the case.

Getting the total quantity of passwords to generate

The benchmarks give:

3434655 passwords/s for the laptop
9912442 passwords/s for the server (not an actual benchmark, made up for this example)

There are two cores on the laptop, and four on the server, leading to a total speed of (3434655*2 + 9912442*4) = 46519078 passwords/s. During 24 hours, this is 46519078*3600*24 = 4019248339200 passwords, or 4019G passwords. The genmkvpwd for length 12 passwords gives the following values:

lvl=294 (7080 Kb for nbparts) 3634 G possible passwords (3634988600673)
lvl=295 (7104 Kb for nbparts) 4014 G possible passwords (4014261346547)
lvl=296 (7128 Kb for nbparts) 4432 G possible passwords (4432325494466)

The quantity of passwords for markov level 295 matches the quantity of passwords that could be tested in 24 hours.

Generating the workloads

4014261346547 passwords are to be divided between 6 CPUs, capable of generating a total of 46519078 passwords/s.

host	CPU	speed	passwords to generate	start
laptop	1	3434655	296385986094	0
laptop	2	3434655	296385986094	296385986094
server	1	9912442	855372343589	592771972188
server	2	9912442	855372343589	1448144315777
server	3	9912442	855372343589	2303516659366
server	4	9912442	the rest	3158889002955

The following command lines are to be executed:

on the laptop

$ ./john -markov:295:0:296385986093:12 -format:raw-md5 truc.md5 &
$ ./john -markov:295:296385986094:592771972187:12 -format:raw-md5 truc.md5

on the server

$ ./john -markov:295:592771972188:1448144315776:12 -format:raw-md5 truc.md5 &
$ ./john -markov:295:1448144315777:2303516659365:12 -format:raw-md5 truc.md5 &
$ ./john -markov:295:2303516659366:3158889002954:12 -format:raw-md5 truc.md5 &
$ ./john -markov:295:3158889002955:0:12 -format:raw-md5 truc.md5

It is recommended to automate all these steps.

Markov vs. "incremental" mode

Matt Weir ran some curious tests of the two cracking modes. In his tests, the Markov mode with carefully chosen limit outperformed “incremental” in the end. On the other hand, “incremental” could continue running (or the session could be interrupted and continued at a later time), whereas the Markov mode terminated upon reaching the limit.

Please see Matt's first blog post on the topic and his follow-up blog post.

¹⁾ “incremental” mode as supported by the “official” JtR also uses this hypothesis

Table of Contents