The assumption behind this theory is that people can remember their passwords because there is a hidden Markov model in the way they are generated. This paper was the source of inspiration for this patch. The basic hypothesis is that the probability that a letter appears at position n is a function of the letter at position (n-1) 1). This models the fact that it is more likely to find after a c the letter h than the letter z.
If P(x) is the probability that x is the first letter of a password, and P(a|b) the probability that b follows a, then for the word bull
P(bull) = P(b)*P(u|b)*P(l|u)*P(l|l).
When converting to integers by defining P'(x) = round(-10*log( P(x) )), this becomes P'(bull) = P(b)+P'(u|b)+P'(l|u)+P'(l|l).
This convertion allows for faster computations and several properties, explained in the aforementionned paper, such as:
The patch is already included in this tarball, as well as in the jumbo patch (1.7.3.1-all-4 and above). There is no standalone patch because of the general lack of interest in it. It adds a new option to john
and adds three programs.
This program is used to generate your statistic file. It is recommended to do this to tune the generator to your specific language by running it on a large sample of clear text passwords. Its usage is:
./calc_stat [-p] dictionnary_file statfile
This program takes a file with a password per line and a stats file as its input, and outputs statistics about them as a tab separated output. The columns are:
This program is of limited interest to most, but will let you
This program could be used to output passwords on stdout, but is mainly useful for planning a cracking session. Its syntax is:
./genmkvpwd statfile max_lvl [max_len] [start] [end]
With a given statfile, it is mainly used in the following way : ./genmkvpwd statfile 0 len
gives, for level
between 100 and 350, the quantity of passwords whose maximum length in len
and markov strength is or equal to level
. For example:
$ ./genmkvpwd stats 0 12 lvl=100 (2424 Kb for nbparts) 9846 possible passwords lvl=101 (2448 Kb for nbparts) 10 K possible passwords (10849) lvl=102 (2472 Kb for nbparts) 11 K possible passwords (11915) lvl=103 (2496 Kb for nbparts) 13 K possible passwords (13104) lvl=104 (2520 Kb for nbparts) 14 K possible passwords (14446) ... lvl=346 (8328 Kb for nbparts) 440023 G possible passwords (440023594138549) lvl=347 (8352 Kb for nbparts) 478482 G possible passwords (478482142956335) lvl=348 (8376 Kb for nbparts) 520144 G possible passwords (520144425218940) lvl=349 (8400 Kb for nbparts) 565264 G possible passwords (565264299077832) lvl=350 (8424 Kb for nbparts) 614114 G possible passwords (614114588006381)
The syntax of the new generator is:
–markov[=level[:start:end[:maxlen]]]
Here level
correspond to the maximum markov strength of candidate passwords, start
to the first password to be generated (default: 0), end
to the last (default: 0, 0 meaning “last password”), and maxlen
to the maximum length of candidate passwords.
In this example, two computers are available to the cracker:
Password of length less than 12 will be tested against a single Raw-MD5 hash during 24 hours.
It is important to be able to benchmark against the passwords that are going to be tested:
$ time ./john -markov:200:10000000:50000000:12 -format:raw-md5 truc.md5 Loaded 1 password hash (Raw MD5 [raw-md5 SSE2]) MKV start (lvl=200 len=12 pwd=40000000) guesses: 0 time: 0:00:00:11 100% c/s: 3442K trying: sinounol - sinounot real 0m11.646s user 0m11.581s sys 0m0.008s
The time
command is more precise than john
's output. The actual speed is 40000000/11.646 = 3434655 passwords/s. Note that while this matches the c/s figure in this example, this is not always the case.
The benchmarks give:
There are two cores on the laptop, and four on the server, leading to a total speed of (3434655*2 + 9912442*4) = 46519078 passwords/s. During 24 hours, this is 46519078*3600*24 = 4019248339200 passwords, or 4019G passwords. The genmkvpwd
for length 12 passwords gives the following values:
lvl=294 (7080 Kb for nbparts) 3634 G possible passwords (3634988600673) lvl=295 (7104 Kb for nbparts) 4014 G possible passwords (4014261346547) lvl=296 (7128 Kb for nbparts) 4432 G possible passwords (4432325494466)
The quantity of passwords for markov level 295 matches the quantity of passwords that could be tested in 24 hours.
4014261346547 passwords are to be divided between 6 CPUs, capable of generating a total of 46519078 passwords/s.
host | CPU | speed | passwords to generate | start |
---|---|---|---|---|
laptop | 1 | 3434655 | 296385986094 | 0 |
laptop | 2 | 3434655 | 296385986094 | 296385986094 |
server | 1 | 9912442 | 855372343589 | 592771972188 |
server | 2 | 9912442 | 855372343589 | 1448144315777 |
server | 3 | 9912442 | 855372343589 | 2303516659366 |
server | 4 | 9912442 | the rest | 3158889002955 |
The following command lines are to be executed:
$ ./john -markov:295:0:296385986093:12 -format:raw-md5 truc.md5 & $ ./john -markov:295:296385986094:592771972187:12 -format:raw-md5 truc.md5
$ ./john -markov:295:592771972188:1448144315776:12 -format:raw-md5 truc.md5 & $ ./john -markov:295:1448144315777:2303516659365:12 -format:raw-md5 truc.md5 & $ ./john -markov:295:2303516659366:3158889002954:12 -format:raw-md5 truc.md5 & $ ./john -markov:295:3158889002955:0:12 -format:raw-md5 truc.md5
It is recommended to automate all these steps.
Matt Weir ran some curious tests of the two cracking modes. In his tests, the Markov mode with carefully chosen limit outperformed “incremental” in the end. On the other hand, “incremental” could continue running (or the session could be interrupted and continued at a later time), whereas the Markov mode terminated upon reaching the limit.
Please see Matt's first blog post on the topic and his follow-up blog post.