Some of these ideas were initially suggested for Google Summer of Code (GSoC), but they're also relevant to our own Summer of Security program. When applying to us for GSoC or otherwise, please use our application template (which also includes info on how to contact us).
Although we have a lot of ideas, our mentoring capacity is limited and many of the ideas would be incompatible if worked on during the same summer. Thus, in 2015 we intend to work on a subset of these ideas only (as well as on many other things).
The main headings below (such as “John the Ripper SIMD support enhancements”) serve as “student roles” for the purposes of our application template for GSoC. When applying outside of GSoC, such as for our Summer of Security, the sub-headings (such as “John the Ripper SIMD support enhancements: AVX2 sub-task”) may also be used as “student roles” (those sub-tasks are deemed too small for GSoC, but are OK for Summer of Security).
Difficulty: low to medium
It is very easy for new code to get into jumbo, or we wouldn't have received and integrated support for so many “formats” (hundreds). Unfortunately, this also means that many of the formats supported in jumbo are not robust when faced with invalid input.
The task here is to identify specific issues (ways jumbo may misbehave on specific invalid inputs - e.g., dereference a NULL pointer, incur an integer overflow, or simply load an impossible hash encoding for cracking instead of rejecting it right away) and to patch them. To find the issues, you'll be expected to use a combination of fuzzing (such as with afl and with simple custom John the Ripper-specific fuzzers, e.g. like these, that you'll need to write) and source code reviews (both manual reviews and automated static analysis). And you'll be expected to patch all of the issues yourself, not merely communicate them to jumbo developers.
Unfortunately, due to the sheer size of the would-be attack surface, we don't expect this project to result in a confidence level sufficient for us to declare jumbo ready to process untrusted input in the security sense, but we do expect the project, if successful, to result in a much more robust jumbo, where almost all of the formats will just happen to be robust in practice.
We'd also appreciate code quality improvements that don't currently affect runtime behavior of the program.
After completion of the initial project, we'll expect you to stay on the team as “jumbo robustness officer”.
So far, John the Ripper -jumbo uses SIMD intrinsics for up to Intel AVX and AMD XOP, and for bitslice DES also ARM NEON and PowerPC AltiVec. It also contains some x86 and x86-64 assembly code that uses SSE2 instructions, and x86(-64) CPU detection routines capable of checking for up to AVX and XOP support.
We need to add AVX2 and MIC/AVX-512 intrinsics, and then detection of the CPU's AVX2 and AVX-512 support as appropriate. We also need to optimize the code in other ways (interleaving and maybe bitslicing). We wouldn't mind the addition of ARM NEON, PowerPC AltiVec, and/or other non-x86 SIMD intrinsics to more JtR “formats” as well (beyond bitslice DES).
We need to add optional use of 256-bit AVX2 intrinsics to JtR -jumbo formats that now use up to AVX/XOP intrinsics, and we need to add AVX2 support detection.
AVX2 might or might not run faster on current Haswell CPUs, but we need it for future CPUs anyway. We already have 256-bit AVX (the “floating-point” one) for bitslice DES, but it's no faster than 128-bit on current CPUs. It's currently only enabled in 32-bit builds, as it might be partially compensating for having only 8 registers there (the performance differences between 128-bit and 256-bit AVX are very small, though). AVX2 might help for MD4/MD5/SHA-1/SHA-2. In php_mt_seed, AVX2 actually helps a lot (as tested on our i7-4770K).
We're able to provide remote access to a Haswell machine to test this on, if the student doesn't have one of their own.
Similarly, we need to add Xeon Phi's 512-bit MIC intrinsics to existing SIMD-enabled code in John the Ripper. This is not so much to achieve great speeds on Xeon Phi itself (the speeds are expected to be on par with GPUs, which are generally cheaper), but is mostly to prepare for AVX-512 in near-future CPUs, given that MIC and AVX-512 are largely compatible at source code intrinsics level (even though the resulting opcodes are different).
This task also involves getting John the Ripper -jumbo built for Xeon Phi. So far, we have only built the core, non-jumbo John the Ripper tree for Xeon Phi, which is less demanding in terms of library dependencies, and we added MIC intrinsics for John the Ripper's bitslice DES code. (See this posting and another one on the limited achievements so far.)
We're able to provide remote access to a machine with Xeon Phi for this project.
We'd accept the addition of SIMD intrinsics for popular non-x86 architectures such as ARM and PowerPC to more JtR formats. Right now, we only have those for bitslice DES, but their addition for MD4/MD5/SHA-1/SHA-2 should be straightforward.
We're able to provide remote access to an ARM development board with a NEON-capable CPU. For Power architecture, GCC Compile Farm or a cloud hosting provider may be used. (Some other options are the older PowerPC-based Macs, Rosetta on x86-based Macs with OS X up to 10.6, and QEMU. Obviously, the options involving emulation are less suitable for code optimization.)
This sub-task is considered optional, but it's also a straightforward one.
We currently have straightforward SIMD implementations of SHA-256 and SHA-512 (computing, respectively, 4 or 2 of these hashes per 128-bit SIMD vector). However, these implementations might not fully hide modern and future CPUs' instruction latencies and/or fully exploit their instruction-level parallelism (beyond SIMD). Thus, interleaving of instructions (or C source code SIMD intrinsics) needs to be added and optimal interleave factors determined for current CPUs.
Additionally, although this is unlikely on current CPUs and GPUs, bitslice implementations might turn out to deliver better performance on some CPUs and/or GPUs (and even if not yet, which is likely, they may be valuable for relatively quickly re-testing this on future hardware). A possible source of speedup is the bit rotate operations, of which there are many in SHA-2, and which become no-ops with a bitslice implementation (on the other hand, many intermediate values will have to be in L1 cache or local memory rather than in registers, which may slow things down to a larger degree). We've previously demonstrated how it is possible to create a bitslice implementation of MD5 (original, revised, latest), and SHA-2 are similar in design.
One sub-task here is to try to optimize the straightforward SIMD implementations further. Another sub-task is to create bitslice implementations and to compare their performance against the straightforward implementations on actual hardware. And a third sub-task (the most important one in terms of its immediate practical benefit) is to ensure that JtR's implementations of higher-level password hashing schemes, etc. that are based (in part) on SHA-256 and SHA-512 use one of the optimized SIMD implementations. This applies to SHA-crypt (password hashes supported in glibc 2.7+ and in use by certain modern Linux distributions and by DragonFly BSD), Mac OS X 10.7+ salted SHA-512 password hashes, Drupal 7 password hashes, and more. As of this writing, some of the high-level schemes use the existing (not yet interleaved) SIMD implementations of SHA-256 and SHA-512, and some don't even use the existing SIMD implementations yet (they use the older non-SIMD implementations or even call into OpenSSL libcrypto). When this project is complete, all of them should use the best-performing SIMD implementations.
The bitslicing sub-task is beyond the “medium” difficulty we have stated for this GSoC project idea, but it is considered a stretch goal. We won't consider a GSoC project failed if the student does not succeed solely at the bitslicing sub-task while having completed everything else.
Difficulty: medium to high
We need to decide on (unofficial) crypt(3) encoding syntax for Password Hashing Competition (PHC) finalists (Argon, battcrypt, Catena, Lyra2, Makwa, Parallel, POMELO, Pufferfish, yescrypt) and optionally some non-finalists, and implement them in John the Ripper.
At first, this should be done in a straightforward manner using the readily available implementations, preferably adding OpenMP parallelization around calls into the existing implementations (except for any determined not to be MT-safe).
Then defense- and attack-optimized implementations should be produced. The former should compute individual hashes more efficiently, eliminating any redundancy there might have been in the previously available implementations (e.g., common subexpressions) and using CPUs' multi-issue and SIMD capabilities to the extent possible within the parallelism available during computation of individual hashes. The latter may additionally compute multiple hashes per thread in parallel in order to further increase CPUs' instruction issue rate (through interleaving of instructions corresponding to different hash computations) and/or SIMD vector width usage, beyond what was possible for individual hashes.
Similarly, OpenCL implementations should be created, also computing multiple hashes in parallel and thus intended for attack rather than defense, thereby letting the compiler make similar instruction-level optimizations. These may be benchmarked against the manually-optimized (such as with C and SIMD intrinsics) attack-optimized implementations mentioned above, and more importantly they should then be tuned for specific GPU architectures.
Since these password hashing schemes are (mostly) not in use yet, the intent of this project is not so much to enhance John the Ripper but is mostly to research the attack and defense speed ratios, as well as the speeds themselves, of these hashing schemes on currently available commodity hardware (CPUs and GPUs).
Besides targeting additional programming languages, we also welcome simpler/cleaner and/or better optimized implementations in C with and without intrinsics and assembly (for architectures of your choice, to be discussed with us in advance), but in this project we'll view this as an extra.
Some hash and cipher types don't fit GPUs very well.
We're interested in adding support for more hashes and ciphers on more FPGA boards, with one of the current targets being bcrypt on the quad Spartan 6 LX150 ZTEX 1.15y, which we're able to provide (at least) remote access to. (We may also provide the boards themselves to students in the US, in Moscow, Russia, and in Zagreb, Croatia.)
Similarly to the task above, we'd also like to implement DES-based crypt(3), also known as descrypt, on ZTEX 1.15y in a way that would integrate with John the Ripper. An implementation of descrypt on these boards is already available, but revising it to run against multiple hashes (likely with different salts) and with a smarter ordering of candidate passwords may be challenging.
With bcrypt, we expect the candidate passwords to be (mostly or fully) provided by the master system (where John the Ripper is running on a CPU), whereas with descrypt they will have to be mostly generated by FPGAs themselves (but with some control from the master). These two hash types themselves also need to be implemented totally differently (although free implementations for both already exist, as shown above).
Specifically in GSoC 2015, we intend to act as an umbrella organization for the radare reverse-engineering framework, so their ideas are included here by reference. The expectation is that we'll end up accepting one student for a radare task.
If you're interested in applying for a radare task under Openwall's umbrella for GSoC, please contact us via e-mail at radare at openwall.com, which reaches both of our projects at once. To contact radare only, such as if you're not eligible for GSoC anyway but want to participate in radare project's own Radare Summer of Code (RSoC), please e-mail them at rsoc at radare.org. Finally, to contact Openwall only, such as for a project not involving radare, please see our application template page.
Your own creative and relevant idea - please propose it to us first, then describe it again when you apply.
With few exceptions (such as for changes to existing Linux kernel code, which is under GPLv2 anyway), we require any contributed code to be made available under a cut-down BSD license. The wiki page linked from here is for JtR, but we'd like to use this approach for most other projects as well. By applying to work on one of the ideas with us, you signify your acceptance of these terms and your intent to license your code contributions accordingly.
This approach permits us to combine contributed code with differently-licensed third-party code, and it does not lock us to a specific Open Source license for our releases.
Additionally, it permits us to create and sell proprietary revisions of our programs, which we're currently doing with JtR Pro. As you can see from the feature sets of free JtR vs. JtR Pro, we're not abusing this ability to artificially cripple our free software. The free JtR remains the main one, where features get implemented first, with “Pro” being branched off some free versions for those users who prefer a pre-packaged “product”. Overall, the introduction of JtR Pro has helped development of the free JtR so far, by letting us spend more time on the project (vs. doing more client-facing work on other projects). We assume that students applying for work on JtR are comfortable with this.
We have a separate wiki page with archived ideas (from prior years).