Differences

This shows you the differences between two versions of the page.

--- ideas-archive [2015/05/03 16:42]
solar [Low-level GPU programming] added MaxAs: Assembler for NVIDIA Maxwell architecture
+++ ideas-archive [2015/05/24 07:31] (current)
solar moved "Low-level GPU programming" content to its own wiki page
@@ Line 78: / Line 78: @@
 === Low-level GPU programming ===
-This project was worked on in GSoC 2013, but there's more to do on it.
+This project was worked on in GSoC 2013, but there's more to do on it.  Wiki page: [[john/development/GPU-low-level]]
-Starting in 2011, we've made considerable progress on adding GPU support to John the Ripper, via CUDA and OpenCL.  In the process, we've also identified limitations of these high-level approaches.  For example, for DES-based crypt(3) hashes, there's a substantial performance improvement from specializing the code to a given salt value.  While we can specialize OpenCL source code and build per-salt OpenCL kernels at runtime, this takes tens of minutes for the 4096 salt values.  This delays program startup or at least the time until the program gets to running at full speed.  For another example, for bcrypt hashes we (and two other projects) have achieved only CPU-like performance on current high-end GPUs.  While there's a good explanation for that (not enough local memory to fully use the SIMD units and to hide the latencies), we're not entirely convinced that nothing better can be done by programming AMD GCN GPUs (such as the HD 7970) at a level below OpenCL - that is, at AMD IL and/or AMD GCN ISA level.  For example, to what extent is the limitation of 256 VGPRs per work-item inherent to GCN?  Can we bypass it with a non-standard programming model (e.g. have a work-item access what would normally be another work-item's VGPRs)?  ([[http://article.gmane.org/gmane.comp.security.phc/2216|Apparently not, or at least not easily.]])  Since the combined size of VGPRs per CU is 4x larger than the size of local memory per CU, yet there's support for indexed access to VGPRs, this may let us run more concurrent instances of bcrypt (up to 5x more?) and thereby achieve greater performance.
-A sub-task here is to explore ways to write lower-level GPU code, possibly with specific focus on AMD GCN and/or on NVIDIA Maxwell, and also to analyze OpenCL-generated code at a low level to identify its shortcomings.  We may also produce custom development tools, such as to allow for runtime code specialization (e.g. updating binary kernels implementing DES-based crypt(3) for specific salt values, which may be done a lot quicker than building OpenCL kernels from source).  Another sub-task is to make use of the gained knowledge and the created tools to make John the Ripper run faster.
-Other relevant pages on this wiki:
-  * [[john/development/AMD-IL|Development in AMD IL]]
-  * [[john/development/GCN-ISA|Development in AMD GCN ISA]]
-AMD documentation:
-  * [[http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf|AMD GCN ISA]]
-Related third-party projects, for AMD GPUs:
-  * [[http://www.ast.cam.ac.uk/~stg20/amdstream/|Assembler for AMD HD 69xx cards]] (source code available, but no license provided)
-  * [[http://realhet.wordpress.com|Pascal + assembler + IDE for AMD GCN ISA]] (Windows, closed-source)
-    * [[http://x.pgy.hu/~worm/het/hp/|Download link for the above]] as the link currently on the blog above is broken
-    * [[http://devgurus.amd.com/thread/159954|Forum thread where this project was introduced by its author]]
-  * [[http://www.codeproject.com/Articles/872477/Assembler-for-AMD-s-GCN-GPU|GCN assembler in C#]] (Windows, C# source, gratis but not free software)
-  * [[https://github.com/sylware/cmingcnasm|cmingcnasm: C language MINimal GCN ASseMbler]] (source code available, GNU AGPLv3)
-for NVIDIA GPUs:
-  * [[http://www.openwall.com/lists/john-dev/2012/03/24/13|Usable assembly language for GPUs: a success story]] (published paper, but the qhasm-cudasm tool is not released)
-  * [[https://github.com/NervanaSystems/maxas|MaxAs: Assembler for NVIDIA Maxwell architecture]] ([[https://code.google.com/p/maxas/|old project]])
-  * [[http://code.google.com/p/asfermi/|asfermi: Assembler for NVIDIA Fermi architecture]]
-  * [[https://github.com/laanwj/decuda/wiki|Cubin Utilities (decuda and cudasm)]]
 ==== Smaller and/or new projects ====