
HPC Village

HPC Village from Openwall is an opportunity for HPC (High Performance Computing) hobbyists to program for a heterogeneous (hybrid) HPC platform. Participants are provided with remote access (via the SSH protocol) to a server with multi-core CPUs and HPC accelerator cards of different kinds - Intel MIC (Xeon Phi), AMD GPU, NVIDIA GPU - as well as with pre-installed and configured drivers and development tools (SDKs).

Within one machine, we provide access to four types of computing devices (CPU, MIC, AMD GPU, NVIDIA GPU), including OpenCL support for all of them, as well as support for development tools and usage models specific to some of them (OpenMP on CPU, OpenMP offload from CPU to MIC, CUDA on NVIDIA GPU). Although it is uncommon to use more than two types of computing devices within one node in real-world HPC setups, such a configuration is convenient for getting acquainted with the different technologies, for trying them out and comparing them on specific tasks, and for developing portable software (including debugging and optimization).

Hardware

The current hardware configuration is as follows:

Supermicro GPU SuperWorkstation 7047GR-TPRF workstation/server platform with MCP-290-00059-0B rackmount rail set
- 4U chassis
- Two 1620W PSUs ¹⁾
- Dual socket 2011 motherboard with IPMI, 16 memory sockets, four PCIe 3.0 x16 slots for full-length dual-width PCIe cards and a fifth slot for a shorter card
- A full set of cooling fans, including those pulling hot air out of passively-cooled accelerator cards
Two 8-core Intel Xeon E5-2670 CPUs
- Sandy Bridge-EP, with support for AVX and AES-NI
- A total of 16 CPU cores seen as 32 logical CPUs (two hardware threads per core), at a clock rate of at least 2.6 GHz
- Turbo Boost up to 3.0 GHz with all cores in use or 3.3 GHz with few cores in use
128 GB DDR3-1600 ECC RAM
- 8x 16 GB DDR3-1600 ECC Registered modules on 8 channels (4 channels per CPU)
- Theoretical bandwidth 102.4 GB/s, actual measured bandwidth ~85 GB/s (cumulative from 32 threads)
Intel Xeon Phi 5110P coprocessor module
- Intel Many Integrated Core (MIC) architecture, Knights Corner
- 60 cores (x86-ish with 512-bit SIMD units) seen as 240 logical CPUs (four hardware threads per core), 1053 MHz, 8 GB GDDR5 ECC RAM on a 512-bit bus, 320 GB/s
- Peak performance of about 2 TFLOPS single-precision, 1 TFLOPS double-precision
NVIDIA GTX Titan X gaming graphics card (reference design, manufactured by Gigabyte)
- NVIDIA Maxwell architecture
- One GM200 GPU with 3072 SPs at 1000 MHz to 1076 MHz, 12 GB GDDR5 RAM on a 384-bit bus, 336 GB/s
- Peak performance of over 6 TFLOPS single-precision, 0.2 TFLOPS double-precision
NVIDIA GTX TITAN gaming graphics card (Zotac GeForce GTX TITAN AMP! Edition)
- NVIDIA Kepler architecture
- One GK110 GPU with 2688 SPs at 902 MHz to 954 MHz in single-precision mode, 6 GB GDDR5 RAM on a 384-bit bus, 317.2 GB/s
- Peak performance of over 5 TFLOPS single-precision, from 1.3 to 1.5 TFLOPS double-precision in the corresponding mode
- This is a budget replacement for the Tesla K20X GPU card intended for workstations and servers (which would cost at least 3 times more and would run considerably slower on single-precision and integer code, but would offer ECC RAM)
AMD Radeon HD 7990 gaming graphics card
- AMD GCN architecture
- Two “Tahiti” GPUs, which provide 2×2048 SPs, 6 GB GDDR5 RAM on two 384-bit buses, 576 GB/s
- Custom core clock rate: 501 MHz for GPU0 (heavily underclocked), 997.5 MHz to 1050 MHz for GPU1 (almost same as HD 7970 GE) ²⁾
- Peak performance of over 6 TFLOPS single-precision, about 1.5 TFLOPS double-precision
- This is a budget replacement for the FirePro S10000 GPU card intended for servers (which would cost at least 3 times more, but would offer ECC RAM)
AMD Radeon HD 5750/6750 gaming graphics card marketed as “PowerColor Radeon HD 6770 Green Edition (AX6770 1GBD5-HV4)”, one half of a HD 5850
- AMD TeraScale 2 (VLIW5) architecture
- One Juniper PRO GPU with 720 SPs at 700 MHz, 1 GB GDDR5 RAM on a 128-bit bus, 73.6 GB/s
- A short card that fits into this motherboard's 5th dual-width PCIe slot
- Not a high performance card, but usable for testing/benchmarking on the old VLIW5 architecture, such as to avoid performance regressions for users with older cards like this (HD 5000 and 6000 series up to and including 6870)
- Peak performance of over 1 TFLOPS single-precision

Total peak performance is over 20 TFLOPS single-precision, about 4 TFLOPS double-precision.
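
The single-precision total and the theoretical memory bandwidth quoted above can be roughly reproduced from the listed unit counts and clock rates. The following is a minimal sketch in Python, not the organizers' own calculation. It assumes two single-precision operations (one multiply and one add) per SIMD lane or SP per cycle, 8 single-precision AVX lanes per CPU core, 16 lanes per Xeon Phi core, the base clock for the CPUs, and the upper listed clock for each GPU. All function and variable names are illustrative.

```python
# Back-of-the-envelope peak figures for the hardware listed above.
# Unit counts and clock rates are taken from the list; the per-cycle
# factors are assumptions, as noted in the comments.

# Assumption: each lane/SP can retire one multiply and one add per cycle
# (fused on devices with FMA/MAD, separate add and multiply ports on Sandy Bridge).
FLOPS_PER_LANE_PER_CYCLE = 2

# (name, single-precision lanes or SPs, clock in GHz)
DEVICES = [
    ("2x Xeon E5-2670", 16 * 8, 2.6),      # 16 cores x 8 AVX lanes (assumed), base clock
    ("Xeon Phi 5110P", 60 * 16, 1.053),    # 60 cores x 16 lanes of 512-bit SIMD (assumed)
    ("GTX Titan X", 3072, 1.000),
    ("GTX TITAN", 2688, 0.954),            # upper listed clock
    ("HD 7990 GPU0", 2048, 0.501),         # underclocked GPU
    ("HD 7990 GPU1", 2048, 1.050),         # upper listed clock
    ("HD 6770 Green Edition", 720, 0.700),
]


def peak_sp_tflops(lanes, clock_ghz):
    """Peak single-precision TFLOPS for a device with the given lanes and clock."""
    return lanes * FLOPS_PER_LANE_PER_CYCLE * clock_ghz / 1000.0


def total_sp_tflops(devices=DEVICES):
    """Sum of per-device peak single-precision TFLOPS."""
    return sum(peak_sp_tflops(lanes, clock) for _, lanes, clock in devices)


def ddr3_peak_gb_per_s(channels=8, mega_transfers_per_s=1600, bytes_per_transfer=8):
    """Theoretical DDR3 bandwidth: channels x MT/s x 8 bytes (64-bit channel)."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


if __name__ == "__main__":
    for name, lanes, clock in DEVICES:
        print(f"{name:24s} {peak_sp_tflops(lanes, clock):6.3f} TFLOPS")
    print(f"{'Total':24s} {total_sp_tflops():6.3f} TFLOPS")

    theoretical = ddr3_peak_gb_per_s()
    measured = 85.0  # approximate measured figure quoted in the list above
    print(f"DDR3 theoretical: {theoretical:.1f} GB/s, "
          f"measured/theoretical: {measured / theoretical:.0%}")
```

Under these assumptions the sum comes to a little over 21 TFLOPS, consistent with the "over 20 TFLOPS" figure, and the memory calculation gives the 102.4 GB/s quoted for 8 channels of DDR3-1600.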

Pictures

Pictures of the server (images not reproduced here):

- 2015 upgrade (added GTX Titan X, as well as HD 6770 Green Edition into the short slot)
- 2013

Software

The operating system is Scientific Linux 6.x (with several devtoolsets installed, providing a variety of newer GCC versions), since it is a common free option for running Intel MPSS, which is needed to access the Xeon Phi card (which, in turn, runs its own copy of Linux, supplied with Intel MPSS).

Here's what this looks like via OpenCL:

[solar@super ~]$ clinfo | egrep '^  (Platform |)Name:' | tail -n +4
  Platform Name:                                 AMD Accelerated Parallel Processing
  Name:                                          Juniper
  Name:                                          Tahiti
  Name:                                          Tahiti
  Name:                                          Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  Platform Name:                                 Intel(R) OpenCL
  Name:                                                 Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  Name:                                          Intel(R) Many Integrated Core Acceleration Card
  Platform Name:                                 NVIDIA CUDA
  Name:                                          GeForce GTX TITAN X
  Name:                                          GeForce GTX TITAN

“Tahiti” appears twice because there are two such GPUs (they're devices 0 and 1), whereas the CPUs appear twice because they're available via both AMD's and Intel's OpenCL SDKs, and either SDK will use all cores of both CPUs.
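
The same listing can be grouped by platform programmatically. Below is a minimal sketch in Python that parses the filtered clinfo output shown above. The SAMPLE text is copied from that listing, and the function name group_devices is hypothetical, not part of any SDK. The sketch assumes the "Platform Name:" / "Name:" line layout seen here, which may differ between clinfo versions.

```python
import re

# Filtered clinfo output, copied from the listing above.
SAMPLE = """\
  Platform Name:                                 AMD Accelerated Parallel Processing
  Name:                                          Juniper
  Name:                                          Tahiti
  Name:                                          Tahiti
  Name:                                          Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  Platform Name:                                 Intel(R) OpenCL
  Name:                                                 Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  Name:                                          Intel(R) Many Integrated Core Acceleration Card
  Platform Name:                                 NVIDIA CUDA
  Name:                                          GeForce GTX TITAN X
  Name:                                          GeForce GTX TITAN
"""

# Matches both "Platform Name: ..." and "Name: ..." lines.
LINE_RE = re.compile(r"\s*(Platform )?Name:\s*(.+?)\s*$")


def group_devices(text):
    """Return {platform name: [device names]} in the order they appear."""
    platforms = {}
    current = None
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # ignore anything that is not a name line
        is_platform, name = match.group(1), match.group(2)
        if is_platform:
            current = name
            platforms.setdefault(current, [])
        elif current is not None:
            platforms[current].append(name)
    return platforms


if __name__ == "__main__":
    for platform, devices in group_devices(SAMPLE).items():
        print(platform)
        for index, device in enumerate(devices):
            print(f"  device {index}: {device}")
```

On a live system, the text could instead be read from the clinfo command's standard output; the indices printed here are only positions within this listing and are not guaranteed to match the device numbering used by a given OpenCL application.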

Additional resources

We also host a changing number of other development boxes. As of this writing, these include systems with Intel AVX2, Intel HD Graphics 4600 (with a configured and working OpenCL “driver”), AMD XOP, AMD GCN 1.1, NVIDIA Fermi, some non-x86 architectures (ARM, MIPS64, Epiphany), and some FPGAs (ZedBoard with Xilinx Zynq 7020, ZTEX 1.15y with quad Spartan-6 LX150). Please feel free to inquire about availability of these and/or other resources if relevant to your project.

Who is eligible

Remote access will be provided, free of charge, to Open Source software developers. Access is provided for getting acquainted with the technologies and/or for Open Source software development. At the organizers' sole discretion, access may be denied or restricted (in particular, if it is used for a purpose other than those intended and/or if one's use of the system substantially inconveniences other users). The information contained in this announcement does not formally constitute an offer to provide any service to the general public.

How to apply

To apply for an HPC Village account, please e-mail hpc-village-admin at openwall.com with the following information:

Names of and URLs to Open Source project(s) that you represent, and a way for us to confirm that you're in fact involved with those projects
Desired login name (must be non-misleading to other users)
Your SSH public key, preferably from a keypair generated according to our conventions

We intend to reply to all HPC Village account request e-mails.

Credits

The HPC Village project is provided by Openwall (idea, most computer hardware parts, software configuration, system administration) and DataForce (assembly and hosting of servers, Internet connectivity). The purchase of the NVIDIA GTX Titan X was fully sponsored by Sagitta HPC, a subsidiary of Stricture Group LLC. The AMD Radeon HD 7990 was team john-users' prize in Hash Runner 2013, organized by Positive Technologies.

Please note that Openwall is not affiliated with any of these.

Free access to multi-CPU servers (including some non-x86) for Open Source development:

GCC Compile Farm

Use Sage, R, Octave, Python, Cython, GAP, Macaulay2, Singular, and much more; write, compile, and run code in most programming languages on remote systems using a free or paid service (with support from University of Washington, the National Science Foundation, and Google):

CoCalc (formerly SageMathCloud)

Time-limited free access to an HPC machine, with intent to promote this vendor's computer hardware sales:

Microway test drive of up to dual Xeon E5-26xx v4 or IBM POWER8 CPUs and NVIDIA Tesla GPUs

Free access for academic researchers worldwide to a 384-node cluster with Intel Xeon CPUs and Altera Stratix V FPGAs (two CPUs and one FPGA per node), running Windows Server 2012:

Project Catapult free access as announced by TACC (Texas Advanced Computing Center at The University of Texas at Austin) and Microsoft Research

¹⁾ The system's AC power consumption at idle is around 360W. At full load on all components, it increases to almost 1600W. These are totals for the two PSUs, which are normally sharing the load.

²⁾ Normally, the HD 7990 runs at 950 MHz to 1000 MHz. However, when we added a GTX Titan X to this machine in 2015, we had to underclock the HD 7990's GPU0: we no longer had a free slot next to this dual-GPU card, so it could no longer keep its GPU0 cool when running at stock clocks. Going all the way down to 501 MHz is overkill, but it is the highest clock rate at which the standard firmware uses a lower core voltage of 950 mV instead of 1200 mV, and this lower voltage is needed to prevent this GPU from overheating in our current setup.
