yescrypt user community resources

yescrypt is a password-based key derivation function (KDF) and password hashing scheme. It builds upon Colin Percival's scrypt. This implementation is able to compute native yescrypt hashes as well as classic scrypt.

Right now, this is primarily a copy of the content from the PHC wiki, but it should probably be moved to a sub-page and this page reused for links to resources related to yescrypt.

Presentations

Discussion threads

yescrypt description from the PHC wiki

Strengths summary

  • Builds upon scrypt, computes classic scrypt and native yescrypt hashes
  • High flexibility and large arsenal of defenses
    • One password hashing scheme for many possible use cases that deals with many kinds of attacks
  • Scalable from kilobytes (RAM) to terabytes (RAM + ROM) and beyond
    • while providing adequate (or better) security vs. alternatives for same defender's {time, memory} cost
  • Scalable to arbitrary SIMD vector width and instruction-level parallelism
  • Optional TMTO resistance
    • When enabled, additionally makes attacks' area-time a few times higher
  • Optional bcrypt-like GPU unfriendliness (especially important at low memory usage settings)
  • Optional multiplication latency hardening (efficient at least on common x86 and ARM CPUs)
  • Running time optimally tunable separately from memory usage and parallelism
  • Client-side computation of almost final yescrypt hashes (server relief)
    • in a way allowing for a straightforward extension of SCRAM (RFC 5802)
  • Hash upgrades to higher cost settings without knowledge of passwords
  • Cryptographic security is based on that of SHA-256, HMAC, and PBKDF2
    • The rest of processing may be considered non-cryptographic
    • Known unfortunate peculiarities of HMAC and PBKDF2 are fully avoided

Strengths in detail

Builds upon scrypt

  • Directly builds upon the well-studied scrypt
  • Can compute classic scrypt hashes (for compatibility or when scrypt's properties suit a given use case well), sharing much of the code with yescrypt native hash support

SMix improvements

Read-only lookup table (ROM)
  • Optional ROM allows for scalability from kilobytes (RAM) 1) to terabytes (RAM + ROM) and beyond
    • while providing adequate (or better) security vs. alternatives for same defender's {time, memory} cost
  • Can be arbitrarily large regardless of running time
    • This differs from scrypt, where memory usage for a given running time is limited, so scrypt's memory usage might not be high enough when high request rate capacity is desired (scrypt's recommended minimum of 16+ MB may be hard to achieve at thousands of requests per second)
  • Optional time-memory trade-off (TMTO) resistance
    • Strongly recommended for most use cases, enabled with the YESCRYPT_RW flag
    • Can be disabled (by not setting the flag) when TMTO is expected to be needed by a defender (e.g., for multi-gigabyte key derivation that might need to be repeated later on a small mobile device)
  • TMTO resistance (when enabled) additionally increases attacker's area-time cost by a factor of 2x to 4x relative to scrypt
  • TMTO resistance (when enabled) additionally results in the highest normalized area-time cost for attacks being reached at 4/3*N combined iterations of SMix's two loops, vs. scrypt's 2*N - or at 2/3 of scrypt's SMix iteration count
    • Allows for usage of a higher N and thus for higher area-time cost to attacker
Tunable running time
  • Running time tunable separately from memory usage and parallelism
    • Provides an up to 2x improvement in area-time cost over the workaround (normally suggested for scrypt) of using scrypt's p to increase running time at same memory usage
Thread-level parallelism
  • Thread-level parallelism may optionally be made deliberately inflexible (not allowing for efficient sequential computation in less memory), enabled along with the YESCRYPT_RW flag

BlockMix improvements

The revised BlockMix, which uses mostly yescrypt's pwxform (parallel wide transformation) instead of scrypt's Salsa20/8, is enabled along with the YESCRYPT_RW flag.

  • Scalable to arbitrary SIMD vector width and instruction-level parallelism
    • To be configurable via compile-time settings, or chosen at runtime from common presets
    • Current defaults are for 4x 128-bit SIMD lanes (512 bits)
    • scrypt's p could be brought down to instruction level, but it would be sub-optimal
  • bcrypt-like GPU unfriendliness (especially important at low memory usage settings)
    • by including bcrypt-like rapid random reads, typically from defender CPU's L1 data cache
  • Multiplication latency hardening (efficient at least on common x86 and ARM CPUs)
    • (Integer) multiplication takes about as much time on common CPUs as it does in ASIC, limiting possible attack speedups (for same memory usage) even with much faster memory
    • This differs from scrypt's Salsa20/8, which is optimally computed in a lot fewer clock cycles in ASIC than on CPU
    • 32×32 to 64-bit: [I]MUL on x86[-64], [V]PMULUDQ with SSE2/AVX/AVX2; UMLAL [or VMLAL] on ARM [with NEON]

Catena-like features

  • Client-side computation of almost final yescrypt hashes (server relief)
    • in a way allowing for a straightforward extension of SCRAM (RFC 5802)
  • Hash upgrades to higher cost settings without knowledge of passwords
    • supporting over 60% area-time efficiency, as opposed to Catena's over 33%

Cryptographic security

  • Cryptographic security (collision resistance, preimage and second preimage resistance) is based on that of SHA-256, HMAC, and PBKDF2
  • The rest of processing, while crucial for increasing the cost of password cracking attacks, may be considered non-cryptographic
    • There are entropy bypasses to final PBKDF2 step for both password and salt inputs
    • For comparison, scrypt has such entropy bypass for the password input only
  • Known unfortunate peculiarities of HMAC and PBKDF2 are fully avoided in the way these primitives are used

Drawbacks

  • Just one major drawback: complexity!
    • scrypt was already pretty complicated, and the full yescrypt is more so
    • The complexity is arguably justified by yescrypt's flexibility and its arsenal of defenses: it's one password hashing scheme for many possible use cases that deals with many kinds of attacks
    • The alternative would have been defining multiple simpler schemes, but their total complexity would probably be even greater
    • Cut-down yet compatible implementation with partial functionality is planned
  • While pwxform's default 4x 128-bit SIMD parallelism fits SIMD-enabled builds running on modern CPUs well, it is excessive for older CPUs and non-SIMD builds, making such slower builds of yescrypt weaker than similar builds of bcrypt in terms of GPU attack resistance (when yescrypt's memory usage is low)
    • Using much lower parallelism settings when appropriate should address this, but the risk of people (inadvertently?) using a high SIMD- and instruction-level parallelism mode in a build not suitable for that will remain
  • Mostly susceptible to cache-timing side-channels, with only a small initial portion of computation being cache-timing resistant

Pseudocode

This pseudocode illustrates yescrypt running with p=1 (no thread-level parallelism), g=0 (no hash upgrades performed yet), flags=YESCRYPT_RW (full native mode), and no ROM.

# ** Functions/symbols **
# ||                           Concatenate two strings
# HMAC(h, k, v)                HMAC with hash function h and key k over value v
# PBKDF2(prf, p, s, c, dklen)  PBKDF2 key derivation function
# substr(s, start, length)     Substring from start (zero-based) of length bytes
# le32dec(), le32enc()         32-bit little-endian decoding/encoding
# SIMD_[un]shuffle()           Salsa20 SIMD (un)shuffling of 32-bit words
# Integerify(B, r)             Parse B_{2r-1} as a little-endian integer
# p2floor(x)                   Largest power of 2 not greater than x
 
# ** Inputs **
string   password
string   salt
integer  t_cost
integer  m_cost
integer  outlen
 
# ** Algorithm **
N = 8 << m_cost
r = 8
 
# ** Settings hard-coded/assumed below in this pseudocode **
# p = 1
# g = 0
# flags = YESCRYPT_RW
# no ROM
 
# If m_cost is 16 MB per thread or more, pre-hash using 1/64th of m_cost first,
# to mitigate garbage collector attacks.  yescrypt_prehash() is almost the same
# as this function, but its personalization HMAC key is "yescrypt-prehash"
# rather than "yescrypt", it skips builtin SCRAM finalization, and it will not
# invoke another yescrypt_prehash().
if (N / p >= 0x100 && N / p * r >= 0x20000)
    password = yescrypt_prehash(password, salt, t_cost, m_cost / 64, 32)
 
password = HMAC(SHA-256, "yescrypt", password)
B = PBKDF2(HMAC-SHA-256, password, salt, 1, 128 * r)
password = substr(B, 0, 32)
 
# SMix1 invoked to initialize pwxform S-boxes
X = SIMD_shuffle(le32dec(B))
for i = 0 to Sbytes/128 - 1
    S[i] = X
    X = BlockMix_{Salsa20/8, 1}(X)
 
# SMix1 invoked "for real"
for i = 0 to N - 1
    V[i] = X
    if (i > 1)
        j = Wrap(Integerify(X, r), i)
        X = X xor V[j]
    X = BlockMix_pwxform{Salsa20/2, S, r}(X)
 
# SMix2
if (t_cost = 0)
    Nloop = (N + 2) / 3
else if (t_cost = 1)
    Nloop = (N * 2 + 2) / 3
else
    Nloop = N * (t - 1)
for i = 0 to Nloop - 1
    j = Integerify(X, r) mod N
    X = X xor V[j]
    V[j] = X
    X = BlockMix_pwxform{Salsa20/2, S, r}(X)
B = le32enc(SIMD_unshuffle(X))
 
out = PBKDF2(HMAC-SHA-256, password, B, 1, outlen)
 
# Builtin SCRAM (RFC 5802) support
clen = min(outlen, 32)
dk = PBKDF2(HMAC-SHA-256, password, B, 1, 32)
dk = SHA-256(HMAC(SHA-256, dk, "Client Key"))
out = substr(dk, 0, clen) || substr(out, clen, outlen - clen)
 
return out
 
# ** Helper functions **
 
# Wrap x to the range 0 to i-1
Wrap(x, i)
    n = p2floor(i)
    return (x mod n) + (i - n)

The BlockMix_{Salsa20/8, r}(X) function is exactly the same as in scrypt (and is invoked with hard-coded r=1 here, regardless of yescrypt's r). The BlockMix_pwxform{Salsa20/2, S, r}(X) function is yescrypt's own, and it uses yescrypt's own pwxform in place of most uses of Salsa20/2. Please refer to the yescrypt specification document and the deliberately mostly not optimized reference implementation (yescrypt-ref.c) for how these functions, as well as the SIMD (un)shuffling, are specified.

1) Also possible due to yescrypt's bcrypt-like GPU unfriendliness in BlockMix, enabled along with YESCRYPT_RW
yescrypt.txt · Last modified: 2018/04/05 17:28 by solar
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported
Recent changes RSS feed Donate to DokuWiki Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki Powered by OpenVZ Powered by Openwall GNU/*/Linux