# SHA-1

In early 2005, Rijmen and Oswald published an attack on a reduced version of SHA-1 — 53 out of 80 rounds — which finds collisions with a computational effort of fewer than 2^{80} operations.

In February 2005, an attack by Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu was announced. The authors have presented a collision for 58-round SHA-1, found with 2^{33} hash operations. The paper with the full attack description was published in August 2005 at the CRYPTO conference.

In an interview, Yin states that, "Roughly, we exploit the following two weaknesses: One is that the file preprocessing step is not complicated enough; another is that certain math operations in the first 20 rounds have unexpected security problems."

On 17 August 2005, an improvement on the SHA-1 attack was announced on behalf of Xiaoyun Wang, Andrew Yao and Frances Yao at the CRYPTO 2005 Rump Session, lowering the complexity required for finding a collision in SHA-1 to 2^{63}. On 18 December 2007 the details of this result were explained and verified by Martin Cochran.

Christophe De Cannière and Christian Rechberger further improved the attack on SHA-1 in "Finding SHA-1 Characteristics: General Results and Applications," receiving the Best Paper Award at ASIACRYPT 2006. A two-block collision for 64-round SHA-1 was presented, found using unoptimized methods with 2^{35} compression function evaluations. Since this attack requires the equivalent of about 2^{35} evaluations, it is considered to be a significant theoretical break. Their attack was extended further to 73 rounds (of 80) in 2010 by Grechnikov. In order to find an actual collision in the full 80 rounds of the hash function, however, tremendous amounts of computer time are required. To that end, a collision search for SHA-1 using the distributed computing platform BOINC began August 8, 2007, organized by the Graz University of Technology. The effort was abandoned May 12, 2009 due to lack of progress.

At the Rump Session of CRYPTO 2006, Christian Rechberger and Christophe De Cannière claimed to have discovered a collision attack on SHA-1 that would allow an attacker to select at least parts of the message.

In 2008, an attack methodology by Stéphane Manuel reported hash collisions with an estimated theoretical complexity of 2^{51} to 2^{57} operations. However, he later retracted that claim after finding that the local collision paths were not actually independent, and finally quoted as the most efficient a collision vector that was already known before this work.

Cameron McDonald, Philip Hawkes and Josef Pieprzyk presented a hash collision attack with claimed complexity 2^{52} at the Rump Session of Eurocrypt 2009. However, the accompanying paper, "Differential Path for SHA-1 with complexity *O*(2^{52})", was withdrawn after the authors discovered that their estimate was incorrect.

One attack against SHA-1 was developed by Marc Stevens, with an estimated cost of $2.77M to break a single hash value by renting CPU power from cloud servers. Stevens developed this attack in a project called HashClash, implementing a differential path attack. On 8 November 2010, he claimed to have a fully working near-collision attack against full SHA-1, with an estimated complexity equivalent to 2^{57.5} SHA-1 compressions. He estimated that this attack could be extended to a full collision with a complexity of around 2^{61}.

#### The SHAppening

Marc Stevens, Pierre Karpman, and Thomas Peyrin published a freestart collision attack on SHA-1's compression function that requires only 2^{57} SHA-1 evaluations. This does not directly translate into a collision on the full SHA-1 hash function (where an attacker is *not* able to freely choose the initial internal state), but it undermines the security claims for SHA-1. In particular, it was the first time that an attack on full SHA-1 had been *demonstrated*; all earlier attacks were too expensive for their authors to carry out. The authors named this significant breakthrough in the cryptanalysis of SHA-1 *The SHAppening*.

#### SHAttered – first public collision

Google announced the *SHAttered* attack, in which they generated two different PDF files with the same SHA-1 hash in roughly 2^{63.1} SHA-1 evaluations. This attack is about 100,000 times faster than brute forcing a SHA-1 collision with a birthday attack, which was estimated to take 2^{80} SHA-1 evaluations. The attack required "the equivalent processing power as 6,500 years of single-CPU computations and 110 years of single-GPU computations".
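The "about 100,000 times faster" figure follows from the two exponents quoted above. As a quick arithmetic check (the variable names are illustrative, and the exponents are taken directly from the text):

```python
# Ratio between the generic birthday bound and the SHAttered cost,
# both expressed as log2 of the number of SHA-1 evaluations.
birthday_log2 = 80.0    # brute-force birthday attack on a 160-bit hash
shattered_log2 = 63.1   # cost quoted for the SHAttered attack

exponent = birthday_log2 - shattered_log2
speedup = 2 ** exponent

print(f"speed-up = 2^{exponent:.1f}, i.e. roughly {speedup:,.0f}x")
```

The result is on the order of 10^5, which is consistent with the rounded figure in the text.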

In 2004, Biham and Chen found near-collisions for SHA-0 — two messages that hash to nearly the same value; in this case, 142 out of the 160 bits are equal. They also found full collisions of SHA-0 reduced to 62 out of its 80 rounds.

Subsequently, on 12 August 2004, a collision for the full SHA-0 algorithm was announced by Joux, Carribault, Lemuet, and Jalby. This was done by using a generalization of the Chabaud and Joux attack. Finding the collision had complexity 2^{51} and took about 80,000 processor-hours on a supercomputer with 256 Itanium 2 processors (equivalent to 13 days of full-time use of the computer).
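The "13 days" figure can be recovered from the processor-hours and processor count given above, assuming all 256 processors were kept busy for the whole run (an idealization):

```python
# Convert total processor-hours into wall-clock days on the full machine.
processor_hours = 80_000
processors = 256

wall_clock_hours = processor_hours / processors
wall_clock_days = wall_clock_hours / 24

print(f"{wall_clock_hours:.1f} hours, about {wall_clock_days:.1f} days")
```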

On 17 August 2004, at the Rump Session of CRYPTO 2004, preliminary results were announced by Wang, Feng, Lai, and Yu, about an attack on MD5, SHA-0 and other hash functions. The complexity of their attack on SHA-0 is 2^{40}, significantly better than the attack by Joux *et al.*

In February 2005, an attack by Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu was announced which could find collisions in SHA-0 in 2^{39} operations.

Another attack in 2008 applying the boomerang attack brought the complexity of finding collisions down to 2^{33.6}, which is estimated to take 1 hour on an average PC.

In light of the results for SHA-0, some experts suggested that plans for the use of SHA-1 in new cryptosystems should be reconsidered. After the CRYPTO 2004 results were published, NIST announced that they planned to phase out the use of SHA-1 by 2010 in favor of the SHA-2 variants.

### Official validation

Implementations of all FIPS-approved security functions can be officially validated through the CMVP program, jointly run by the National Institute of Standards and Technology (NIST) and the Communications Security Establishment (CSE). For informal verification, a package to generate a high number of test vectors is made available for download on the NIST site; the resulting verification, however, does not replace the formal CMVP validation, which is required by law for certain applications.

There are over 2000 validated implementations of SHA-1, with 14 of them capable of handling messages with a length in bits not a multiple of eight (see SHS Validation List).

## Examples and pseudocode

### Example hashes

These are examples of SHA-1 message digests in hexadecimal and in Base64 binary to ASCII text encoding.

    SHA1("The quick brown fox jumps over the lazy dog")
    gives hexadecimal: 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
    gives Base64 binary to ASCII text encoding: L9ThxnotKPzthJ7hu3bnORuT6xI=

Even a small change in the message will, with overwhelming probability, result in many bits changing due to the avalanche effect. For example, changing `dog` to `cog` produces a hash with different values for 87 of the 160 bits:

    SHA1("The quick brown fox jumps over the lazy cog")
    gives hexadecimal: de9f2c7fd25e1b3afad3e85a0bd17d9b100db4b3
    gives Base64 binary to ASCII text encoding: 3p8sf9JeGzr60+haC9F9mxANtLM=

The hash of the zero-length string is:

    SHA1("")
    gives hexadecimal: da39a3ee5e6b4b0d3255bfef95601890afd80709
    gives Base64 binary to ASCII text encoding: 2jmj7l5rSw0yVb/vlWAYkK/YBwk=
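The digests above can be reproduced with Python's standard `hashlib` and `base64` modules. The sketch below is illustrative only; the helper names `sha1_hex_and_b64` and `differing_bits` are not from the original text.

```python
import base64
import hashlib


def sha1_hex_and_b64(text):
    """Return the SHA-1 digest of an ASCII string as (hex, Base64) strings."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return digest.hex(), base64.b64encode(digest).decode("ascii")


def differing_bits(a, b):
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


dog = "The quick brown fox jumps over the lazy dog"
cog = "The quick brown fox jumps over the lazy cog"

for text in (dog, cog, ""):
    hex_digest, b64_digest = sha1_hex_and_b64(text)
    print(repr(text), hex_digest, b64_digest)

# Avalanche effect: compare the two 160-bit digests bit by bit.
changed = differing_bits(
    hashlib.sha1(dog.encode("ascii")).digest(),
    hashlib.sha1(cog.encode("ascii")).digest(),
)
print(f"{changed} of 160 bits differ between the dog and cog digests")
```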

### SHA-1 pseudocode

Pseudocode for the SHA-1 algorithm follows:

    Note 1: All variables are unsigned 32-bit quantities and wrap modulo 2^32 when calculating, except for
            ml, the message length, which is a 64-bit quantity, and
            hh, the message digest, which is a 160-bit quantity.
    Note 2: All constants in this pseudo code are in big endian.
            Within each word, the most significant byte is stored in the leftmost byte position

    Initialize variables:

    h0 = 0x67452301
    h1 = 0xEFCDAB89
    h2 = 0x98BADCFE
    h3 = 0x10325476
    h4 = 0xC3D2E1F0

    ml = message length in bits (always a multiple of the number of bits in a character).

    Pre-processing:
    append the bit '1' to the message e.g. by adding 0x80 if message length is a multiple of 8 bits.
    append 0 ≤ k < 512 bits '0', such that the resulting message length in bits
       is congruent to −64 ≡ 448 (mod 512)
    append ml, the original message length, as a 64-bit big-endian integer.
       Thus, the total length is a multiple of 512 bits.

    Process the message in successive 512-bit chunks:
    break message into 512-bit chunks
    for each chunk
        break chunk into sixteen 32-bit big-endian words w[i], 0 ≤ i ≤ 15

        Extend the sixteen 32-bit words into eighty 32-bit words:
        for i from 16 to 79
            w[i] = (w[i-3] xor w[i-8] xor w[i-14] xor w[i-16]) leftrotate 1

        Initialize hash value for this chunk:
        a = h0
        b = h1
        c = h2
        d = h3
        e = h4

        Main loop:
        for i from 0 to 79
            if 0 ≤ i ≤ 19 then
                f = (b and c) or ((not b) and d)
                k = 0x5A827999
            else if 20 ≤ i ≤ 39
                f = b xor c xor d
                k = 0x6ED9EBA1
            else if 40 ≤ i ≤ 59
                f = (b and c) or (b and d) or (c and d)
                k = 0x8F1BBCDC
            else if 60 ≤ i ≤ 79
                f = b xor c xor d
                k = 0xCA62C1D6

            temp = (a leftrotate 5) + f + e + k + w[i]
            e = d
            d = c
            c = b leftrotate 30
            b = a
            a = temp

        Add this chunk's hash to result so far:
        h0 = h0 + a
        h1 = h1 + b
        h2 = h2 + c
        h3 = h3 + d
        h4 = h4 + e

    Produce the final hash value (big-endian) as a 160-bit number:
    hh = (h0 leftshift 128) or (h1 leftshift 96) or (h2 leftshift 64) or (h3 leftshift 32) or h4

The number `hh` is the message digest, which can be written in hexadecimal (base 16), but is often written using Base64 binary to ASCII text encoding.
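The pseudocode can be transcribed almost line for line into Python. The following is a minimal sketch for study purposes, assuming byte-aligned input (message length a multiple of 8 bits); the names `left_rotate` and `sha1_hex` are illustrative, and production code should use a vetted library such as `hashlib` instead.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF


def left_rotate(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha1_hex(message):
    """Return the SHA-1 digest of a bytes object as a hexadecimal string."""
    h0, h1, h2, h3, h4 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0
    ml = len(message) * 8  # original length in bits

    # Pre-processing: append 0x80, pad with zeros to 448 mod 512 bits,
    # then append the 64-bit big-endian message length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", ml)

    for offset in range(0, len(padded), 64):
        # Sixteen big-endian 32-bit words, extended to eighty.
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for i in range(16, 80):
            w.append(left_rotate(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))

        a, b, c, d, e = h0, h1, h2, h3, h4
        for i in range(80):
            if i < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif i < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif i < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (left_rotate(a, 5) + f + e + k + w[i]) & MASK32
            a, b, c, d, e = temp, a, left_rotate(b, 30), c, d

        # Add this chunk's result to the running hash, modulo 2^32.
        h0 = (h0 + a) & MASK32
        h1 = (h1 + b) & MASK32
        h2 = (h2 + c) & MASK32
        h3 = (h3 + d) & MASK32
        h4 = (h4 + e) & MASK32

    return "%08x%08x%08x%08x%08x" % (h0, h1, h2, h3, h4)


# Cross-check against the standard library for a few message lengths,
# including ones that straddle the 55/56/64-byte padding boundaries.
for n in (0, 1, 55, 56, 63, 64, 65, 200):
    sample = bytes(range(256))[:n] if n <= 256 else b""
    assert sha1_hex(sample) == hashlib.sha1(sample).hexdigest()
print(sha1_hex(b"The quick brown fox jumps over the lazy dog"))
```

The cross-check loop deliberately includes lengths around 55, 56 and 64 bytes, where the padding either fits in the current 512-bit chunk or spills into an extra one.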

The constant values used are chosen to be nothing up my sleeve numbers: the four round constants `k` are 2^{30} times the square roots of 2, 3, 5 and 10. The first four starting values, `h0` through `h3`, are the same as in the MD5 algorithm, and the fifth (for `h4`) is similar.
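The relationship between the round constants and the square roots can be checked with exact integer arithmetic. The sketch below assumes the products are truncated to integers (the text does not state the rounding rule); `scaled_sqrt` is an illustrative helper name.

```python
import math

ROUND_CONSTANTS = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)


def scaled_sqrt(n):
    """Return floor(sqrt(n) * 2**30) without floating-point rounding.

    sqrt(n) * 2**30 == sqrt(n * 2**60), so an integer square root suffices.
    """
    return math.isqrt(n << 60)


for n, k in zip((2, 3, 5, 10), ROUND_CONSTANTS):
    print(f"n={n:2d}  floor(2^30 * sqrt(n)) = 0x{scaled_sqrt(n):08X}  k = 0x{k:08X}")
```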

Instead of the formulation from the original FIPS PUB 180-1 shown, the following equivalent expressions may be used to compute `f` in the main loop above:

    Bitwise choice between c and d, controlled by b.
    (0  ≤ i ≤ 19): f = d xor (b and (c xor d))                (alternative 1)
    (0  ≤ i ≤ 19): f = (b and c) xor ((not b) and d)          (alternative 2)
    (0  ≤ i ≤ 19): f = (b and c) + ((not b) and d)            (alternative 3)
    (0  ≤ i ≤ 19): f = vec_sel(d, c, b)                       (alternative 4)

    Bitwise majority function.
    (40 ≤ i ≤ 59): f = (b and c) or (d and (b or c))          (alternative 1)
    (40 ≤ i ≤ 59): f = (b and c) or (d and (b xor c))         (alternative 2)
    (40 ≤ i ≤ 59): f = (b and c) + (d and (b xor c))          (alternative 3)
    (40 ≤ i ≤ 59): f = (b and c) xor (b and d) xor (c and d)  (alternative 4)
    (40 ≤ i ≤ 59): f = vec_sel(c, b, c xor d)                 (alternative 5)
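Because every expression above acts on each bit position independently, the equivalences can be checked exhaustively over all-zero and all-one words and spot-checked on random 32-bit words. In this sketch, `vec_sel` is a plain-Python stand-in written on the assumption that it selects bits of its second argument where the mask bit is 1 and bits of its first argument otherwise; it is not a real SIMD intrinsic.

```python
import itertools
import random

MASK32 = 0xFFFFFFFF


def vec_sel(a, b, mask):
    """Assumed semantics: take bits of b where mask is 1, bits of a elsewhere."""
    return (a & ~mask) | (b & mask)


def choice_variants(b, c, d):
    """FIPS formulation followed by the four alternatives for rounds 0-19."""
    not_b = ~b & MASK32
    return [
        (b & c) | (not_b & d),
        d ^ (b & (c ^ d)),
        (b & c) ^ (not_b & d),
        ((b & c) + (not_b & d)) & MASK32,  # operands share no set bits, so no carries
        vec_sel(d, c, b),
    ]


def majority_variants(b, c, d):
    """FIPS formulation followed by the five alternatives for rounds 40-59."""
    return [
        (b & c) | (b & d) | (c & d),
        (b & c) | (d & (b | c)),
        (b & c) | (d & (b ^ c)),
        ((b & c) + (d & (b ^ c))) & MASK32,  # operands share no set bits, so no carries
        (b & c) ^ (b & d) ^ (c & d),
        vec_sel(c, b, c ^ d),
    ]


def all_agree(values):
    return len(set(values)) == 1


rng = random.Random(0)
samples = list(itertools.product((0, MASK32), repeat=3))
samples += [tuple(rng.getrandbits(32) for _ in range(3)) for _ in range(1000)]

ok = all(
    all_agree(choice_variants(b, c, d)) and all_agree(majority_variants(b, c, d))
    for b, c, d in samples
)
print("all variants agree:", ok)
```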

It was also shown that for the rounds 32–79 the computation of:

    w[i] = (w[i-3] xor w[i-8] xor w[i-14] xor w[i-16]) leftrotate 1

can be replaced with:

    w[i] = (w[i-6] xor w[i-16] xor w[i-28] xor w[i-32]) leftrotate 2

This transformation keeps all operands 64-bit aligned and, by removing the dependency of `w[i]` on `w[i-3]`, allows efficient SIMD implementation with a vector length of 4, as with x86 SSE instructions.
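The alternative recurrence follows from applying the original one to each of its four operands: the intermediate terms cancel in pairs under XOR, and the two single-bit rotations combine into one two-bit rotation. A minimal sketch that checks the two message schedules agree on random input (function names are illustrative):

```python
import random

MASK32 = 0xFFFFFFFF


def left_rotate(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def schedule_standard(words16):
    """Expand 16 words to 80 with the original recurrence."""
    w = list(words16)
    for i in range(16, 80):
        w.append(left_rotate(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    return w


def schedule_alternative(words16):
    """Use the original recurrence for rounds 16-31 and the alternative for 32-79."""
    w = list(words16)
    for i in range(16, 32):
        w.append(left_rotate(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    for i in range(32, 80):
        w.append(left_rotate(w[i - 6] ^ w[i - 16] ^ w[i - 28] ^ w[i - 32], 2))
    return w


rng = random.Random(1)
block = [rng.getrandbits(32) for _ in range(16)]
print("schedules match:", schedule_standard(block) == schedule_alternative(block))
```

The alternative form only applies from round 32 onward because it reaches back to `w[i-32]`, which must already exist.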

## Comparison of SHA functions

In the table below, *internal state* means the "internal hash sum" after each compression of a data block.

Note that performance will vary not only between algorithms, but also with the specific implementation and hardware used. The OpenSSL tool has a built-in "speed" command that benchmarks the various algorithms on the user's system.
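For a rough, implementation-specific comparison without the OpenSSL tool, throughput can also be measured from Python's `hashlib`. This is only a sketch: the helper name `throughput_mib_per_s`, the 1 MiB buffer and the repeat count are arbitrary choices, and the numbers it prints depend entirely on the machine and the library build.

```python
import hashlib
import time


def throughput_mib_per_s(name, data, repeats=20):
    """Hash `data` `repeats` times and return the throughput in MiB/s."""
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return len(data) * repeats / elapsed / (1024 * 1024)


data = bytes(1024 * 1024)  # 1 MiB of zero bytes
results = {name: throughput_mib_per_s(name, data) for name in ("sha1", "sha256", "sha512")}

for name, rate in results.items():
    print(f"{name:7s} {rate:10.1f} MiB/s")
```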

## See Also on BitcoinWiki

- Collision (computer science)
- Comparison of cryptographic hash functions
- Crypto++
- Hash function security summary
- Hashcash
- RIPEMD
- sha1sum
- Tiger (cryptography)
- Whirlpool (cryptography)

## Source