Worldwide Achievements in Collision Detection

We are not the first or the only ones who are seriously looking for collisions in HASH functions. We constantly monitor the achievements of significant results in research on hash function vulnerabilities and interesting attempts to find collisions in open sources.

In order to familiarize yourself with the research on the topic of collision detection, here are several links to resources dedicated to this topic:

Exploring SAT for Cryptanalysis: (Quantum) Collision Attacks Against 6-Round SHA-3

Practical Collision Attacks against Round-Reduced SHA-3

Quantum Collision Attacks on Reduced SHA-256 and SHA-512

The First Practical Collision for 31-Step SHA-256

Analysis of SHA-512/224 and SHA-512/256

Open source software that allows you to search for collisions in HASH functions:

Ensuring there are no collisions among SHA256 hashes of simple numeric strings

Sha256collision

New Results in SHA-256 Collision Search

In March 2024, there was quite a lot written about a new attack on SHA-256, “with collision detection”. In general, here we need to separate academic attacks from practice, so to speak, all sorts of bitcoins: in the original work (make a link to download the file with the article) (“New Records in Collision Attacks on SHA-2”, Li, Liu, Wang) it is about academic attacks on a truncated “compression function” (see below) from SHA-2 (SHA-256 is a type of SHA-2), and even with a specially selected previous state. This is a generally accepted approach to studying the resistance of hash functions, the result is significantly better than previous achievements, but it should be taken into account that it does not necessarily lead to a practical attack on the full hash function. The article specifically talks about a “practical attack”, this is true, but this is a different “practice”.

Let's take SHA-256 as an example and one of the practical (in the academic sense) attacks presented in the above-mentioned work. The core of the SHA-256 hash function scheme is transformations corresponding to a symmetric cipher. Repeatedly performed rounds of cipher transformations, inside the hash function, form the so-called "compression function" - this is the most important element. The input text for transformation in SHA-256 is divided into blocks. A block is a sequence of bytes corresponding to the bit depth of the function, here - 256 bits or 32 bytes. Blocks are processed by the compression function, inside which a specified number of rounds are performed - that is, the repeated application of the same transformations, in a cycle, to the processed block. Further, we will talk only about one block, and not about the full input text.

After each round, the block is modified by the transformations applied and then subjected to the same transformations again, which again modify the block. This combinatorial part allows us to achieve the necessary sparsity for mapping input blocks to output values. The standard SHA-256 scheme uses 64 rounds in the compression function. The attack in question works for 39 rounds (note: with a selected initial state - this is a very important point).

What does this mean? It means that the researchers found and presented a tuple of three specific values (numbers or byte arrays, whatever you like) that, when substituted into a truncated version of the SHA-256 hash function for 39 rounds of compression, yield the same result. One of these values is the initial state set before calling the compression function inside the truncated SHA-256. That is, with a standard implementation of SHA-256, this state would either be the previously processed block or the initial constants from the SHA-256 specification. The other two values mentioned are input blocks that differ in some bytes. Processing these blocks under the specified, very strict, conditions yields a collision. That is, an academic “practical” demonstration, with specific numbers. This is a completely strict and reasonable way to analyze the stability of transformations. In this science, this is called an SFS collision (from Semi-Free-Start). But, again, it's a long way from demonstrating a real, practical, "killing" SHA-256 collision, i.e., demonstrating two different input texts that produce the same hash function result. (Which, of course, doesn't negate the significant progress.)

How many rounds are important for the compression function? All rounds specified in the specification are important. Each round is very important and blocks attacks that are successful for the reduced variants. This is what rounds are for. Because if the hash value changes even just as a result of the last round, then the output will still be different and, strictly speaking, there is no collision (but there may be a “near collision”, etc.).

Naturally, finding coincidence points for a reduced number of rounds often provides tools for breaking the entire hash function. That is why academic attacks on primitives with a reduced number of rounds are a vital tool. However, is the fact that a collision is found for more than half of the number of rounds an automatic guarantee of successfully applying the same method to, say, half of the remaining half? No, it is not at all. The methods here develop, so to speak, completely “nonlinearly”, so that an insurmountable computational obstacle can arise even at each subsequent round - it will be necessary to completely rework the attack method. In fact, this next improvement of attacks on SHA-2 is built on new methods, if compared with the methods that were used for attacks on MD5, etc.

A concrete example taken from the original paper: for 39 rounds of SHA-256, given an initial state, the resulting values are the same for different input blocks (output from the program included with the paper):
431cadcd ce6893bb d6c9689a 334854e8 3baae1ab 038a195a ccf54a19 1c40606d
431cadcd ce6893bb d6c9689a 334854e8 3baae1ab 038a195a ccf54a19 1c40606d

One might assume that once the point of coincidence is found, further rounds will give the same results. But this is far from true – the state of the hash function is wider than the specific result of the block transformation. So, if we specify 42 rounds in the same program (only three rounds more), the values will already diverge noticeably:
d33c0cff 17c9de13 21f528f0 3362e217 563f1779 521a1b9c df062e86 19fb5973
105d6c34 43ceb0ad 120ba1a0 3362e217 d6dd86e0 7da567b5 cf1ca736 19fb5973

This, again, does not diminish the value of the result for 39 rounds, but it shows that for full SHA-256 everything is – for now – good.

Cryptographic hash functions map messages of arbitrary length to blocks of fixed length, so, mathematically, collisions are there by definition. Another thing is that if a hash function maps a set of input texts well to, say, 2^256 output values, then there is no need to worry about collisions “by exhaustive search”: few people can create and write even 2^128 texts. Attacks with detection of coincidence points in the compression function, potentially, reveal mapping defects that can allow collisions to be found using exhaustive search. And it is possible that they will allow finding ways to solve the much more complex problem of calculating the preimage, that is, selecting the input text by the value of the hash function.

A method for detecting collisions in SHA-1, suitable for attacking PGP, is proposed

Researchers from the French National Institute for Research in Informatics and Automation (INRIA) and Nanyang Technological University (Singapore) have presented the Shambles attack (PDF) (download paper), which is touted as the first practical implementation of an attack on the SHA-1 algorithm that can be used to create fake PGP and GnuPG digital signatures. The researchers believe that all practical attacks on MD5 can now be applied to SHA-1, although they still require significant resources to implement.

The method is based on a collision attack with a given prefix, which allows for two arbitrary data sets to select additions, the attachment of which will result in collision-causing sets, for which the application of the SHA-1 algorithm will lead to the formation of the same resulting hash. In other words, for two existing documents, two additions can be calculated, and if one is attached to the first document, and the other to the second, the resulting SHA-1 hashes for these files will be the same.

The new method differs from previously proposed similar techniques by improving the efficiency of collision search and demonstrating practical application for attacking PGP. Specifically, the researchers were able to prepare two public PGP keys of different sizes (RSA-8192 and RSA-6144) with different user IDs and with certificates that caused an SHA-1 collision. The first key included the victim's ID, and the second key included the name and image of the attacker. Moreover, due to the collision selection, the certificate identifying the keys, including the key and image of the attacker, had the same SHA-1 hash as the identification certificate, including the key and name of the victim.

The attacker could request a digital signature for their key and image from a third-party certification authority, and then transfer the digital signature for the victim's key. The digital signature remains valid due to the collision and certification of the attacker's key by the certification authority, which allows the attacker to gain control over the key with the victim's name (since the SHA-1 hash for both keys is the same). As a result, the attacker can impersonate the victim and sign any document on their behalf.

The attack is still quite expensive, but is already quite affordable for intelligence agencies and large corporations. For a simple collision selection using a cheaper NVIDIA GTX 970 GPU, the costs were 11 thousand dollars, and for a collision selection with a given prefix - 45 thousand dollars (for comparison, in 2012, the cost of selecting a collision in SHA-1 was estimated at 2 million dollars, and in 2015 - 700 thousand). To carry out a practical attack on PGP, two months of calculations were required using 900 NVIDIA GTX 1060 GPUs, the rent of which cost the researchers 75 thousand dollars.

The method proposed by the researchers for detecting collisions is approximately 10 times more efficient than previous achievements - the level of complexity of collision calculations was reduced to 2^61.2 operations instead of 2^64.7, and collisions with a given prefix to 2^63.4 operations instead of 2^67.1. The researchers recommend switching from SHA-1 to SHA-256 or SHA-3 as soon as possible; in 2025, the cost of an attack dropped to $10,000.

The GnuPG developers were notified of the issue on October 1 (CVE-2019-14855) and took action to block the affected certificates in GnuPG 2.2.18 on November 25 — all SHA-1 digital signatures created after January 19 last year are now considered invalid. CAcert, one of the main certificate authorities for PGP keys, plans to switch to using more secure hash functions for certifying keys. In response to information about the new attack method, the OpenSSL developers decided to disable SHA-1 in the default first security level (SHA-1 cannot be used for certificates and digital signatures during connection negotiation).