
What is a Hash?

By Johannes Hayer

A hash function is simply a **function that maps an input x of arbitrary length to an output h(x) of fixed length n**.

"The input to a hash function is called a pre-image, the message, or simply the input data. The output is called the hash." -- Mastering Ethereum, by Andreas M. Antonopoulos, Gavin Wood

So in simple words, no matter how long or short your input is, a hash function will always give you a fixed-size result.
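To see the fixed-size property in action, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. SHA-256 and the sample inputs are my choice for illustration; any hash function behaves the same way with its own output length.

```python
import hashlib

# Inputs of very different lengths (sample data chosen for illustration).
inputs = [b"", b"a", b"hello world", b"x" * 1_000_000]

# SHA-256 always returns a 32-byte (256-bit) digest, whatever the input length.
digest_lengths = [len(hashlib.sha256(data).digest()) for data in inputs]

for data, length in zip(inputs, digest_lengths):
    print(f"input: {len(data):>7} bytes -> digest: {length} bytes")
```

Whether the input is empty or a megabyte long, the digest is 32 bytes.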

A hash function has two properties:

  1. Compression (described above)
  2. It is easy to compute: given h and x, it is easy to compute h(x)
  • Side note: A hash is also called the digital fingerprint of some data because, in practice, it identifies the data uniquely. If the data is modified, a different hash is computed, so we can reliably tell that the data has been modified --> the fingerprint has changed
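The fingerprint idea can be sketched in a few lines. The `fingerprint` helper and the two messages are hypothetical names and data picked for this example, and SHA-256 stands in for "a hash function".

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"Pay Alice 10 ETH"
tampered = b"Pay Alice 90 ETH"  # a single character differs

# Hashing the same data twice gives the same fingerprint ...
same_again = fingerprint(original) == fingerprint(original)
# ... while even a tiny modification gives a different one.
changed = fingerprint(original) != fingerprint(tampered)

print(fingerprint(original))
print(fingerprint(tampered))
print("unchanged data matches:", same_again)
print("modified data detected:", changed)
```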

What is a cryptographic hash?

In order to become a cryptographic hash function, our hash function needs to fulfill more properties. Let's go through them.

1) Pre-image resistance

"If a function h is a one-way-function, then a function h^-1 does not exist. It is therefore computationally infeasible to find the input to an output of h" -- Gallersdörfer, U., Holl, P., & Matthes, F. (2020). "Blockchain-based Systems Engineering". Lecture Slides

This means that there is no feasible way to get the input value back from an existing hash value (therefore it is called one-way: from the input to the hash, but not the other way around).
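Without an inverse function, the generic attack that remains is guessing. The sketch below makes that tangible with a deliberately weak toy hash: SHA-256 truncated to 16 bits. `toy_hash` and `brute_force_preimage` are hypothetical helpers written for this example, not part of any library.

```python
import hashlib
from itertools import count

def toy_hash(data: bytes) -> bytes:
    """Deliberately weak: SHA-256 truncated to 2 bytes (16 bits)."""
    return hashlib.sha256(data).digest()[:2]

def brute_force_preimage(target: bytes) -> tuple[bytes, int]:
    """Try the inputs "1", "2", "3", ... until one hashes to target."""
    for attempts in count(1):
        candidate = str(attempts).encode()
        if toy_hash(candidate) == target:
            return candidate, attempts

target = toy_hash(b"secret input")
found, attempts = brute_force_preimage(target)

# found is *a* pre-image of target, not necessarily the original input.
print(f"found {found!r} after {attempts} guesses")
```

With 16 bits, a hit is expected after roughly 2^16 (about 65,000) guesses. With the full 256 bits of SHA-256, the same approach would need on the order of 2^256 guesses, which is what "computationally infeasible" means in practice.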

2) 2nd Pre-image resistance

"Given x it is computationally infeasible to find any second input x’ with x != x’ such that h(x) = h(x’)." -- Gallersdörfer, U., Holl, P., & Matthes, F. (2020). "Blockchain-based Systems Engineering". Lecture Slides

This means that if you have the input and its hash, it is very hard to find a different input that produces the same hash.

3) Collision Resistance

"A hash function h is said to be collision resistant if it is infeasible to find two values, x and y, such that x != y, yet h(x) = h(y)." -- Gallersdörfer, U., Holl, P., & Matthes, F. (2020). "Blockchain-based Systems Engineering". Lecture Slides

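With collision resistance, it is infeasible to find any two different inputs, of arbitrary length, that produce the same hash. Collisions must exist, because infinitely many possible inputs are mapped to a finite number of outputs, but actually finding one is computationally out of reach.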

I know what you are thinking: is this not the same as 2nd pre-image resistance? I also thought about this and found a very good explanation 👇

"Collision resistance always implies property second preimage resistance but does not imply preimage resistance. The properties of second preimage resistance and collision resistance may seem similar but the difference is that in the case of second preimage resistance, the attacker is given a message to start with, but for collision resistance no message is given; it is simply up to the attacker to find any two messages that yield the same hash value." -- Merkle-Damgård Construction Method and Alternatives: A Review. By Harshvardhan Tiwari
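The difference between the two attacks shows up when you count guesses. Below is a rough sketch on a toy hash (SHA-256 truncated to 16 bits so that both searches finish quickly). `toy_hash`, `find_collision` and `find_second_preimage` are hypothetical helpers made up for this illustration.

```python
import hashlib
from itertools import count

def toy_hash(data: bytes) -> bytes:
    """Deliberately weak: SHA-256 truncated to 2 bytes (16 bits)."""
    return hashlib.sha256(data).digest()[:2]

def find_collision() -> tuple[bytes, bytes, int]:
    """Attacker picks BOTH messages: remember every hash seen so far."""
    seen = {}
    for attempts in count(1):
        msg = f"msg-{attempts}".encode()
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, attempts
        seen[h] = msg

def find_second_preimage(given: bytes) -> tuple[bytes, int]:
    """Attacker is handed one message and must match ITS hash."""
    target = toy_hash(given)
    for attempts in count(1):
        msg = f"msg-{attempts}".encode()
        if msg != given and toy_hash(msg) == target:
            return msg, attempts

x, y, collision_attempts = find_collision()
given = b"the message we were handed"
second, second_attempts = find_second_preimage(given)

print(f"collision after {collision_attempts} tries: {x!r} / {y!r}")
print(f"second pre-image after {second_attempts} tries: {second!r}")
```

For an n-bit hash, a collision is expected after roughly 2^(n/2) tries (the birthday effect), while a second pre-image takes roughly 2^n tries. On this 16-bit toy that is typically a few hundred tries versus tens of thousands, which is why collision resistance is the harder property for a hash function to provide.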

hashing != encryption

To make this clear: hashing is not encryption. With an encryption algorithm, we encrypt sensitive information with the goal of decrypting it in the future. With a hash function that is not possible (remember the one-way property).
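A small sketch of the contrast. The "encryption" here is a toy XOR with a random one-time key, used only because it fits in a few lines of standard-library Python; it is an assumption for illustration, not a recommendation for real-world use. `xor_bytes` is a hypothetical helper.

```python
import hashlib
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (toy cipher)."""
    return bytes(d ^ k for d, k in zip(data, key))

message = b"my secret vote"
key = secrets.token_bytes(len(message))  # random key, same length as message

# Encryption is two-way: whoever holds the key gets the message back.
ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)

# Hashing is one-way: there is no key and no function to undo it.
digest = hashlib.sha256(message).digest()

print("decrypted equals original:", recovered == message)
print("hash:", digest.hex())
```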

Use cases for Hashing

  • Data fingerprinting
  • Message integrity (error detection)
  • Authentication
  • Hiding information
  • Commitment of information (a vote, for example)
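The commitment use case is a nice one to sketch: you publish a hash of your vote now and reveal the vote later, and nobody can see it early or swap it afterwards. The scheme below, hashing a random nonce together with the vote, is one common way to do this and is my assumption for the example. `commit` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac
import secrets

def commit(vote: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce). Publish the commitment, keep the nonce secret."""
    nonce = secrets.token_bytes(32)  # random salt so short votes cannot be guessed
    return hashlib.sha256(nonce + vote).digest(), nonce

def verify(commitment: bytes, nonce: bytes, vote: bytes) -> bool:
    """Check a revealed (nonce, vote) pair against the published commitment."""
    expected = hashlib.sha256(nonce + vote).digest()
    return hmac.compare_digest(commitment, expected)

commitment, nonce = commit(b"yes")

# Later, in the reveal phase:
print(verify(commitment, nonce, b"yes"))  # True
print(verify(commitment, nonce, b"no"))   # False
```

The nonce matters: without it, anyone could hash "yes" and "no" themselves and compare the results against the published commitment.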

Your public Ethereum address is actually also a hash

Did you know that your public Ethereum address is also a hash? Here is a simple flow of how Ethereum creates a public address.

  1. We start with a randomly generated number k, which is our private key. We then use elliptic curve multiplication to derive the public key (K) from it.

  2. We use Keccak-256 to calculate the hash of the public key: Keccak256(K) = 2a5bc342ed616b5ba5732269001d3f1ef827552ae1114027bd3ecf1f086ba0f9

  3. We keep only the last 20 bytes, which form our Ethereum address: 001d3f1ef827552ae1114027bd3ecf1f086ba0f9

Often a 0x is added in front of it to make it clear that the address is a hexadecimal number.
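Step 3 and the 0x prefix can be sketched in plain Python. The snippet starts from the Keccak256(K) value given above instead of recomputing it, because Python's standard library does not ship Keccak-256: `hashlib.sha3_256` is the standardized SHA-3, which uses different padding and produces a different digest. The variable names are my own.

```python
# Keccak256(K) from step 2, copied from the text rather than recomputed
# (hashlib.sha3_256 is NOT Ethereum's Keccak-256).
public_key_hash_hex = "2a5bc342ed616b5ba5732269001d3f1ef827552ae1114027bd3ecf1f086ba0f9"

public_key_hash = bytes.fromhex(public_key_hash_hex)  # 32 bytes

# Step 3: keep only the last 20 bytes.
address_bytes = public_key_hash[-20:]

# Prefix with 0x to mark the value as hexadecimal.
address = "0x" + address_bytes.hex()

print(address)  # 0x001d3f1ef827552ae1114027bd3ecf1f086ba0f9
```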

Summary

  • Hash functions map an input of arbitrary length to a fixed-length output
  • If we add these 3 properties (pre-image resistance | 2nd pre-image resistance | collision resistance) to a hash function, we get a cryptographic hash function
  • Hashes are useful (even before blockchain) for many things like message integrity and hiding information
  • Your public Ethereum address is the result of a Keccak-256 hash function

Resources 📚

  • Lecture Slides from TUM Gallersdörfer, U., Holl, P., & Matthes, F. (2020). "Blockchain-based Systems Engineering"
  • Mastering Ethereum, by Andreas M. Antonopoulos, Gavin Wood
