What is a Hash Function? How Cryptographic Hashing Works — MD5, SHA-1, SHA-256 Explained

Every time you log into a website, download software, make a Bitcoin transaction, or commit code to Git, hash functions are working behind the scenes. These elegant one-way mathematical functions are the silent backbone of modern digital security. Let's explore how they work and why they matter.

What is a Hash Function?

A hash function is a mathematical algorithm that takes an input of any size — a single character, a sentence, or an entire file — and produces a fixed-length output called a hash value (also known as a digest, hash code, or fingerprint).

Think of it as a digital fingerprint machine: you feed in any data, and it produces a fixed-length identifier that is, for practical purposes, unique to that data. The same input always produces the same hash, but even a tiny change to the input, such as a single character, produces a completely different hash.

SHA-256 hashes of similar inputs:

Input: "Hello"
Hash:  185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

Input: "Hello!"  (added one character)
Hash:  334d016f755cd6dc58c53a86e183882f8ec14f52fb05345887c8a5edd42c87b7

Input: "hello"  (lowercase h)
Hash:  2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

→ Completely different outputs from nearly identical inputs!
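
The digests above can be reproduced in a few lines of Python. This is a minimal sketch using the standard-library hashlib module; the helper name sha256_hex is made up for illustration, and the inputs are assumed to be UTF-8 encoded text.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hash three nearly identical inputs and print their digests side by side.
for sample in ("Hello", "Hello!", "hello"):
    print(f"{sample!r:10} -> {sha256_hex(sample)}")
```

Running it on any machine should print the same three digests, which is the deterministic property in action.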

The critical feature is that hashing is a one-way function. Given only a hash value, there is no known efficient way to recover the original input; the best an attacker can do is guess candidate inputs and hash each one. Going forward (input → hash) is cheap, while going backward (hash → input) is computationally infeasible. This one-way property is what makes hash functions essential for security.

The Five Properties of Cryptographic Hashes

Not all hash functions are created equal. A cryptographic hash function must have these five properties to be considered secure:

1. Deterministic

The same input always produces the same hash. If you hash "Hello" with SHA-256 on any computer, anywhere in the world, at any time, you get the same 64-character hex string. This consistency is what makes verification possible.

2. Pre-image Resistance (One-Way)

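Given a hash value, it must be computationally infeasible to find the original input. This means you can safely publish a hash without revealing the data it came from. For SHA-256, brute-forcing an unpredictable input would take on the order of 2^256 attempts, far longer than the age of the universe. (Short or guessable inputs, such as common passwords, can still be found by simply trying candidates.)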

3. Second Pre-image Resistance

Given an input and its hash, it must be infeasible to find a different input that produces the same hash. This prevents someone from substituting a malicious file that happens to have the same hash as a genuine one.

4. Collision Resistance

It must be infeasible to find any two different inputs that produce the same hash. This is a stronger requirement than second pre-image resistance — the attacker gets to choose both inputs. MD5 and SHA-1 have both failed this property.

5. Avalanche Effect

A tiny change in the input (even flipping a single bit) should produce a dramatically different hash — ideally, about half the output bits should change. This prevents attackers from learning anything about the input by observing how the hash changes.
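
The avalanche effect is easy to observe directly. The sketch below flips a single input bit and counts how many of the 256 SHA-256 output bits change; the helper name bit_difference and the sample input are illustrative choices, not part of any standard API.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"Hello"
# Flip the lowest bit of the first byte: 'H' (0x48) becomes 'I' (0x49).
flipped = bytes([original[0] ^ 0x01]) + original[1:]

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(flipped).digest()
changed = bit_difference(digest_a, digest_b)
print(f"{changed} of 256 output bits changed")
```

For a well-behaved hash the count should land close to 128, with some variation from input to input.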

How Hashing Works Under the Hood

While the mathematics behind hash functions are complex, the general process follows a consistent pattern. Here's a simplified view of how SHA-256 processes data:

SHA-256 processing pipeline (simplified):

1. PADDING
   → Append a '1' bit, then zeros, then the original message length as a 64-bit integer
   → The padded message is a multiple of 512 bits

2. PARSING
   → Split padded message into 512-bit blocks
   → Each block is processed sequentially

3. INITIALIZATION
   → Start with 8 fixed 32-bit initial hash values (H0–H7)
   → These are the first 32 bits of the fractional parts of the square roots of the first 8 primes

4. COMPRESSION (per block)
   → Expand 512-bit block into 64 32-bit "message schedule" words
   → Run 64 rounds of bitwise operations:
     - Rotate, shift, XOR, AND, modular addition
     - Mix message words with round constants (derived from cube roots of the first 64 primes)
   → Add result to running hash state

5. OUTPUT
   → Concatenate final H0–H7 values
   → Result: 256-bit (32-byte) hash → 64 hex characters

The key insight is that every bit of the input eventually influences every bit of the output through many rounds of mixing. This is what creates the avalanche effect — changing one input bit triggers a cascade of changes through all 64 rounds.
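
To make the pipeline concrete, here is a compact pure-Python sketch of SHA-256 that follows the five steps above. It is for study only, not production use (hashlib is faster and vetted). The helper names (first_primes, icbrt, rotr, sha256_hex) are made up for this example, and the constants are derived with exact integer arithmetic rather than hard-coded.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep every intermediate value within 32 bits

def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    """Integer cube root: the largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# Step 3: first 32 fractional bits of the square roots of the first 8 primes.
H_INIT = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# Round constants: first 32 fractional bits of the cube roots of 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256_hex(message: bytes) -> str:
    # Step 1: padding ('1' bit, zeros, then the 64-bit message length).
    bit_length = len(message) * 8
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    message += bit_length.to_bytes(8, "big")

    state = list(H_INIT)
    # Step 2: process the padded message one 512-bit (64-byte) block at a time.
    for offset in range(0, len(message), 64):
        # Step 4a: expand 16 message words into a 64-word schedule.
        w = list(struct.unpack(">16I", message[offset:offset + 64]))
        for t in range(16, 64):
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)

        # Step 4b: 64 rounds of mixing on eight working variables.
        a, b, c, d, e, f, g, h = state
        for t in range(64):
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            choose = (e & f) ^ (~e & g)
            temp1 = (h + big_s1 + choose + K[t] + w[t]) & MASK
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            majority = (a & b) ^ (a & c) ^ (b & c)
            temp2 = (big_s0 + majority) & MASK
            a, b, c, d, e, f, g, h = (
                (temp1 + temp2) & MASK, a, b, c, (d + temp1) & MASK, e, f, g)

        # Step 4c: add the block's result to the running hash state.
        state = [(x + y) & MASK
                 for x, y in zip(state, (a, b, c, d, e, f, g, h))]

    # Step 5: concatenate H0..H7 as 64 hex characters.
    return "".join(f"{word:08x}" for word in state)

print(sha256_hex(b"hello") == hashlib.sha256(b"hello").hexdigest())
```

The final line compares the sketch against hashlib, which is a quick way to check that every constant and round is wired correctly.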

Hash Algorithms Compared

Over the decades, several hash algorithms have been developed, each building on the lessons (and failures) of its predecessors:

| Algorithm | Year | Output | Status | Use Today |
|---|---|---|---|---|
| MD5 | 1991 | 128 bits (32 hex) | Broken | Non-security checksums only |
| SHA-1 | 1995 | 160 bits (40 hex) | Deprecated | Legacy systems (Git), being phased out |
| SHA-256 | 2001 | 256 bits (64 hex) | Secure | Industry standard: TLS, Bitcoin, most uses |
| SHA-384 | 2001 | 384 bits (96 hex) | Secure | Government, military, high-security |
| SHA-512 | 2001 | 512 bits (128 hex) | Secure | High-security; faster on 64-bit CPUs |
| SHA-3 (Keccak) | 2015 | 224–512 bits | Secure | Backup standard; Ethereum (Keccak-256) |

MD5 — The Broken Pioneer

Designed by Ronald Rivest in 1991, MD5 (Message Digest 5) produces a 128-bit hash. It was the workhorse of internet security for over a decade, used in SSL certificates, password storage, and file verification. However, collision attacks were demonstrated in 2004 by Xiaoyun Wang, and in 2008 researchers created a rogue SSL certificate using MD5 collisions.

Today, MD5 collisions can be generated in seconds on a laptop. MD5 should never be used for security, but remains useful for non-security checksums (e.g., checking if a file was corrupted during download, deduplication).

SHA-1 — Deprecated but Lingering

Designed by the NSA and published by NIST in 1995, SHA-1 produces a 160-bit hash. For years it was the standard for SSL/TLS certificates, code signing, and version control. But in 2017, Google and CWI Amsterdam demonstrated the first practical SHA-1 collision (SHAttered) — two different PDF files with identical SHA-1 hashes.

Major browsers stopped trusting SHA-1 SSL certificates in 2017. Git still uses SHA-1 for object addressing but is migrating to SHA-256. SHA-1 should not be used for any new security application.

SHA-256 — The Current Standard

Part of the SHA-2 family designed by the NSA and published in 2001, SHA-256 produces a 256-bit hash. It is the most widely used secure hash algorithm today:

  • Bitcoin mining — the proof-of-work algorithm runs SHA-256 twice (double SHA-256)
  • TLS/SSL certificates — virtually all HTTPS certificates use SHA-256
  • Code signing — software updates from Apple, Microsoft, and Linux distributions
  • File integrity — Linux package managers verify downloads with SHA-256
  • Digital signatures — most signing schemes use SHA-256 as the hash step

No practical attacks against SHA-256 exist. The best known way to find a collision is a generic birthday search costing about 2^128 operations, far beyond any computing capability conceivable today, including quantum computers in the near term.

SHA-384 & SHA-512 — Extra Security Margin

SHA-384 and SHA-512 are the larger variants of the SHA-2 family. SHA-512 uses 64-bit operations internally (vs 32-bit for SHA-256), which often makes it faster than SHA-256 on 64-bit processors that lack dedicated SHA-256 instructions, despite producing a longer hash. SHA-384 is SHA-512 run with different initial values and truncated to 384 bits.

These are commonly used in government and military applications (NSA Suite B specifies SHA-384) and in situations requiring future-proofing against advances in computing power.

Collision Attacks: When Hashes Fail

A collision occurs when two different inputs produce the same hash output. Since hash functions map an infinite input space to a finite output space, collisions must theoretically exist (by the pigeonhole principle). The question is whether they can be found in practice.

| Algorithm | Theoretical Collision Resistance | Practical Status |
|---|---|---|
| MD5 | 2^64 operations | Broken: collisions in seconds on a laptop |
| SHA-1 | 2^80 operations | Broken: practical collision (SHAttered, 2017) |
| SHA-256 | 2^128 operations | Secure: no known shortcuts |
| SHA-512 | 2^256 operations | Secure: vastly beyond any foreseeable computing |

The Birthday Attack

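The birthday paradox means finding collisions is easier than you'd expect. For a hash with n bits, you only need roughly 2^(n/2) attempts (not 2^n) to find a collision. This is why MD5 (128-bit) can offer at most about 2^64 operations of collision resistance, which is a large number but within reach of well-funded attackers. In practice MD5 is far weaker than that, because the known attacks exploit structural flaws in the algorithm rather than relying on a generic birthday search.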
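
The birthday bound can be seen on a small scale by truncating SHA-256 to 24 bits. In the sketch below (the function names and the "message-N" inputs are arbitrary choices for illustration), a collision usually turns up after a few thousand hashes, close to 2^12 = 4,096, rather than anywhere near 2^24.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """Return the first `bits` bits of SHA-256(data) as an integer."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int):
    """Hash counter-based messages until two share the same truncated hash."""
    seen = {}  # truncated hash -> message that produced it
    attempt = 0
    while True:
        message = f"message-{attempt}".encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, attempt + 1
        seen[h] = message
        attempt += 1

first, second, attempts = find_collision(bits=24)
print(f"{first!r} and {second!r} collide after {attempts} hashes")
print(f"birthday estimate: about {2 ** 12}; exhaustive space: {2 ** 24}")
```

The same search against the full 256-bit output would need on the order of 2^128 hashes, which is why truncation is the only way to watch it succeed.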

Password Hashing: Salts, Stretching & Specialized Functions

One of the most important applications of hashing is password storage. A well-designed website never stores your actual password; it stores a hash of it. When you log in, the site hashes your input and compares it to the stored hash.

However, using a plain hash function like SHA-256 for passwords is dangerously insecure. Here's why and what to do instead:

The Problem with Plain Hashing

Without salt — identical passwords produce identical hashes:

User Alice: SHA-256("password123") → ef92b778...
User Bob:   SHA-256("password123") → ef92b778...  ← SAME HASH!

An attacker who sees the database knows Alice and Bob
have the same password. Even worse, they can use
pre-computed "rainbow tables" to look up common passwords.

With salt — each user gets a unique random salt:

User Alice: SHA-256("a8f3k9" + "password123") → 7c2d91a3...
User Bob:   SHA-256("m2p7x4" + "password123") → b5e8f102...  ← DIFFERENT!

Same password, different hashes. Rainbow tables are useless.

Key Stretching

SHA-256 is designed to be fast — a modern GPU can compute billions of SHA-256 hashes per second, making brute-force attacks against passwords practical. Key stretching solves this by deliberately slowing down the hash computation:

| Function | Year | Approach | Recommendation |
|---|---|---|---|
| bcrypt | 1999 | Blowfish-based, configurable cost factor | Good: widely supported, battle-tested |
| scrypt | 2009 | Memory-hard (resists GPU/ASIC attacks) | Better: used in Litecoin, some web apps |
| Argon2 | 2015 | Memory-hard, configurable parallelism | Best: winner of Password Hashing Competition |

Never Use Plain SHA-256 for Passwords

SHA-256 (even with a salt) is too fast for password hashing. A modern GPU can test billions of SHA-256 hashes per second, making it feasible to crack common passwords quickly. Always use a purpose-built password hashing function like Argon2, bcrypt, or scrypt. These are deliberately slow, and Argon2 and scrypt are also memory-intensive, making brute-force attacks cost-prohibitive.
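
As a rough illustration of salting plus key stretching, the sketch below uses scrypt from Python's standard library (Argon2 and bcrypt need third-party packages). The cost parameters and the function names hash_password and verify_password are assumptions chosen for the example, not recommended production settings; tune the costs for your own hardware.

```python
import hashlib
import hmac
import os

# Assumed cost parameters for illustration: n is the CPU/memory cost,
# r the block size, p the parallelism, dklen the output length in bytes.
SCRYPT_PARAMS = {"n": 2 ** 14, "r": 8, "p": 1, "dklen": 32}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh random 16-byte salt."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(digest, expected)

alice_salt, alice_hash = hash_password("password123")
bob_salt, bob_hash = hash_password("password123")
print(alice_hash != bob_hash)                                   # salts differ
print(verify_password("password123", alice_salt, alice_hash))   # correct guess
```

Because each user gets a different salt, Alice and Bob end up with different stored digests even though they chose the same password.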

HMAC: Keyed Hashing for Authentication

HMAC (Hash-based Message Authentication Code) combines a hash function with a secret key to provide both integrity and authenticity. Unlike a plain hash, an HMAC proves that the message was created by someone who possesses the secret key.

HMAC construction (simplified):

HMAC(key, message) = Hash((key ⊕ opad) || Hash((key ⊕ ipad) || message))

Where:
  opad = outer padding (0x5c repeated)
  ipad = inner padding (0x36 repeated)
  ⊕   = XOR
  ||  = concatenation

Common usage:
  HMAC-SHA256(secret_key, "GET /api/users?id=42")
  → Produces a signature that proves the request is authentic

HMAC is used extensively in API authentication (AWS Signature V4, Stripe webhooks), JSON Web Tokens (JWTs), cookie signing, and message integrity verification. Tampering is detectable because an attacker cannot produce a valid HMAC without knowing the secret key.
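
The construction above can be written out by hand and checked against Python's built-in hmac module. This sketch fills in one detail the simplified formula leaves out: per RFC 2104, the key is first brought to the hash's block size (64 bytes for SHA-256) by hashing keys that are too long and zero-padding the rest. The function name and the sample key are made up for illustration.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 512-bit (64-byte) blocks

def hmac_sha256_manual(key: bytes, message: bytes) -> str:
    """Compute HMAC-SHA256 step by step, following the formula above."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # then zero-padded to one block

    inner_key = bytes(b ^ 0x36 for b in key)  # key XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # key XOR opad

    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).hexdigest()

secret_key = b"example-secret-key"            # hypothetical key
request = b"GET /api/users?id=42"

manual = hmac_sha256_manual(secret_key, request)
library = hmac.new(secret_key, request, hashlib.sha256).hexdigest()
print(manual == library)

# When checking a received signature, compare in constant time.
print(hmac.compare_digest(manual, library))
```

In real code, call hmac.new directly and use hmac.compare_digest for verification; the manual version is only there to show what the library does internally.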

Real-World Applications

Hash functions are foundational to modern computing. Here's where they're used every day:

Blockchain & Cryptocurrency: Bitcoin uses double SHA-256 for its proof-of-work mining, transaction IDs, and block linking. Each block contains the hash of the previous block, creating a tamper-evident chain. Ethereum uses Keccak-256, the original Keccak design, which differs from the standardized SHA3-256 only in its padding.

TLS/SSL (HTTPS): Every HTTPS connection uses hash functions for certificate verification, key derivation, and message integrity. The handshake hashes its own transcript (with SHA-256 or SHA-384 in current cipher suites) to ensure nothing was tampered with during the exchange.

Git Version Control: Every object in Git — commits, trees, blobs, and tags — is identified by its SHA-1 hash. This creates a content-addressable storage system where the address of every piece of data is its own hash.

Software Distribution: When you download Linux, Python, or any major software, the download page provides SHA-256 checksums. After downloading, you hash the file and compare. If they match, the file hasn't been corrupted, and if the checksum came from a trusted source, it hasn't been tampered with either.

Digital Signatures: RSA, ECDSA, and EdDSA all hash the message before signing. You don't sign the entire document; you sign its hash, which is much smaller and faster to process while maintaining the same security guarantees as long as the hash is collision resistant.

Deduplication: Cloud storage services, backup systems, and content delivery networks use hashes to identify duplicate content. Two files with the same hash are (almost certainly) identical — no need to store or transmit the data twice.

Hash Tables: Beyond cryptography, hash functions power the hash tables (dictionaries/maps) in virtually every programming language. Python's dict, JavaScript's Map, and Java's HashMap all rely on fast, non-cryptographic hash functions for average-case O(1) lookups.
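
Two of the applications above, download verification and Git's content addressing, are short enough to sketch. The function names are illustrative. The Git helper assumes the default SHA-1 object format, where a blob's ID is the SHA-1 of the header "blob <size>" followed by a NUL byte and then the file content.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large downloads never sit fully in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def matches_checksum(path: str, published_hex: str) -> bool:
    """Compare a file's digest with the checksum published by the vendor."""
    return sha256_file(path) == published_hex.strip().lower()

def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello world\n"))
```

The printed ID should match what `git hash-object` reports for a file containing the same bytes.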

Hash vs Encryption vs Encoding

These three concepts are frequently confused but serve completely different purposes:

| Feature | Hashing | Encryption | Encoding |
|---|---|---|---|
| Direction | One-way (irreversible) | Two-way (reversible with key) | Two-way (reversible, no key) |
| Purpose | Integrity & verification | Confidentiality | Data format conversion |
| Key required? | No (HMAC uses a key) | Yes (symmetric or asymmetric) | No |
| Output size | Fixed (e.g., always 256 bits) | Similar to input size | Similar to input size |
| Examples | SHA-256, MD5, bcrypt | AES, RSA, ChaCha20 | Base64, URL encoding, UTF-8 |

Common Mistake

Base64 is not encryption. It's encoding: anyone can decode it without a key. Similarly, hashing is not encryption: you cannot "decrypt" a hash to recover the original data. These are fundamentally different operations with different security properties.
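
A short sketch makes the difference tangible, using only Python's standard library (the sample message is arbitrary). Base64 output can be turned back into the original bytes by anyone, while the hash module offers no inverse operation at all.

```python
import base64
import hashlib

secret = b"attack at dawn"

# Encoding: reversible by anyone, no key involved.
encoded = base64.b64encode(secret)
decoded = base64.b64decode(encoded)
print(encoded, decoded == secret)

# Hashing: fixed-size output with no decode step. The only way back to
# the input is to guess candidates and hash each one.
digest = hashlib.sha256(secret).hexdigest()
print(len(digest), "hex characters, regardless of input length")
```

Encryption is left out here because the standard library has no cipher implementation; it would sit between the two, reversible but only with the key.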

SHA-3 and the Future

SHA-3 (Keccak) was selected in 2012 through a public competition run by NIST, the same kind of process that chose AES, and was standardized in 2015. It uses a completely different internal structure from SHA-2 (a "sponge construction" vs the Merkle–Damgård construction), providing algorithm diversity.

SHA-3 was designed as a backup, not a replacement for SHA-2. Both families are secure, and SHA-2 remains the recommended choice for most applications due to wider support and hardware acceleration. SHA-3 is used where algorithm diversity is valued (e.g., Ethereum uses Keccak-256, a variant of SHA-3).

Quantum Computing & Hashing

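Grover's algorithm (for quantum computers) gives a quadratic speedup for brute-force search, which effectively halves the pre-image security level of hash functions: SHA-256 would offer about 128 bits of pre-image resistance instead of 256. Its collision resistance is already limited to 128 bits by the birthday bound, and known quantum collision-finding algorithms are generally considered to offer little practical advantage over that. This means SHA-256 remains secure even against quantum computers. SHA-512 would retain 256-bit pre-image resistance, far beyond any foreseeable computing capability.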

Best Practices

  • Use SHA-256 or stronger: For any security-related purpose, SHA-256 is the minimum. Avoid MD5 and SHA-1 for anything where collision resistance matters.
  • Use specialized password hashing: Never use SHA-256 directly for passwords. Use Argon2 (preferred), bcrypt, or scrypt with a unique random salt per password.
  • Always salt passwords: A salt is a random value unique to each user, stored alongside the hash. It prevents rainbow table attacks and ensures identical passwords produce different hashes.
  • Verify file downloads: Always check the SHA-256 hash of downloaded software against the publisher's provided checksum before installing.
  • Use HMAC for authentication: When you need to verify both integrity and authenticity (e.g., API requests, webhooks), use HMAC-SHA256, not a plain hash.
  • Don't roll your own crypto: Use well-tested libraries and standard algorithms. The Web Crypto API, OpenSSL, and language-specific libraries (hashlib in Python, crypto in Node.js) are thoroughly vetted.
  • Plan for algorithm migration: Store which algorithm was used alongside the hash. This allows you to upgrade to stronger algorithms in the future without breaking existing data.
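
The last point, storing the algorithm next to the hash, might look like the sketch below. The record layout ("algorithm$salt$digest"), the function names, and the choice of salted SHA-256 as the legacy scheme are all hypothetical; real systems often use an established format such as the one produced by their password hashing library.

```python
import hashlib
import hmac
import os

CURRENT_ALGORITHM = "scrypt"

def _derive(algorithm: str, password: bytes, salt: bytes) -> bytes:
    """Compute the digest for whichever scheme a record says it uses."""
    if algorithm == "sha256":  # legacy scheme, kept only so old records verify
        return hashlib.sha256(salt + password).digest()
    if algorithm == "scrypt":  # assumed cost parameters for illustration
        return hashlib.scrypt(password, salt=salt, n=2 ** 14, r=8, p=1, dklen=32)
    raise ValueError(f"unknown algorithm: {algorithm}")

def make_record(password: str, algorithm: str = CURRENT_ALGORITHM) -> str:
    """Create a self-describing record: algorithm$salt_hex$digest_hex."""
    salt = os.urandom(16)
    digest = _derive(algorithm, password.encode("utf-8"), salt)
    return f"{algorithm}${salt.hex()}${digest.hex()}"

def check_record(password: str, record: str) -> tuple[bool, bool]:
    """Return (password_ok, needs_upgrade) for a stored record."""
    algorithm, salt_hex, digest_hex = record.split("$")
    digest = _derive(algorithm, password.encode("utf-8"), bytes.fromhex(salt_hex))
    ok = hmac.compare_digest(digest, bytes.fromhex(digest_hex))
    return ok, ok and algorithm != CURRENT_ALGORITHM

old_record = make_record("password123", algorithm="sha256")
ok, needs_upgrade = check_record("password123", old_record)
if ok and needs_upgrade:
    # The plaintext is only available at login, so this is the moment to rehash.
    new_record = make_record("password123")
    print(new_record.split("$")[0])
```

Because the record names its own algorithm, old and new hashes can coexist in one table while users are upgraded gradually as they log in.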

References

  1. NIST. FIPS 180-4: Secure Hash Standard (SHS). https://csrc.nist.gov/publications/detail/fips/180/4/final
  2. NIST. FIPS 202: SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions. https://csrc.nist.gov/publications/detail/fips/202/final
  3. Stevens, M. et al. (2017). The first collision for full SHA-1. Crypto 2017. https://shattered.io/
  4. Rivest, R. (1992). RFC 1321: The MD5 Message-Digest Algorithm. https://datatracker.ietf.org/doc/html/rfc1321
  5. Krawczyk, H. et al. (1997). RFC 2104: HMAC: Keyed-Hashing for Message Authentication. https://datatracker.ietf.org/doc/html/rfc2104
  6. Biryukov, A. et al. (2015). Argon2: the memory-hard function for password hashing and other applications. Password Hashing Competition.