We have already mentioned hashing in some posts, for example, in this post. Now we want to look at it in more detail. Hashing data is a common practice in computer science and is used for several different purposes. Hashing means taking an input (a string, a key, or the contents of a file) and representing it with a hash value, which is computed by an algorithm and is typically much shorter than the original. Hashing is also used in many cryptographic schemes, although hashing by itself is not encryption: a hash value is not meant to be decrypted. In practice, the only general way to get from a hash value back to an input is to look the value up in a precomputed table of known inputs and their hashes.

Additionally, hashing is a method of indexing and retrieving key values in a database table in an efficient manner. In short, a hash function calculates a fixed-size bit string from input data such as a file. A file basically contains blocks of data. Hashing transforms this data into a far shorter fixed-length value, or key, which represents the original.

Hashing is not compression. It’s a different animal: a hash discards information, so the original data cannot be reconstructed from it. It resembles file compression only in that it takes a larger data set and shrinks it into a more manageable form. A good hashing algorithm exhibits a property called the avalanche effect, where the resulting hash output changes significantly (ideally about half of the output bits) even when a single bit or byte of data within a file is changed. A hash function that does not do this is considered to have poor randomization, which makes it easier to attack. The algorithm that computes the hash is called the hash function.
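
The avalanche effect is easy to observe. The following is a minimal sketch, assuming SHA-256 from Python's standard hashlib module as the hash function; the sample input and the helper name bit_difference are made up for illustration.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"hashing example"
# Flip the lowest bit of the first byte: a one-bit change in the input.
modified = bytes([original[0] ^ 0x01]) + original[1:]

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(modified).digest()

changed = bit_difference(digest_a, digest_b)
print(f"{changed} of {len(digest_a) * 8} output bits changed")
```

With a well-behaved hash function, roughly half of the 256 output bits should change, even though the two inputs differ in a single bit.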

A hash is often written as a hexadecimal string, for example 32 characters for a 128-bit hash. Hashing is also a one-way process: there is no practical way to work backwards from a hash to the original data. One main use of hashing is to compare two files for equality. Without opening two document files to compare them word-for-word, the calculated hash values of these files allow the owner to know immediately if they are different. If the hashes differ, the files differ; if the hashes match, the files are almost certainly identical.
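
Comparing two files by their hashes can be sketched as follows. This is an illustration only: it assumes SHA-256 from hashlib, and the function names file_digest and same_content are hypothetical. The file is read in chunks so that large files do not have to fit in memory.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file as a hexadecimal string."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """Report whether two files have the same digest."""
    return file_digest(path_a) == file_digest(path_b)
```

A call such as same_content("report_v1.doc", "report_v2.doc") (both file names are placeholders) returns False as soon as a single byte differs between the two files.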

Source: 2brightsparks.com



Applications of Hashing

Hash tables provide constant-time search, insert and delete operations on average. This is why the hash table is one of the most widely used data structures. There are many other applications of hashing, including modern cryptographic hash functions. Some of these applications are listed below:

Message Digest: This is an application of cryptographic hash functions. Cryptographic hash functions produce an output from which recovering the input is computationally infeasible. This property is called irreversibility, or more formally preimage resistance.

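Password Verification: Cryptographic hash functions are very commonly used in password verification. Instead of storing the password itself, a system stores a hash of it and, at login, hashes the submitted password and compares the two values.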
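
A hedged sketch of password verification is shown below. It assumes PBKDF2 with HMAC-SHA-256 from Python's hashlib, a random per-user salt, and an illustrative iteration count; none of these choices come from the text above, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

# Illustrative work factor: an assumption for this sketch, not a recommendation.
ITERATIONS = 100_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_hash) for storage in place of the password."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the hash from the submitted password and compare."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # compare_digest runs in constant time to avoid leaking timing information.
    return hmac.compare_digest(candidate, expected)
```

The salt ensures that two users with the same password end up with different stored hashes, and the repeated iterations make large-scale guessing slower.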

Data Structures (Programming Languages): Various programming languages have hash-table-based data structures. The basic idea is to store key-value pairs in which each key is unique, whereas the same value may be stored under different keys.
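
To make the idea concrete, here is a minimal sketch of a hash table that resolves collisions by chaining. The class name ChainedHashTable and the fixed bucket count are assumptions for illustration; real implementations, such as Python's built-in dict, also resize as they fill up and use different collision strategies.

```python
class ChainedHashTable:
    """A toy hash table: each bucket holds a list of (key, value) pairs."""

    def __init__(self, bucket_count: int = 8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash value selects the bucket, so lookups skip all other buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for index, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[index] = (key, value)  # keys are unique: overwrite
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def delete(self, key) -> bool:
        bucket = self._bucket(key)
        for index, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[index]
                return True
        return False


table = ChainedHashTable()
table.put("apple", 3)
table.put("pear", 3)   # the same value under a different key is fine
table.put("apple", 5)  # the same key again replaces the old value
print(table.get("apple"), table.get("pear"), table.get("plum"))
```

As long as the buckets stay short, each operation touches only a handful of entries, which is where the average constant-time behaviour comes from.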

Compiler Operation: To differentiate between the keywords of a programming language (if, else, for, return etc.) and other identifiers and to successfully compile the program, the compiler stores all these keywords in a set which is implemented using a hash table.

Rabin-Karp Algorithm: One of the most famous applications of hashing is the Rabin-Karp algorithm. This is a string-searching algorithm which uses hashing to find any one of a set of pattern strings in a text. A practical application of this algorithm is detecting plagiarism.
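
The core of Rabin-Karp is a rolling hash: the hash of each window of the text is derived from the previous window's hash in constant time instead of being recomputed from scratch. The sketch below covers the single-pattern case only; the base and modulus are conventional illustrative choices rather than values taken from a particular source.

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, modulus: int = 1_000_000_007) -> list[int]:
    """Return every index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, modulus)

    pattern_hash = window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % modulus
        window_hash = (window_hash * base + ord(text[i])) % modulus

    matches = []
    for start in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a direct comparison.
        if window_hash == pattern_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            window_hash = (
                (window_hash - ord(text[start]) * high) * base
                + ord(text[start + m])
            ) % modulus
    return matches


print(rabin_karp("abracadabra", "abra"))
```

Searching for several patterns of the same length works the same way: their hashes are kept in a set, and each window hash is checked against that set.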

Linking File name and path together: When moving through files on our local system, we observe two very crucial components of a file i.e. file_name and file_path. In order to store the correspondence between file_name and file_path the system uses a map (file_name, file_path) which is implemented using a hash table.

This section is abbreviated from geeksforgeeks.org



Hashing Algorithms 

A hashing algorithm used for security purposes is a cryptographic hash function. It is a mathematical algorithm that maps data of arbitrary size to a hash of a fixed size. It’s designed to be a one-way function, infeasible to invert. However, in recent years several hashing algorithms have been compromised. This happened to MD5, for example, for which practical collision attacks are known.

There are many different hash algorithms, such as RipeMD, Tiger, xxhash and more, but the most common ones used for file integrity checks are MD5, SHA-2 and CRC32.

  • MD5: MD5 stands for Message Digest algorithm 5. An MD5 hash function takes a string of information and encodes it into a 128-bit fingerprint. MD5 was initially designed to be used as a cryptographic hash function, but because of practical collision attacks it should no longer be relied on for security. It is still one of the most widely used algorithms in the world and is often used as a checksum to verify data integrity, since generating a message digest of the original message with it is fast and easy.
  • SHA-2: The Secure Hash Algorithms (SHA) are a family of cryptographic hash functions. SHA-2 was developed by the National Security Agency (NSA) and includes significant changes from its predecessor, SHA-1. The SHA-2 family consists of six hash functions with digests (hash values) that are 224, 256, 384 or 512 bits: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. The SHA-2 hash function is implemented in some widely used security applications and protocols, including TLS and SSL, PGP, SSH, S/MIME, and IPsec.
  • SHA-3: Secure Hash Algorithm 3 was designed by Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche. One of SHA-3’s requirements was to be resilient to potential attacks that could compromise SHA-2, so its internal design is deliberately different. No practical attacks on SHA-3 are known, although the same is true of SHA-2, so SHA-3 is better seen as an alternative than as a mandatory replacement. SHA-3’s authors have also proposed additional features like an authenticated encryption system and a tree hashing scheme, but these are not part of the SHA-3 standard itself.
  • CRC32: A cyclic redundancy check (CRC) is an error-detecting code often used for detection of accidental changes to data. CRC32 is not a cryptographic hash and offers no protection against deliberate tampering. It is still found in file formats and protocols such as Zip, gzip, PNG and Ethernet.
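
The algorithms listed above can be tried side by side. The following is a minimal sketch using Python's standard hashlib and zlib modules; the helper name digests and the sample input are made up for illustration, and hashlib.md5 may be unavailable on systems configured to disallow MD5.

```python
import hashlib
import zlib


def digests(data: bytes) -> dict[str, str]:
    """Return hexadecimal digests of data for several algorithms."""
    return {
        "MD5": hashlib.md5(data).hexdigest(),            # 128 bits
        "SHA-256": hashlib.sha256(data).hexdigest(),     # 256 bits (SHA-2)
        "SHA3-256": hashlib.sha3_256(data).hexdigest(),  # 256 bits (SHA-3)
        # zlib.crc32 returns an unsigned 32-bit integer; format it as 8 hex digits.
        "CRC32": format(zlib.crc32(data), "08x"),
    }


for name, value in digests(b"hello world").items():
    print(f"{name:9} {value}")
```

Whatever the size of the input, each algorithm always produces an output of the same length: 32 hexadecimal characters for MD5, 64 for SHA-256 and SHA3-256, and 8 for CRC32.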