Hashing

Hashing for faster retrieving @relational

Hashing is a method by which the query search value itself after processing maps to address which containing data blocks or records; Init the processing is done by hash function, the output yield after the function is index of the bucket which contains the address to the bucket which is located in the disk and stores the records or data blocks. Resulting in faster data retrieval. Hash function - process the value with code and yields index of the bucket Bucket - From bucket index the bucket address is taken (physical location of bucket) from which the records or data blocks are fetched. Its like array at index 1 containing bucket 1

Types of hashing

Static Hashing

In static hashing, the number of buckets is fixed, and each key always maps to the same bucket index through the hash function. This makes the process simple and efficient for stable datasets but creates challenges when the volume of data increases. If too many keys are mapped to the same bucket, collisions occur, leading to performance issues.

To handle these collisions, two techniques are commonly used. The first is chaining, or separate chaining, where each bucket is implemented as a linked list so that multiple records sharing the same hash index can be linked together. The second method is open addressing, also called closed hashing, in which the system searches for the next available slot (through probing) when a bucket is already full. While static hashing is straightforward, its major limitation is the inability to adapt efficiently as the dataset grows.

Dynamic Hashing

Dynamic hashing was developed to overcome the rigid nature of static hashing by allowing the number of buckets to expand as needed. This prevents the degradation in performance caused by overflowed buckets and collisions when the dataset grows. Two major approaches in dynamic hashing are extendible hashing and linear hashing, both of which provide flexible ways to manage bucket growth.

Extendible Hashing

In extendible hashing, the hash value is represented in binary, and the first d bits of the hash value, known as the global depth, are used to index into a directory stored in memory. Each entry in this directory points to a bucket on disk where the records are stored. If a bucket becomes full, it is split into two, and its local depth (the number of bits used for that specific bucket) is increased. When the local depth exceeds the global depth, the directory itself is doubled in size, which means one additional bit from the hash value is used for indexing. This method allows the structure to expand in a controlled and flexible way, without needing to rehash all existing records.

Linear Hashing

Linear hashing provides a more gradual approach to growth compared to extendible hashing. Instead of doubling directories, buckets are split in a fixed linear sequence. When a bucket overflows, only that specific bucket is split into two, and the records inside it are redistributed using a new version of the hash function. Over time, as more buckets overflow, the system continues splitting them one by one in order, ensuring that growth happens smoothly and incrementally. This avoids sudden jumps in size and spreads the cost of rehashing evenly across operations.

Akash's PKM

Explorer