Hash Tables

Hash tables, also known as hash maps or dictionaries in some programming languages, are data structures that store key-value pairs. They use a technique called hashing to efficiently retrieve and store data based on keys.

Here’s a basic overview of how hash tables work:

Hash Function: A hash function is used to convert keys into array indices. It takes a key as input and computes a hash code, which is typically an integer. This hash code is then mapped to an index in the hash table’s underlying array.
Collision Handling: Since multiple keys can hash to the same index (a collision), hash tables employ collision resolution techniques to handle this situation. Common methods include chaining (using linked lists or other data structures to store multiple values at the same index) or open addressing (probing for an empty slot nearby).
Insertion and Retrieval: To insert a key-value pair into the hash table, the key is hashed to find the appropriate index, and the value is stored at that index. When retrieving a value, the key is hashed to determine the index, and the corresponding value is retrieved from the hash table.

Here’s a simple example of a hash table implemented in Python using chaining for collision handling:

class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(size)]

    def _hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self._hash_function(key)
        self.table[index].append((key, value))

    def get(self, key):
        index = self._hash_function(key)
        bucket = self.table[index]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError("Key not found")

# Example usage:
ht = HashTable(10)
ht.insert("apple", 5)
ht.insert("banana", 7)

print(ht.get("apple"))  # Output: 5
print(ht.get("banana"))  # Output: 7

In this code:

The HashTable class initializes a hash table with a given size.
The _hash_function method computes the hash code for a given key.
The insert method inserts a key-value pair into the hash table.
The get method retrieves the value associated with a given key from the hash table.

This implementation uses a list of lists (table) for chaining, where each bucket in the hash table contains a list of key-value pairs. When inserting or retrieving a key-value pair, the hash function is used to determine the appropriate bucket in the table. If there are collisions (i.e., multiple keys hashing to the same index), the key-value pairs are stored in the same bucket, forming a chain.

Time Complexity

Operation	Best Case	Average Case	Worst Case
Insertion	(O(1))	(O(1)) (amortized)	(O(n))
Retrieval	(O(1))	(O(1)) (amortized)	(O(n))
Space	(O(n))	(O(n))	(O(n))

Explanation:

Insertion and Retrieval Time Complexity: In the best and average cases, the time complexity for insertion and retrieval operations is (O(1)). This assumes that the hash function distributes keys uniformly across the hash table and that collision resolution using chaining results in constant-time operations. However, in the worst case, all keys hash to the same index, resulting in a chain of (n) elements at that index. In this scenario, insertion and retrieval operations would have a time complexity of (O(n)), as they would require traversing the entire chain.
Space Complexity: The space complexity for the hash table is (O(n)), where (n) is the number of key-value pairs inserted into the hash table. This is because the space required by the hash table is proportional to the number of elements it contains.

Posted on Jun 7, 2024 by bmcculley Posted in:

Data Structures