#DataStructure#Computer-Science#CS251
Hash tables map data into an array for faster accessing. This is done via hashing the elements with a hash function to determine their index.
 Search , A[h(k)], key == k
Read , A[h(k)], key
Delete  A[h(k)] = null
Search , A[h(k)], key == k
Read , A[h(k)], key
Delete  A[h(k)] = null
Hash tables are fast but they definitely aren’t the end all be all. They have their problems
Firstly, they require assumptions about the data.
- The keys are all unique - this is the optimal situation
- We replace old values with new values if we use the same key
Hash Table Functions
- Insert(Key, item data) - insert key with item data
- Get(Key) - get item with key
- Delete(Key) - delete key
- Contains(Key) - does hash table contain key
- isEmpty() - is hash table empty
- Size() - size of hash table
- Keys() - returns array of just the keys
Hash Function
h is the hash function
h is defined as There are two steps to hashing,
- prehashing: first we have to transform the key into an integer number
- map the integer number to an index in
Two Common Hash Functions
We ignore prehashing, we assume that it has been done already.
Division Method
Fairly simple strategy, but, m should be a prime number.
MANY WARNINGS!
- If m is even, then h(k) is even/odd whenever k is even/odd
- e.g., say m=8. h(k) = even, so k must also be even in this scenario
 
- if  then h(k) copies the least p-bits of k
- say the key in binary is 0101, and m = 2 (2^1)
- hash result copies the least 1 bit from the key, so h(k) = 1 (highlighted bits are the copied bits)
 
- if  then h(k) copies the least p digits of k
- key is 1234567890, m= 100 ()
- h(k) = 90, least 2 digits
 
Multiplication Method
is a number between 0 and 1 A common value for is , which is close the golden ratio
Issue: Collisions
What happens when two keys are hashed to the same index? This is called a collision
We have 2 solutions to collisions.
Chaining
When elements collide, chain them together into a linked list
Insertion is still
Searching however changes
- The absolute worst case is
- We look at the load factor to get a true time complexity
- is more accurate
 
- We want to keep , if , resize the table and re hash every element
 
Open Addressing
Find another available slot for the element when two elements collide
- The load factor will always be
- There are 3 ways to find an open index
- Linear Probing -
- Quadratic Probing -
- Double Hashing -
 
WARNING With open addressing, we NEVER want to fully delete things. Instead, replace deleted items with a dummy object.
Analysis
- An unsuccessful search
- A successful search We want to keep if resize and rehash