#DataStructure#Computer-Science#CS251 Hash tables map data into an array for faster accessing. This is done via hashing the elements with a hash function to determine their index. Search , A[h(k)], key == k Read , A[h(k)], key Delete A[h(k)] = null

Hash tables are fast but they definitely aren’t the end all be all. They have their problems

Firstly, they require assumptions about the data.

  1. The keys are all unique - this is the optimal situation
  2. We replace old values with new values if we use the same key

Hash Table Functions

  • Insert(Key, item data) - insert key with item data
  • Get(Key) - get item with key
  • Delete(Key) - delete key
  • Contains(Key) - does hash table contain key
  • isEmpty() - is hash table empty
  • Size() - size of hash table
  • Keys() - returns array of just the keys

Hash Function

h is the hash function

h is defined as There are two steps to hashing,

  1. prehashing: first we have to transform the key into an integer number
  2. map the integer number to an index in

Two Common Hash Functions

We ignore prehashing, we assume that it has been done already.

Division Method

Fairly simple strategy, but, m should be a prime number.

MANY WARNINGS!

  1. If m is even, then h(k) is even/odd whenever k is even/odd
    1. e.g., say m=8. h(k) = even, so k must also be even in this scenario
  2. if then h(k) copies the least p-bits of k
    1. say the key in binary is 0101, and m = 2 (2^1)
    2. hash result copies the least 1 bit from the key, so h(k) = 1 (highlighted bits are the copied bits)
  3. if then h(k) copies the least p digits of k
    1. key is 1234567890, m= 100 ()
    2. h(k) = 90, least 2 digits

Multiplication Method

is a number between 0 and 1 A common value for is , which is close the golden ratio

Issue: Collisions

What happens when two keys are hashed to the same index? This is called a collision

We have 2 solutions to collisions.

Chaining

When elements collide, chain them together into a linked list

Insertion is still

Searching however changes

  • The absolute worst case is
  • We look at the load factor to get a true time complexity
    • is more accurate
  • We want to keep , if , resize the table and re hash every element

Open Addressing

Find another available slot for the element when two elements collide

  • The load factor will always be
  • There are 3 ways to find an open index
    • Linear Probing -
    • Quadratic Probing -
    • Double Hashing -

WARNING With open addressing, we NEVER want to fully delete things. Instead, replace deleted items with a dummy object.

Analysis

  • An unsuccessful search
  • A successful search We want to keep if resize and rehash