data sturcture notes: hasing and hash tables

HASHING

Hashing is a technique used for storing and retrieving keys in a rapid manner. In hashing, a string of characters are transformed into a usually shorter length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find item using shorter hashed key than to find it using the original value. Hashing can also be defined as a concept of distributing keys in an array called a hash table using predetermined function called hash function.

Hash Table

Hash table is a table (array) where we store the original string. Index of the table is the hashed key while the value is the original string. The size of hash table is usually several orders of magnitude lower than the total number of possible string, so several string might have a same hashed-key.

Hash Function

There are many ways to hash a string into a key. The following are some of the important methods for constructing hash functions.

•Mid-square

•Division (most common)

•Folding

•Digit Extraction

•Rotating Hash

Mid Square

Square the string/identifier and then using an appropriate number of bits from the middle of the square to obtain the hash-key. If the key is a string, it is converted to a number.

Steps:

1. Square the value of the key. (k2)

2. Extract the middle r bits of the result obtained in Step 1

Function : h(k) = s

k = key

s = the hash key obtained by selecting r bits from k2

Division

Divide the string/identifier by using the modulus operator. It’s the most simple method of hashing an integer x.

Function: h(z) = z mod M

z = key

M = the value using to divide the key, usually using a prime number, the table size or the size of memory used.

Hash Function : Folding

The Folding method works in two steps : Divide the key value into a number of parts where each part has the same number of digits except the last part which may have lesser digits than the other parts. Add the individual parts. That is obtain the sum of part1 + part2 + ... + part n. The hash value produced by ignoring the last carry, if any.

Rotating Hash

Use any hash method (such as division or mid-square method). After getting the hash code/address from the hash method, do rotation. Rotation is performed by shifting the digits to get a new hash address.

Collision

What happened if we want to store these strings using the previous hash function (use the first character of each string). Define, float, exp, char, atan, ceil, acos, floor. There are several strings which have the same hash-key, it’s float and floor (hash-key: 5), char and ceil (hash-key: 2). There are two general ways to handle collisions:

Linear Probing : Search the next empty slot and put the string there.

Chaining :Put the string in a slot as a chained list (linked list).

Linear Probling

Linear probing has a bad search complexity if there are many collisions. The table “step” on the right describe how many loop/step needed to find the string. Supposed we want to find ceil, we compute the hash key and found 2. But ceil is not there so we should iterate until we found ceil.

void linear_probing(item, h[]) {

hash_key = hash(item);

i = has_key;

while ( strlen(h[i] != 0 ) {

if ( strcmp(h[i], item) == 0 ) {

// DUPLICATE ENTRY
}

i = (i + 1) % TABLE_SIZE;

if ( i == hash_key ) {

// TABLE IS FULL
}
}

h[i] = item;

}

Chaining

In chaining, we store each string in a chain (linked list). So if there is collision, we only need to iterate on that chain.

void chaining(item, h[]) {

hash_key = hash(item);

trail = NULL, lead = h[hash_key];

while ( lead ) {

if ( strcmp(lead->item, item) == 0 ) { // DUPLICATE ENTRY }

trail = lead; lead = lead->next;
}

p = malloc new node;

p->item = item;

p->next = NULL;

if ( trail == 0 ) h[hash_key] = p; else trail->next = p;

}

data sturcture notes

Sunday, March 15, 2020

hasing and hash tables

HASHING

No comments:

Post a Comment