HASHING
Hashing is a technique used for storing and retrieving keys in a rapid manner. In hashing, a string of characters are transformed into a usually shorter length
value or key that represents the original string. Hashing
is used to index and retrieve items in a database because it is faster to find
item using shorter hashed key than to find it using the original value. Hashing
can also be defined as a concept of distributing keys in an array called a hash
table using predetermined function
called hash
function.
Hash Table
Hash
table is a table (array) where we store the original string. Index of the table
is the hashed key while the value is the original string. The
size of hash table is usually several orders of magnitude lower than the total
number of possible string, so several string might have a same hashed-key.
Hash Function
There
are many ways to hash a string into a key. The following are some of the
important methods for constructing hash functions.
•Mid-square
•Division (most common)
•Folding
•Digit Extraction
•Rotating Hash
Mid Square
Square the string/identifier and
then using an appropriate number of bits from the middle of the square to
obtain the hash-key. If the key is a string, it is
converted to a number.
Steps:
1. Square
the value of the key. (k2)
2. Extract the middle r
bits of the result obtained in Step 1
Function : h(k) = s
k = key
s = the hash key obtained by
selecting r bits from k2
Division
Divide the string/identifier by
using the modulus operator. It’s the most simple method of
hashing an integer x.
Function:
h(z) = z mod M
z = key
M = the value using to divide the key, usually using a prime number, the table
size or the size of memory used.
Hash Function : Folding
The Folding method works in two
steps : Divide the key value into a number
of parts where each part has the same number of digits except the last part
which may have lesser digits than the other parts. Add the individual parts. That is
obtain the sum of part1 + part2 + ... + part n. The hash value produced by
ignoring the last carry, if any.
Rotating Hash
Use
any hash method (such as division or mid-square method). After
getting the hash code/address from the hash method, do rotation. Rotation is performed by shifting
the digits to get a new hash address.
Collision
What
happened if we want to store these strings using the previous hash function
(use the first character of each string). Define,
float, exp, char,
atan, ceil, acos, floor. There
are several strings which have the same hash-key, it’s float and floor
(hash-key: 5), char and ceil (hash-key: 2). There are two general ways to
handle collisions:
Linear
Probing : Search
the next empty slot and put the string there.
Chaining :Put
the string in a slot as a chained list (linked list).
Linear Probling
Linear
probing has a bad
search complexity if there are many collisions. The
table “step” on the right
describe how many loop/step needed to find the string. Supposed
we want to find ceil,
we compute the hash key and found 2. But ceil is not there so we should iterate
until we found ceil.
void
linear_probing(item, h[]) {
hash_key = hash(item);
i = has_key;
while ( strlen(h[i] != 0 ) {
if ( strcmp(h[i], item) == 0 ) {
// DUPLICATE ENTRY
}
}
i = (i + 1) % TABLE_SIZE;
if ( i == hash_key ) {
// TABLE IS FULL
}
}
}
}
h[i] = item;
}
Chaining
In
chaining, we store each string
in a chain (linked list). So
if there is collision, we only need
to iterate on that chain.
void
chaining(item, h[]) {
hash_key = hash(item);
trail = NULL, lead = h[hash_key];
while ( lead ) {
if ( strcmp(lead->item, item) == 0 ) { // DUPLICATE ENTRY }
trail = lead; lead = lead->next;
}
}
p = malloc new node;
p->item = item;
p->next = NULL;
if ( trail == 0 ) h[hash_key] = p; else trail->next = p;
}
No comments:
Post a Comment