ScholarQuill logoScholarQuillUniversity Notes
  • Notes
  • Past Papers
  • Blogs
  • Todo
Login
ScholarQuill logoScholarQuillUniversity Notes
Login
NotesPast PapersBlogsTodo
More
SubjectsDiscussionCGPA CalculatorGPA CalculatorStudent PortalCourse Outline
About
About usPrivacy PolicyReportContact
Notes
Past Papers
Blogs
Todo
Analytics
    Current Subject
    🧩
    Design and Analysis of Algorithms
    COMP3138
    Progress0 / 53 topics
    Topics
    1. Introduction to Algorithm Design2. Data Structures3. Efficiency in Algorithms4. Analysis of Algorithms5. Mathematical Review6. Mathematical Analysis of Algorithms7. Types of Functions8. Order of Growth9. Asymptotic Notations10. Sorting Algorithms11. Selection Sort Algorithm12. Example and Analysis of Selection Sort13. Insertion Sort Algorithm14. Divide and Conquer Algorithms15. Merge Sort Algorithm16. Quick Sort Algorithm17. Bucket Sort Algorithm18. Radix Sort Algorithm19. Counting Sort Algorithm20. Heap Sort Basics21. Heap Algorithms22. Heap Properties and Examples23. Heap Operations24. Heap Sort Algorithm Analysis25. Heap Insertion and Deletion26. Tree Based Algorithms and Hashing27. Red Black Tree Basics28. Binary Search Tree Basics29. Tree Searching Algorithms30. Analysis of Tree Searching Algorithms31. Analysis of Insertion and Deletion in BST32. Hashing Basics33. Examples of Hash Functions34. Analysis of Collision Resolution Techniques35. Dynamic Programming36. 0-1 Knapsack Problem37. Fractional Knapsack Problem38. Longest Common Subsequence39. Shortest Path Finding40. Matrix Chain Multiplication41. Assembly Line Chain Problem42. Greedy Algorithms43. Prim's Algorithm44. Kruskal's Algorithm45. Dijkstra's Algorithm46. Huffman Coding47. NP-Completeness48. Polynomial Time Verification49. Reducibility50. NP-Completeness Proofs51. Randomized Algorithms52. Particle Swarm Optimization53. Genetic Algorithms
    COMP3138›Hashing Basics
    Design and Analysis of AlgorithmsTopic 32 of 53

    Hashing Basics

    8 minread
    1,338words
    Intermediatelevel

    Hashing Basics

    Hashing is a powerful technique used to store and retrieve data efficiently, and it forms the foundation of hash tables and hash maps. The idea is to map data to a fixed-size value (called a hash code) using a hash function. This hash value allows for fast data access, providing constant time complexity for many operations like search, insertion, and deletion.

    Here’s an overview of the key concepts of hashing, including the components involved and the common methods used.


    1. Hash Function

    A hash function is an algorithm that takes an input (often called the key) and maps it to a fixed-size value, usually an integer, called a hash code. The goal of a hash function is to produce a uniform distribution of hash values for different inputs, minimizing collisions (when two keys produce the same hash value).

    Properties of a Good Hash Function:

    • Deterministic: The same input always produces the same hash value.
    • Uniformity: The function should distribute hash values uniformly across the hash table to reduce clustering.
    • Efficient: The hash function should be computationally efficient, ideally with constant time complexity O(1).
    • Minimize Collisions: Collisions should be minimized, where two different keys produce the same hash value.

    Example of a Simple Hash Function:

    For an integer key, a simple hash function might be the modulo operation:

    hash(key) = key % table_size
    

    For a string key, a more sophisticated approach is often used, such as:

    hash(key) = sum of ASCII values of characters % table_size
    

    This is a very basic example. In practice, hash functions are more complex to avoid collisions and ensure efficient distribution.


    2. Hash Table

    A hash table (or hash map) is a data structure that uses a hash function to map keys to values. The key is hashed to an index in an array (called a bucket or slot), and the value is stored at that index.

    Structure of a Hash Table:

    • Array: A fixed-size array that holds values at specific indices, determined by the hash code.
    • Buckets/Slots: The array is divided into buckets or slots, where the data is stored. Each slot holds the value associated with a key.
    • Hash Function: The function that converts the key into an index within the array. The hash function determines where in the array the key-value pair will be placed.

    Insertion in a Hash Table:

    1. Compute the hash of the key using the hash function.
    2. Map the hash value to an index in the array (bucket).
    3. Insert the key-value pair at the corresponding index in the array.

    Example of a Hash Table:

    Suppose you want to store the following pairs in a hash table:

    • Key: "apple", Value: 5
    • Key: "banana", Value: 10
    • Key: "cherry", Value: 15

    Let's assume a hash table with 5 slots and a hash function that returns the length of the key modulo 5. The hash function would look like this:

    hash(key) = len(key) % 5
    

    For each key:

    • "apple" → hash("apple") = 5 % 5 = 0
    • "banana" → hash("banana") = 6 % 5 = 1
    • "cherry" → hash("cherry") = 6 % 5 = 1

    In this case, "banana" and "cherry" will both collide and be placed in the same slot (bucket 1).


    3. Collisions and Collision Resolution

    A collision occurs when two keys produce the same hash value and thus are mapped to the same bucket. Collisions are inevitable because the number of possible keys is generally much larger than the number of available buckets.

    Common Collision Resolution Techniques:

    1. Chaining (Separate Chaining):

      • Each bucket in the hash table stores a linked list (or another collection) of all elements that hash to the same index.
      • When a collision occurs, the new key-value pair is added to the linked list at the corresponding bucket.

      Example:

      • If both "banana" and "cherry" hash to bucket 1, the bucket 1 will hold a linked list of key-value pairs:
        bucket[1] → ("banana", 10) → ("cherry", 15)
        
    2. Open Addressing:

      • When a collision occurs, the algorithm probes other buckets in the hash table to find an empty spot.
      • There are several probing strategies:
        • Linear Probing: If a collision occurs, check the next bucket. If that’s occupied, check the one after, and so on, wrapping around to the beginning of the array if necessary.
        • Quadratic Probing: Instead of checking the next bucket, check buckets at intervals of increasing size (1, 4, 9, etc.).
        • Double Hashing: Use a second hash function to calculate a new step size for probing if a collision occurs.

      Example (Linear Probing):

      • If bucket 1 is already occupied, the hash table checks bucket 2, then bucket 3, and so on, until an empty bucket is found.

    4. Load Factor and Resizing

    The load factor of a hash table is a measure of how full the table is. It is defined as:

    load factor = (number of elements in the table) / (size of the table)
    

    A high load factor (i.e., a lot of keys in relatively few buckets) increases the likelihood of collisions, which can degrade performance.

    Resizing:

    When the load factor exceeds a certain threshold (often around 0.7 or 70%), the hash table may need to be resized (doubled in size, for example) to maintain efficient operations. Resizing involves:

    • Allocating a larger array.
    • Rehashing all existing keys and inserting them into the new array according to their new hash values.

    Resizing and rehashing ensure that the performance of the hash table remains efficient, with operations staying close to O(1) on average.


    5. Time Complexity of Hash Table Operations

    • Search: O(1) on average, assuming a good hash function and low collision rates. In the worst case (with many collisions), the time complexity can degrade to O(n) (e.g., if using chaining and all keys hash to the same bucket).

    • Insert: O(1) on average, as inserting into the correct bucket is typically a constant-time operation. However, when resizing occurs, the insertion time may temporarily increase due to rehashing.

    • Delete: O(1) on average, similar to search. Deleting an element requires locating it using the hash value and then removing it.

    • Resize/rehash: O(n), where n is the number of elements in the table. Rehashing requires moving all existing elements to the new table.


    6. Advantages and Disadvantages of Hashing

    Advantages:

    • Fast Lookups: Searching, inserting, and deleting elements in a hash table can be done in constant time O(1) on average.
    • Efficient Use of Space: Hash tables provide a good balance of time efficiency and space usage, especially when the hash function distributes keys uniformly across the buckets.

    Disadvantages:

    • Collisions: Collisions are inevitable, and while they can be handled efficiently, they add some overhead to the operations.
    • Resizing: Hash tables may need resizing, which can temporarily degrade performance.
    • Non-Order: Hash tables do not maintain any specific order of elements, unlike data structures like binary search trees.
    • Memory Usage: Hash tables require extra memory to maintain an array and handle collisions (e.g., linked lists or additional probing).

    Applications of Hashing

    • Hash Maps: Used in many programming languages (e.g., dict in Python, Map in Java) for fast lookups.
    • Caches: Hashing is used in cache systems to store recently accessed data, allowing for quick retrieval.
    • Databases: Hashing is used for indexing in databases to speed up query execution.
    • Cryptography: Cryptographic hash functions (e.g., SHA-256) are used to create secure hash values for storing passwords or verifying data integrity.
    • Unique Identifiers: Hashing can generate unique identifiers for records in large datasets.

    Conclusion

    Hashing is a powerful technique for quickly storing and retrieving data. By using a hash function to map keys to indices in a hash table, it offers constant-time average performance for common operations like insertion, deletion, and search. The performance depends on factors like the hash function, collision resolution strategy, and the load factor, but when used appropriately, hashing provides a highly efficient data storage solution.

    Previous topic 31
    Analysis of Insertion and Deletion in BST
    Next topic 33
    Examples of Hash Functions

    Past Papers

    Open this section to load past papers

    Click on Show Past Papers to see past papers.
    On This Page
      Reading Stats
      Est. reading time8 min
      Word count1,338
      Code examples0
      DifficultyIntermediate