CME 2201 - Assignment 1

In this assignment, you are expected to index the words of a document named 'story.txt'. You must read this file, split it word by word, and index each word into your hash table according to the rules given below.

Requirements

- Usage of the Java programming language and generic data types is required.
- You need to implement the base functions of a classical hash table by yourself (do not extend an available Java HashMap class directly).
- Object Oriented Programming (OOP) principles must be applied.
- Exception handling must be used when it is needed.

1. Main Functionalities

put(Key k, Value v)

Read the given input story.txt file, calculate the number of occurrences of each word as its count value, and insert this count data into the hash table accordingly.
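As an illustration, the indexing pass might look like the sketch below. The class names MyHashTable and Indexer, the assumption that get returns the stored count (or null) without printing, the splitting regex, and the case folding are all illustrative choices, not part of the assignment:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Indexer {
    // Reads story.txt, splits it into words, and stores each word's occurrence count via put.
    public static void index(MyHashTable<String, Integer> table) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get("story.txt")));
        for (String word : text.split("[^a-zA-Z]+")) {          // split on any non-letter characters
            if (word.isEmpty()) {
                continue;
            }
            word = word.toLowerCase();                          // index case-insensitively
            Integer count = table.get(word);                    // assumed quiet lookup: count or null
            table.put(word, count == null ? 1 : count + 1);     // insert or update the count
        }
    }
}
```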

Value get(Key k)

Search the given word (k) in the hash table. If the word is available in the table, then return an output as shown below; otherwise return a "not found" message to the user. When a word is searched in the hash table, the key, count, and index of the word should be printed.

------ Output ------
Search: Ezgi
Key: 1243225
Count: 10
Index: 165

Search: Ali
Key: 68294842
Count: 3
Index: 132

Note: The results for the words "Ezgi" and "Ali" were generated as an example, so you will not obtain the same results by using the 'story.txt' file.

resize(int capacity)

Make the hash table dynamically growable. The put method should double the current table size when the hash table reaches the maximum load factor. You should take the initial size of the table as 997 and call the resize method according to two different load factor values (50% and 70%).
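A minimal sketch of how resize might be realized, assuming the entries live in an array named table with a size counter and a MAX_LOAD constant of 0.5 or 0.7 (all identifiers here are illustrative, not prescribed by the assignment):

```java
// Doubling and re-inserting every entry so each key gets a fresh index in the larger table.
@SuppressWarnings("unchecked")
private void resize(int capacity) {
    Entry<K, V>[] old = table;
    table = (Entry<K, V>[]) new Entry[capacity];
    size = 0;
    for (Entry<K, V> e : old) {
        if (e != null) {
            put(e.key, e.value);    // rehash into the new, larger table
        }
    }
}

// Called from put() before inserting a brand-new key:
// if ((double) (size + 1) / table.length >= MAX_LOAD) { resize(2 * table.length); }
```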

2. Hash Function

To find the index corresponding to a given string key, you should first generate an integer hash code by using a hash function. Then the resulting hash code has to be converted to the range 0 to N-1 using a compression function, such as the modulus operator (N is the size of the hash table).

You are expected to implement two different hash functions: the polynomial accumulation function (PAF) and your own hash function (YHF).
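For example, a compression step could look like the sketch below (the method name is an assumption); it maps any integer hash code, including a negative one, into a valid table index:

```java
// Compress an arbitrary integer hash code into the range 0 .. N-1.
// Adding n before the second modulus keeps the result non-negative for negative hash codes.
private int compress(int hashCode, int n) {
    return ((hashCode % n) + n) % n;
}
```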

Polynomial Accumulation Function (PAF)

The hash code of a string s can be generated by using the following polynomial:

h(s) = s_1 · z^(n-1) + s_2 · z^(n-2) + ... + s_(n-1) · z + s_n

where s_1 is the leftmost character of the string, characters are represented as numbers in 1-26 (case insensitive), and n is the length of the string. The constant z is usually a prime number (33, 37, 39, and 41 are particularly good choices for English words). When the z value is chosen as 33, the string "car" has the following hash value:

h("car") = 3 · 33^2 + 1 · 33 + 18 = 3318

Note: Using this calculation on long strings will result in numbers that cause overflow. You should either ignore the overflows or use Horner's rule to perform the calculation and apply the modulus operator after computing each expression in Horner's rule.
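A minimal sketch of PAF evaluated with Horner's rule, applying the modulus after each step so the intermediate values never overflow (the method name and parameters are assumptions; z = 33 would be passed in, and n is the current table size):

```java
// Horner's rule: h = ((...(s1*z + s2)*z + s3)...)*z + sn, reduced mod n at every step.
// Assumes the word contains only letters, since the input is split word by word.
private int pafIndex(String word, int z, int n) {
    int hash = 0;
    for (char c : word.toLowerCase().toCharArray()) {
        int code = c - 'a' + 1;          // letters represented as numbers 1..26, case insensitive
        hash = (hash * z + code) % n;    // modulus after each expression in Horner's rule
    }
    return hash;                         // already compressed into the range 0 .. n-1
}
```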

Your Own Hash Function (YHF)

The hash code for converting each word to an integer key must be implemented by yourself. The input value will be the word, and the integer key will be returned by your hash code function. The hash (compression) function for converting a key to the index (the address calculator) must also be implemented by yourself.
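As an illustration only, a self-written hash code could combine character codes with shifts and XORs, as in the sketch below; this is one possible design, not a required one, and the method name is an assumption:

```java
// Example of a self-written hash code: multiply-and-add with some extra bit mixing.
private int myHashCode(String word) {
    int hash = 7;                          // small odd seed
    for (char c : word.toLowerCase().toCharArray()) {
        hash = (hash << 5) - hash + c;     // equivalent to hash * 31 + c
        hash ^= (hash >>> 11);             // fold high bits back into the low bits
    }
    return hash;                           // may be negative; the compression function handles that
}
```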

3. Collision Handling Approach

You are expected to implement a collision resolution technique based on open addressing. The insertion algorithm is as follows:

- Calculate the hash value and initial index of the entry to be inserted.
- Then search the position linearly.
- While searching, keep the distance from the initial index; this distance is called the DIB (Distance from Initial Bucket).
- If an empty bucket is found, insert the new entry there with its DIB value.
- If we encounter an entry which has a smaller DIB than the candidate entry, swap them.

[Figure: worked insertion example; Step 3 shows the final state after displacements and insertion]

For entry retrieval, entries can be found using linear probing starting from their initial indexes, until they are encountered or until an empty bucket is found, in which case it can be concluded that the entry is not in the table. The search can also be stopped early if, during the linear probing, a bucket is encountered whose entry has a DIB smaller than the current distance from the searched key's initial bucket; the searched entry cannot be stored beyond such a bucket.
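A condensed sketch of this open-addressing scheme, assuming an Entry class that stores key, value, and dib, plus the hash and compression functions from Section 2 (all names are illustrative; updating the count of an already-present key is omitted for brevity):

```java
// Insertion: the candidate steals a bucket from any resident that is closer to its own
// home bucket (smaller DIB); the displaced resident then continues probing.
private void insert(Entry<K, V> candidate) {
    int pos = compress(hash(candidate.key), table.length);
    candidate.dib = 0;
    while (true) {
        Entry<K, V> resident = table[pos];
        if (resident == null) {                  // empty bucket: place the candidate here
            table[pos] = candidate;
            size++;
            return;
        }
        if (resident.dib < candidate.dib) {      // resident has a smaller DIB: swap them
            table[pos] = candidate;
            candidate = resident;                // keep inserting the displaced entry
        }
        pos = (pos + 1) % table.length;          // probe the next bucket linearly
        candidate.dib++;                         // candidate is now one step further from home
    }
}

// Retrieval: stop at an empty bucket, or when the resident's DIB is smaller than the
// current probe distance -- the searched key cannot be stored any further along.
private Entry<K, V> find(K key) {
    int pos = compress(hash(key), table.length);
    int dist = 0;
    while (true) {
        Entry<K, V> resident = table[pos];
        if (resident == null || resident.dib < dist) {
            return null;                         // not in the table
        }
        if (resident.key.equals(key)) {
            return resident;                     // found: value and index are available here
        }
        pos = (pos + 1) % table.length;
        dist++;
    }
}
```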

4. Performance Monitoring

You are expected to fill in the performance matrix (Table 1) by running your code under different conditions, including two different load factors (50% and 70%) for deciding when to resize the hash table and two different hash functions (PAF and YHF).

You should count the total number of collision occurrences and measure the time expended while indexing the words in "story.txt" under each condition. In addition, you should calculate the minimum, maximum, and average search times by using the "search.txt" file that contains 100 words to search for (search time means the time expended to find a particular key in the hash table; it does not include the time spent on outputs. To calculate the average search time, divide the total expended time by the total number of searched keys). You can use System.nanoTime() or System.currentTimeMillis() for time operations.
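A minimal sketch of how the search timings could be collected with System.nanoTime(); the class names and the assumption that get does not print inside the timed region are illustrative choices, not part of the assignment:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class SearchBenchmark {
    public static void run(MyHashTable<String, Integer> table) throws IOException {
        List<String> words = Files.readAllLines(Paths.get("search.txt"));   // 100 words to search
        long total = 0, min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (String word : words) {
            long start = System.nanoTime();
            table.get(word);                                 // timed: only the lookup itself
            long elapsed = System.nanoTime() - start;
            total += elapsed;
            min = Math.min(min, elapsed);
            max = Math.max(max, elapsed);
        }
        System.out.println("Avg search time: " + (total / (double) words.size()) + " ns");
        System.out.println("Min: " + min + " ns, Max: " + max + " ns");
    }
}
```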

Load Factor | Hash Function | Collision Count | Indexing Time | Avg. Search Time | Min. Search Time | Max. Search Time
α=50%       | PAF           |                 |               |                  |                  |
α=50%       | YHF           |                 |               |                  |                  |
α=70%       | PAF           |                 |               |                  |                  |
α=70%       | YHF           |                 |               |                  |                  |

Table 1. Performance matrix

Provided Resources

- Document to index: story.txt
- Word list to use in calculation of searching times: search.txt

Due date

December 16, 2020, 23:

Submission

You must upload all your '.java' files as an archive file (.zip or .rar) to the Sakai platform. Your archive file should be named 'studentnumber_name_surname.zip' or 'studentnumber_name_surname.rar', e.g., 2007510011_Ali_Yılmaz.rar.

Prepare and upload a report with descriptions of your data structure, Java code, and performance matrix.

Plagiarism Control

The submissions will be checked for code similarity. Copied assignments will be graded as zero, and they will be announced on Sakai.

Grading Policy

Job                                                                | Percentage
Usage of Generics, OOP and Try-Catch                               | 20%
Implementation of hash operations and collision handling approach  |
Performance monitoring                                             | %