Navigation
A. Set of Strings
Complete the implementation of the type BSTStringSet
, which implements the
interface StringSet
. As the class name suggests, the intended implementation
uses a binary search tree. Don't use any of the Java Collection classes in your
BST implementation (you may still use a List
implementation to collect items for
the asList
method).
B. BSTStringSet Bounded Iterator
Modify BSTStringSet
to implement SortedStringSet
, providing an iterator that returns elements
within a specified range of values in ascending order. The iterator should produce
its M values in time O(M + h), where h is the height of the tree, as described in lecture.
We've provided the class BSTStringSetRangeTest
to help check that your solution works in these time bounds.
The tricky part of this assignment is producing an inorder tree iterator. We've provided an iterator for a full BST, which you can use as a basis for your solution.
C. ECHashStringSet
For this problem, you'll expand on your work from earlier by creating a
hashing-based implementation of the StringSet
interface.
To implement the interface, you'll be working almost completely
from scratch, with only minimal skeleton code that implements the interface so the code will compile.
This one will be a fair bit more difficult than the BSTStringSet
.
Create a class ECHashStringSet
(EC as in External Chaining) that implements the StringSet
interface
using an external chaining hash table as its core data structure. In
order to ensure that the .put
and .contains
operations are fast, you should
resize the hash table as soon as the load factor exceeds 5. For memory
efficiency, you should ensure that the load factor is never less than
0.2, except for empty lists, for which the load factor is allowed to
be zero. You do not need to worry about downsizing the set since we
don't have a remove operation. You also do not need to worry about asList
returning the keys in sorted order.
There is one annoying issue you'll enounter: If the hashCode()
is
negative, we need to remove the top bit (since negative array indices
are not allowed in Java). You can do this using the bit operations we
discussed earlier in class, e.g. s.hashCode() & 0x7fffffff) %
bucketCount
.
There's no restriction on what you are allowed to use, but you should
avoid using any library class that has "hash"
in the name, since that would defeat the whole point of this problem.
You are allowed to use the hashCode()
function for Strings.
For an extra challenge, don't use anything that requires using a Java library
class (except for the asList
method).
As references, you might find the following resources useful:
- Chapter 7 of DSIJ.
- The Resizing Array-Based Stack, which is part of Princeton's standard Java library.
- The Wikipedia article on separate chaining (another term for external chaining).
- If you are getting "unchecked or unsafe operations" errors, you might want to look at this Stack Overflow post on creating arrays of objects that expect a formal type parameter. There is no good way to avoid the warning, aside from suppressing it.
Your correctness tests from the previous part should work almost without
modification (you'll just need to change the type of object that is
instantiated). Make sure to test something that inserts a large number
of strings. One testing approach is to generate random strings, insert
them into both your ECHashStringSet
and a
TreeSet<String>
, then iterate through the entire
TreeSet and ensure that all of its members are also contained in your
ECHashStringSet and vice-versa.
One solution from a previous semester (which does use an import for handling the list of items that go in each bucket) is 66 lines including comments and whitespace.
If you're unsure how to get started, slide 4 from the Hashing lecture provides a visual of what our hash table will look like. Starting from this visual, try asking yourself:
- What data structure do I need to make my buckets? What about the list stored at each bucket?
- How do I know which bucket a given
String
should be put in? - Try implementing
put
andcontains
. Test that you are able to add to yourECHashStringSet
and check if it contains elements. - Now, aim to satisfy the efficiency requirement with resizing. When do we want to resize the array? How do I resize an array?
D. ECHashStringSet Timing
Run the provided timing tests InsertRandomSpeedTest
and InsertInOrderSpeedTest
for a range of speeds. Fill out hw6timing.txt
as directed. Note that hw6timing.txt
has the input sizes you should be passing to InsertRandomSpeedTest
and InsertInOrderSpeedTest.
If you're having timing issues, make sure your hash table is actually resizing.