A Set, like a List, is an interface for a collection of objects. Like its counterpart in mathematics, a Set can contain only one of any given object–meaning there are no two elements for which equals() is true and, in the case of HashSet, the values of hashCode() are equal. Also, unlike a List, the elements of a Set are not in any particular sequence.
Set inherits all its methods from Collection. Like List, it has several implementations–but the most popular are HashSet and TreeSet.
HashSet
HashSet is a sort of “random access” set. Each time you add an element, it drops it into one of a number of “buckets” selected by the element’s hashCode() value (hence “hash” set). Each bucket can hold many elements, but none of them equals() any other. If any element already in the bucket does match the new element according to the equals() method, the new element is discarded and the add() operation returns false.
You can store a null in a HashSet.
HashSet has the same three constructors that ArrayList does, plus one more seldom-used one for specifying a load factor; see the documentation for details. In particular, you can convert a List to a Set, or one type of Set to another, using a constructor. Examples:
Set<String> mySet = new HashSet(); // Most abstract
Set<String> mySet = new HashSet(200); // Initial capacity
HashSet<String> mySet = new HashSet(myListOfStrings);
The Importance of Being hashCode()
On the previous page, we described an object’s equals() and hashCode() methods as joined at the hip, and explained how to use your IDE to easily generate them as a pair.
In a List, the equals() method by itself determines the behavior of contains() and remove(). But in a HashSet, the hashCode() method is part of the plan, because its return value determines where to find or place an element. Let’s explain.
- When designing a class, you determine which fields will be a “key.” It might be a customer number, a ZIP code, the Latin species of an animal, or a combination of fields.
- You use the IDE to generate the equals() and hashCode() values for the class, using the fields in the key.
- Before adding an element of the class to a HashSet, you determine a value for each field of the key.
- Add the element. The value of hashCode() determines the bucket into which the element is dropped.
- Never, never, never, ever, ever, ever, change the value of any field in the element’s key ever again for all eternity!
- NEVER!!!!
That’s because when you changed the value, you changed the outcome of hashCode() and sent any search for the item to the wrong bucket.
Not that your object has disappeared from the Set. On the contrary, it’s still there; HashSet just no longer knows where to find it. That means that if you try to add another instance (or even the same instance) with the same key value, you’ll get away with it because HashSet is looking in the wrong place for the original.
It bears repeating: when we speak of storing an object, or element, in a Collection, you’re not actually storing the object itself; you’re copying the contents of the box naming the object–a pointer to that object– into another box, and storing the second box in the Collection. If you change anything within the object itself, then from whichever box you retrieve the pointer, you see the object as it now is.
TreeSet
A TreeSet doesn’t group elements in buckets as HashSet does. Instead, its elements are arranged in a B-tree structure (nobody knows what the “B” stands for), which arranges the elements in such a way that they are always in sort order and can be searched quickly. Iterating across a TreeSet returns the items in sort order. Because elements must be compared, you can’t store a null in a TreeSet.
So TreeSet doesn’t use hashCode(). Instead, you must supply a means of comparing any two elements. You do this in one of two ways:
- Implement the Comparable interface in your element class and provide a compareTo() method to provide the “natural order” of the elements, or…
- Pass a Comparator to the TreeSet constructor.
If you do neither of these, you’ll get a runtime Exception when trying to add an element to the TreeSet.
Unlike HashSet, TreeSet doesn’t have issues with capacity or load factor. But it does provide the aforementioned constructor that accepts a Comparator for an argument.
Losing an element in a TreeSet is a little harder than losing it in a HashSet because the way TreeSet looks for an element depends on the element key’s value relative to the value of the elements that precede or follow it in the tree, and that’s harder to predict. To traverse the tree, a search involves starting at the top and turning right or left depending on what you’re looking for and how it compares to the elements on either side. If the element’s key forces a wrong turn, your search fails.
What You Need to Know
- A Set is a kind of Collection in which each element is unique.
- Such uniqueness is determined by the equals() method of the objects being stored in the Set.
- An attempt to store an object in a Set when an equivalent object already appears there will fail.
- A HashSet depends on the hashCode() method to distribute objects in the Set.
- A TreeSet depends on the compareTo() method or a Comparator to distribute objects in the Set.
- You can store a null in a HashSet. You cannot store a null in a TreeSet.
- An object’s key, for purposes of storing in a Set, is determined by the fields compared in the equals() method. Tampering with any of these fields once the object has been stored will make the object impossible to locate again, and makes possible storing a duplicate object in the Set.
Now let’s explore Maps.