Class CompactHashSet<E>
- All Implemented Interfaces:
Serializable, Iterable<E>, Collection<E>, Set<E>
- Direct Known Subclasses:
CompactLinkedHashSet
contains(x), add(x), and remove(x) are all (expected and amortized)
constant-time operations. "Expected" is meant in the hashtable sense: it depends on the hash
function distributing the elements across the buckets in a way that is not far from uniform.
"Amortized" applies because some operations can trigger a hash table resize.
Unlike java.util.HashSet, iteration time is proportional only to the actual size(),
which is optimal, rather than to the size of the internal hashtable, which could be much larger
than size(). Furthermore, this structure depends on only a fixed number of arrays;
add(x) operations do not create objects for the garbage collector to deal with, and for
every element added, the garbage collector has to traverse 1.5 references on
average in the marking phase, not 5.0 as in java.util.HashSet.
If there are no removals, then iteration order is the same as insertion
order. Any removal invalidates any ordering guarantees.
This class should not be assumed to be universally superior to java.util.HashSet.
Generally speaking, this class reduces object allocation and memory consumption at the price of
moderately increased constant factors of CPU. Only use this class when there is a specific reason
to prioritize memory over CPU.
-
Field Summary
Fields (modifier and type, field, description):
- (package private) Object[] elements: The elements contained in the set, in the range of [0, size()).
- private int[] entries: Contains the logical entries, in the range of [0, size()).
- (package private) static final double HASH_FLOODING_FPP: Maximum allowed false positive probability of detecting a hash flooding attack given random input.
- private static final int MAX_HASH_BUCKET_LENGTH: Maximum allowed length of a hash table bucket before falling back to a j.u.LinkedHashSet based implementation.
- private int metadata: Keeps track of metadata like the number of hash table bits and modifications of this data structure (to make it possible to throw ConcurrentModificationException in the iterator).
- private int size: The number of elements contained in the set.
- private Object table: The hashtable object.
-
Constructor Summary
Constructors (constructor, description):
- CompactHashSet(): Constructs a new empty instance of CompactHashSet.
- CompactHashSet(int expectedSize): Constructs a new instance of CompactHashSet with the specified capacity.
-
Method Summary
Methods (modifier and type, method, description):
- boolean add
- (package private) int adjustAfterRemove(int indexBeforeRemove, int indexRemoved): Updates the index an iterator is pointing to after a call to remove: returns the index of the entry that should be looked at after a removal on indexRemoved, with indexBeforeRemove as the index that *was* the next entry that would be looked at.
- (package private) int allocArrays(): Handle lazy allocation of arrays.
- void clear()
- boolean contains
- static <E> CompactHashSet<E> create(): Creates an empty CompactHashSet instance.
- static <E> CompactHashSet<E> create(E... elements): Creates a mutable CompactHashSet instance containing the given elements in unspecified order.
- static <E> CompactHashSet<E> create(Collection<? extends E> collection): Creates a mutable CompactHashSet instance containing the elements of the given collection in unspecified order.
- createHashFloodingResistantDelegate(int tableSize)
- static <E> CompactHashSet<E> createWithExpectedSize(int expectedSize): Creates a CompactHashSet instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth.
- private E element(int i)
- private int entry(int i)
- (package private) int firstEntryIndex()
- void forEach
- (package private) int getSuccessor(int entryIndex)
- private int hashTableMask(): Gets the hash table mask using the stored number of hash table bits.
- (package private) void incrementModCount()
- (package private) void init(int expectedSize): Pseudoconstructor for serialization support.
- (package private) void insertEntry(int entryIndex, E object, int hash, int mask): Creates a fresh entry with the specified object at the specified position in the entry arrays.
- boolean isEmpty()
- (package private) boolean isUsingHashFloodingResistance()
- iterator()
- (package private) void moveLastEntry(int dstIndex, int mask): Moves the last entry in the entry array into dstIndex, and nulls out its old position.
- (package private) boolean needsAllocArrays(): Returns whether arrays need to be allocated.
- private void readObject(ObjectInputStream stream)
- boolean remove
- private Object[] requireElements
- private int[] requireEntries()
- private Object requireTable
- (package private) void resizeEntries(int newCapacity): Resizes the internal entries array to the specified capacity, which may be greater or less than the current capacity.
- private void resizeMeMaybe(int newSize): Resizes the entries storage if necessary.
- private int resizeTable(int oldMask, int newCapacity, int targetHash, int targetEntryIndex)
- private void setElement(int i, E value)
- private void setEntry(int i, int value)
- private void setHashTableMask(int mask): Stores the hash table mask as the number of bits needed to represent an index.
- int size()
- Object[] toArray()
- <T> T[] toArray(T[] a)
- void trimToSize(): Ensures that this CompactHashSet has the smallest representation in memory, given its current size.
- private void writeObject(ObjectOutputStream stream)
Methods inherited from class java.util.AbstractSet
equals, hashCode, removeAll
Methods inherited from class java.util.AbstractCollection
addAll, containsAll, retainAll, toString
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.Collection
parallelStream, removeIf, stream, toArray
Methods inherited from interface java.util.Set
addAll, containsAll, retainAll
-
Field Details
-
HASH_FLOODING_FPP
static final double HASH_FLOODING_FPP
Maximum allowed false positive probability of detecting a hash flooding attack given random input.
-
MAX_HASH_BUCKET_LENGTH
private static final int MAX_HASH_BUCKET_LENGTH
Maximum allowed length of a hash table bucket before falling back to a j.u.LinkedHashSet based implementation. Experimentally determined.
-
table
The hashtable object. This can be either:
- a byte[], short[], or int[], with size a power of two, created by CompactHashing.createTable, whose values are either
  - UNSET, meaning "null pointer"
  - one plus an index into the entries and elements arrays
- another java.util.Set delegate implementation. In most modern JDKs, normal java.util hash collections intelligently fall back to a binary search tree if hash table collisions are detected. Rather than going to all the trouble of reimplementing this ourselves, we simply switch over to use the JDK implementation wholesale if probable hash flooding is detected, sacrificing the compactness guarantee in very rare cases in exchange for much more reliable worst-case behavior.
- null, if no entries have yet been added to the set
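The first case above (a primitive array whose slots hold either UNSET or one plus an entry index) can be sketched as follows. This is only an illustration: the class CompactTableSketch, its helper methods, the size thresholds, and the choice of 0 for UNSET are assumptions made for the example, not the actual CompactHashing code.

```java
public class CompactTableSketch {
    // Assumption: 0 plays the role of the "null pointer", so stored values are 1 + index.
    static final int UNSET = 0;

    // Hypothetical stand-in for a createTable-style factory: picks the narrowest
    // primitive array able to hold the values. The thresholds are assumptions.
    static Object createTable(int buckets) {
        if (buckets < 2 || Integer.bitCount(buckets) != 1) {
            throw new IllegalArgumentException("buckets must be a power of two >= 2: " + buckets);
        }
        if (buckets <= 1 << 8) {
            return new byte[buckets];
        }
        if (buckets <= 1 << 16) {
            return new short[buckets];
        }
        return new int[buckets];
    }

    // Reads a slot as an unsigned value, whatever the backing array type is.
    static int tableGet(Object table, int index) {
        if (table instanceof byte[]) {
            return ((byte[]) table)[index] & 0xFF;
        }
        if (table instanceof short[]) {
            return ((short[]) table)[index] & 0xFFFF;
        }
        return ((int[]) table)[index];
    }

    // Writes a slot, narrowing the value to the backing array type.
    static void tableSet(Object table, int index, int value) {
        if (table instanceof byte[]) {
            ((byte[]) table)[index] = (byte) value;
        } else if (table instanceof short[]) {
            ((short[]) table)[index] = (short) value;
        } else {
            ((int[]) table)[index] = value;
        }
    }

    public static void main(String[] args) {
        Object table = createTable(256);
        tableSet(table, 3, 200 + 1); // bucket 3 points at entry index 200
        System.out.println(table.getClass().getSimpleName()); // byte[]
        System.out.println(tableGet(table, 3) - 1);           // 200
        System.out.println(tableGet(table, 4) == UNSET);      // true
    }
}
```

Storing one plus the index lets a freshly allocated (all-zero) array mean "every bucket is empty" without an explicit fill.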
-
entries
@CheckForNull private transient int[] entries
Contains the logical entries, in the range of [0, size()). The high bits of each int are the part of the smeared hash of the element not covered by the hashtable mask, whereas the low bits are the "next" pointer (pointing to the next entry in the bucket chain), which will always be less than or equal to the hashtable mask.
    hash  = aaaaaaaa
    mask  = 00000fff
    next  = 00000bbb
    entry = aaaaabbb
The pointers in [size(), entries.length) are all "null" (UNSET).
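The packing described for entries can be reproduced with two bit operations. The sketch below uses the hash, mask, and next values from the description; the class and method names (EntryPackingSketch, pack, hashPrefix, next) are hypothetical and chosen only for this example.

```java
public class EntryPackingSketch {
    // Combines the hash bits above the mask with the "next" pointer below it.
    static int pack(int hash, int next, int mask) {
        return (hash & ~mask) | (next & mask);
    }

    // Recovers the part of the hash not covered by the hashtable mask.
    static int hashPrefix(int entry, int mask) {
        return entry & ~mask;
    }

    // Recovers the "next" pointer stored in the low bits.
    static int next(int entry, int mask) {
        return entry & mask;
    }

    public static void main(String[] args) {
        int hash = 0xaaaaaaaa;
        int mask = 0x00000fff;
        int next = 0x00000bbb;
        int entry = pack(hash, next, mask);
        System.out.println(Integer.toHexString(entry));                   // aaaaabbb
        System.out.println(Integer.toHexString(hashPrefix(entry, mask))); // aaaaa000
        System.out.println(Integer.toHexString(next(entry, mask)));       // bbb
    }
}
```

The low bits of the hash do not need to be stored because every entry in a bucket chain shares them: they are the bucket index itself.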
-
elements
The elements contained in the set, in the range of [0, size()). The elements in [size(), elements.length) are all null.
-
metadata
private transient int metadata
Keeps track of metadata like the number of hash table bits and modifications of this data structure (to make it possible to throw ConcurrentModificationException in the iterator). Note that this field is deliberately not volatile: detection of such errors is therefore even less of a "best effort", in exchange for better performance.
-
size
private transient int size
The number of elements contained in the set.
-
-
Constructor Details
-
CompactHashSet
CompactHashSet()
Constructs a new empty instance of CompactHashSet.
-
CompactHashSet
CompactHashSet(int expectedSize)
Constructs a new instance of CompactHashSet with the specified capacity.
- Parameters:
expectedSize - the initial capacity of this CompactHashSet.
-
-
Method Details
-
create
Creates an empty CompactHashSet instance.
-
create
Creates a mutable CompactHashSet instance containing the elements of the given collection in unspecified order.
- Parameters:
collection - the elements that the set should contain
- Returns:
a new CompactHashSet containing those elements (minus duplicates)
-
create
Creates a mutable CompactHashSet instance containing the given elements in unspecified order.
- Parameters:
elements - the elements that the set should contain
- Returns:
a new CompactHashSet containing those elements (minus duplicates)
-
createWithExpectedSize
Creates a CompactHashSet instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth.
- Parameters:
expectedSize - the number of elements you expect to add to the returned set
- Returns:
a new, empty CompactHashSet with enough capacity to hold expectedSize elements without resizing
- Throws:
IllegalArgumentException - if expectedSize is negative
-
init
void init(int expectedSize)
Pseudoconstructor for serialization support.
-
needsAllocArrays
boolean needsAllocArrays()
Returns whether arrays need to be allocated.
-
allocArrays
int allocArrays()
Handles lazy allocation of arrays.
-
delegateOrNull
-
createHashFloodingResistantDelegate
-
convertToHashFloodingResistantImplementation
-
isUsingHashFloodingResistance
boolean isUsingHashFloodingResistance() -
setHashTableMask
private void setHashTableMask(int mask)
Stores the hash table mask as the number of bits needed to represent an index.
-
hashTableMask
private int hashTableMask()
Gets the hash table mask using the stored number of hash table bits.
-
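One way to read setHashTableMask and hashTableMask together with the metadata field is that the mask is reduced to a bit count, kept in the low bits of a single int, while the remaining bits count modifications. The sketch below illustrates that idea only; the class name, the 5-bit field width, and the increment step are assumptions, not values taken from this page.

```java
public class HashTableMaskSketch {
    // Assumption: 5 low bits are reserved for the bit count (enough for 0..31).
    static final int BITS_FIELD_MASK = (1 << 5) - 1;
    // Assumption: the modification count lives above those bits, so it steps by 32.
    static final int MOD_COUNT_INCREMENT = 1 << 5;

    // Stores the mask as "number of bits needed to represent an index".
    static int withHashTableMask(int metadata, int mask) {
        int bits = Integer.SIZE - Integer.numberOfLeadingZeros(mask);
        return (metadata & ~BITS_FIELD_MASK) | (bits & BITS_FIELD_MASK);
    }

    // Rebuilds the mask from the stored bit count (valid for bit counts below 32).
    static int hashTableMask(int metadata) {
        return (1 << (metadata & BITS_FIELD_MASK)) - 1;
    }

    // Bumps the modification count without disturbing the stored bit count.
    static int incrementModCount(int metadata) {
        return metadata + MOD_COUNT_INCREMENT;
    }

    public static void main(String[] args) {
        int metadata = withHashTableMask(0, 0xfff);
        System.out.println(metadata & BITS_FIELD_MASK);                     // 12
        System.out.println(Integer.toHexString(hashTableMask(metadata)));   // fff
        metadata = incrementModCount(metadata);
        System.out.println(Integer.toHexString(hashTableMask(metadata)));   // fff
    }
}
```

An iterator can remember the metadata value it started with and compare it on each step; any structural change in between shows up as a different value.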
incrementModCount
void incrementModCount() -
add
- Specified by:
add in interface Collection<E>
- Specified by:
add in interface Set<E>
- Overrides:
add in class AbstractCollection<E>
-
insertEntry
Creates a fresh entry with the specified object at the specified position in the entry arrays. -
resizeMeMaybe
private void resizeMeMaybe(int newSize)
Resizes the entries storage if necessary.
-
resizeEntries
void resizeEntries(int newCapacity)
Resizes the internal entries array to the specified capacity, which may be greater or less than the current capacity.
-
resizeTable
private int resizeTable(int oldMask, int newCapacity, int targetHash, int targetEntryIndex) -
contains
- Specified by:
contains in interface Collection<E>
- Specified by:
contains in interface Set<E>
- Overrides:
contains in class AbstractCollection<E>
-
remove
- Specified by:
remove in interface Collection<E>
- Specified by:
remove in interface Set<E>
- Overrides:
remove in class AbstractCollection<E>
-
moveLastEntry
void moveLastEntry(int dstIndex, int mask)
Moves the last entry in the entry array into dstIndex, and nulls out its old position.
-
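The "move the last entry into the vacated slot" step is what keeps the arrays dense, and it is also why a removal invalidates insertion order. Below is a minimal sketch on the elements array alone; MoveLastEntrySketch and removeAt are hypothetical names, and the sketch omits the bucket-chain pointer updates that a real hash set would also need.

```java
import java.util.Arrays;

public class MoveLastEntrySketch {
    // Removes elements[index] by moving the last live element into its slot,
    // then nulls out the old last position. Returns the new size.
    static int removeAt(Object[] elements, int size, int index) {
        int last = size - 1;
        if (index < last) {
            elements[index] = elements[last];
        }
        elements[last] = null;
        return last;
    }

    public static void main(String[] args) {
        Object[] elements = {"a", "b", "c", "d", null, null};
        int size = 4;
        size = removeAt(elements, size, 1); // remove "b"
        System.out.println(size);                      // 3
        System.out.println(Arrays.toString(elements)); // [a, d, c, null, null, null]
    }
}
```

After the removal, iterating over [0, size) yields a, d, c: the live range stays contiguous, but "d" now precedes "c" even though it was inserted later.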
firstEntryIndex
int firstEntryIndex() -
getSuccessor
int getSuccessor(int entryIndex) -
adjustAfterRemove
int adjustAfterRemove(int indexBeforeRemove, int indexRemoved)
Updates the index an iterator is pointing to after a call to remove: returns the index of the entry that should be looked at after a removal on indexRemoved, with indexBeforeRemove as the index that *was* the next entry that would be looked at.
-
iterator
-
spliterator
- Specified by:
spliterator in interface Collection<E>
- Specified by:
spliterator in interface Iterable<E>
- Specified by:
spliterator in interface Set<E>
-
forEach
-
size
public int size()
- Specified by:
size in interface Collection<E>
- Specified by:
size in interface Set<E>
- Specified by:
size in class AbstractCollection<E>
-
isEmpty
public boolean isEmpty()
- Specified by:
isEmpty in interface Collection<E>
- Specified by:
isEmpty in interface Set<E>
- Overrides:
isEmpty in class AbstractCollection<E>
-
toArray
- Specified by:
toArray in interface Collection<E>
- Specified by:
toArray in interface Set<E>
- Overrides:
toArray in class AbstractCollection<E>
-
toArray
public <T> T[] toArray(T[] a)
- Specified by:
toArray in interface Collection<E>
- Specified by:
toArray in interface Set<E>
- Overrides:
toArray in class AbstractCollection<E>
-
trimToSize
public void trimToSize()
Ensures that this CompactHashSet has the smallest representation in memory, given its current size.
-
clear
public void clear()
- Specified by:
clear in interface Collection<E>
- Specified by:
clear in interface Set<E>
- Overrides:
clear in class AbstractCollection<E>
-
writeObject
- Throws:
IOException
-
readObject
- Throws:
IOException
ClassNotFoundException
-
requireTable
-
requireEntries
private int[] requireEntries() -
requireElements
-
element
-
entry
private int entry(int i) -
setElement
-
setEntry
private void setEntry(int i, int value)
-