Function-based Immutable HashMap in Typescript
This module contains the implementation of the HAMT data structure, which is the backing data structure for the HashMap and HashSet classes.
The HashMap and HashSet classes are easier to use, but the downside is current bundlers such as webpack, esbuild, swc, etc. do not tree-shake classes. Thus, this module exposes the HAMT data structure as a collection of functions so that if you wish you can use the HAMT directly and get the benefit of tree-shaking. There is no additional functionality available in this module, so if you are already using the HashMap or HashSet classes, there is no reason to use this module.
To use, import the functions from the hamt module:
import * as HAMT from "@seedtactics/immutable-collections/hamt";
A note about size: the HAMT data structure nodes do not track the size of the tree. Instead, each function
which modifies the tree returns a value to help track the size externally (for example, the intersection function returns
the size of the intersection). Thus, if you need to know the size, you will need to store it somewhere else and
keep it updated as you modify the tree. You can look at the source code for HashMap to see
how this is done. Note that this module guarantees that null
represents an empty tree,
so you can always check if the tree is empty or not by just comparing the root node to null
.
Data
A leaf node with the hash, key, and value.
Despite being exported to use if you wish, you don't need to access tree nodes directly, the functions in this module manipulate the tree for you. Thus it should be rare to need to use this type.
A mutable version of the LeafNode with a mutable value.
This should only be used during the initial building of the tree so that the tree can be built
efficiently. After the tree is built, you should convert to the immutable LeafNode
type.
export type CollisionNode<K, V> = {
readonly hash: number;
readonly collision: tree.TreeNode<K, V>;
};
A collision node, which stores the colliding entries in a balanced tree
The colliding nodes are stored in a TreeNode.
Despite being exported to use if you wish, you don't need to access tree nodes directly, the functions in this module manipulate the tree for you. Thus it should be rare to need to use this type.
export type MutableCollisionNode<K, V> = {
readonly hash: number;
collision: tree.MutableTreeNode<K, V>;
};
A mutable collision node
This should only be used during the initial building of the tree so that the tree can be built
efficiently. After the tree is built, you should convert to the immutable CollisionNode
type.
export type InternalNode<K, V> = {
readonly bitmap: number;
readonly children: ReadonlyArray<Node<K, V>>;
};
An internal node
Despite being exported to use if you wish, you don't need to access tree nodes directly, the functions in this module manipulate the tree for you. Thus it should be rare to need to use this type.
This implementation of the HAMT breaks the hash into 5-bit chunks. Thus, bitmap is a 32-bit bitmap which stores which children are non-null. The non-null children are stored in the children array.
export type MutableInternalNode<K, V> = {
bitmap: number;
readonly children: Array<MutableNode<K, V>>;
};
A mutable internal node
This should only be used during the initial building of the tree so that the tree can be built
efficiently. After the tree is built, you should convert to the immutable InternalNode
type.
A HAMT tree node
This is the main data type of the HAMT tree, and the type you should use in your own code when passing around references to the tree.
export type MutableNode<K, V> =
| MutableLeafNode<K, V>
| MutableCollisionNode<K, V>
| MutableInternalNode<K, V>;
A mutable HAMT tree node
This should only be used during the initial building of the tree so that the tree can be built
efficiently. After the tree is built, you should convert to the immutable Node
type.
Hash Utils
The configuration for a HashMap
This combines a ComparisonConfig with a hash function for the key type.
A HashConfig
is passed to most functions manipulating the HAMT data structure. You only need one
HashConfig
per key type so you can store a single HashConfig
in a global variable per key type.
The hashValues function can help implement the hash function if you do not have security
considerations.
Interface allowing custom key objects in a HashMap
If you wish to use a custom object as a key in a HashMap, you must implement the hash
function
defined in the HashableObj type and the compare
function defined in the ComparableObj type.
The hash value must be a 32-bit integer. The hashValues function can help implementing
the hash function.
Example
class SomeKey {
public readonly a: number;
public readonly b: string;
constructor(a: number, b: string) {
this.a = a;
this.b = b;
}
hash(): number {
return hashValues(this.a, this.b);
}
compare(other: SomeKey): number {
return (this.a - other.a) || this.b.localeCompare(other.b);
}
}
function hashValues(
...vals: ReadonlyArray<
string | number | boolean | Date | HashableObj | null | undefined
>
): number
Combine multiple hashable values into a single hash
Useful helper function to hash multiple values to a single hash. This uses the FNV-1 hash function, which is NOT secure. If you need a secure hash, use something like highwayhash and implement a custom HashableObj interface.
Create a HashConfig based on the key type
This function is used to create a HashConfig based on the type of key. It supports numbers, strings, booleans, dates, and objects which implement the HashableObj interface. Note that this uses hashValues and is thus NOT cryptographically secure.
Basic Operations
Lookup a key in a HAMT
function insert<K, V>(
cfg: HashConfig<K>,
k: K,
getVal: (v: V | undefined) => V,
rootNode: Node<K, V> | null
): readonly [Node<K, V>, boolean]
Insert or update a key and value in a HAMT
This function lookus up the key and if it is found, the existing value is passed to getVal
.
Otherwise, undefined is passed to getVal
. The return value from getVal
is then placed
into the tree. This function guarantees that if the return value from getVal is ===
the existing the value, the tree is returned unchanged (and the tree root will be the exact same
object). The empty tree is represented by null
.
This returns a tuple of the new tree after the operation and a boolean which is true if the size has increased and false if the value overwrote an existing value and thus the size of the tree didn't change. You can use this to externally track the size of the tree.
Initial Building
function mutateInsert<K, T, V>(
cfg: HashConfig<K>,
k: K,
t: T,
getVal: (old: V | undefined, t: T) => V,
rootNode: MutableNode<K, V> | null
): MutableNode<K, V>
Mutably insert a key and value into a HAMT tree
This function is designed to only be used during the initial construction of
a HAMT from a network request or other data structure.
from and build internally use mutateInsert
and are easier to use,
this is exported for advanced use.
An empty tree is represented as null and the tree will be mutated as values are inserted. The return value is the new root and the old root should not be referenced again. Once the tree is built, the type can be converted from MutableNode to Node. Typically this should happen in a single function whose return value is Node. See the source code of from and build for examples of size tracking.
If you wish to track the size, it must be done inside the getVal
function. If getVal
is passed undefined, then the size is increasing by one.
function from<K, V>(
cfg: HashConfig<K>,
items: Iterable<readonly [K, V]>,
merge?: (v1: V, v2: V) => V
): [Node<K, V> | null, number]
Efficiently create a HAMT from a sequence of key-value pairs
from
efficiently creates a HAMT from a sequence of key-value pairs. An optional merge
function
can be provided. When from
detects a duplicate key, the merge function is called to determine
the value associated to the key. The first parameter v1
to the merge function is the existing value
and the second parameter v2
is the new value just recieved from the sequence. The return value from the
merge function is the value associated to the key. If no merge function is provided, the second value v2
is used, overwriting the first value v1
.
The return value is a tuple of the new tree and the size of the tree.
function build<K, V>(
cfg: HashConfig<K>,
items: Iterable<V>,
key: (v: V) => K
): [Node<K, V> | null, number]
Efficently create a new HAMT
build
efficiently creates a HAMT from a sequence of values and a key extraction function. If a
duplicate key is found, the later value is used and the earlier value is overwritten. If this is
not desired, use the more generalized version of build
which also provides a value extraction function.
The return value is a tuple of the new tree and the size of the tree.
function build<T, K, V>(
cfg: HashConfig<K>,
items: Iterable<T>,
key: (v: T) => K,
val: (old: V | undefined, t: T) => V
): [Node<K, V> | null, number]
Efficently create a new HAMT
build
efficiently creates a HAMT from a sequence of items, a key extraction function, and a value extraction
function. The sequence of initial items can have any type T
, and for each item the key is extracted. If the key does not
yet exist, the val
extraction function is called with undefined
to retrieve the value associated to the key.
If the key already exists in the HAMT, the val
extraction function is called with the old
value to
merge the new item t
into the existing value old
.
The return value is a tuple of the new tree and the size of the tree.
Basic Operations
function remove<K, V>(
cfg: HashConfig<K>,
k: K,
rootNode: Node<K, V> | null
): Node<K, V> | null
Remove a key from a HAMT
If the key exists, remove
returns a new tree with the entry removed. Otherwise, remove
returns the
tree root node unchanged. This can be used to track the size if you wish, decrement the size if the new root
is not ===
to the old root.
function alter<K, V>(
cfg: HashConfig<K>,
k: K,
f: (oldV: V | undefined) => V | undefined,
rootNode: Node<K, V> | null
): [Node<K, V> | null, number]
Insert, change, or remove a key from a HAMT
alter
is a generalization of lookup, insert, and remove.
It can be used to insert a new entry, modify an existing entry, or
delete an existing entry. alter
first looks for the key in the map. The function f
is then
applied to the existing value if the key was found and undefined
if the key does not exist.
If the function f
returns undefined
, the entry is deleted and if f
returns a value, the
entry is updated to use the new value.
The return value is a tuple of the new root and the size change (either +1, 0, or -1).
If the key is not found and f
returns undefined or the key exists and the function f
returns
a value ===
to the existing value, then the root instance is returned unchanged.
Iteration
function iterate<K, V, R>(
f: (k: K, v: V) => R,
root: Node<K, V> | null
): IterableIterator<R>
Iterates the entries in the HAMT
This function produces an iterator
that applies the function f
to each key and value and yields the results. This iterator can be used only once, you must
call iterate
again if you want to iterate the tree again. The order of iteration is undefined.
function fold<K, V, T>(
f: (acc: T, key: K, val: V) => T,
zero: T,
root: Node<K, V> | null
): T
Reduce all the entries in the HAMT to a single value
function mapValues<K, V1, V2>(
f: (v: V1, k: K) => V2,
root: Node<K, V1> | null
): Node<K, V2> | null
Transform the values in a HAMT using a function
mapValues
applies the function f
to each value and key in the HAMT and returns a new HAMT
with the same keys but the values adjusted to the result of the function f
. mapValues
guarantees that if no values are changed, then the HAMT root node returned unchanged.
function collectValues<K, V1, V2>(
f: (v: V1, k: K) => V2 | undefined,
filterNull: boolean,
root: Node<K, V1> | null
): [Node<K, V2> | null, number]
Transform or delete the values in a HAMT using a function
collectValues
applies the function f
to each value and key in the HAMT, with the return value from f
the new value
associated to the key. If f
returns undefined, the key and value is removed. If filterNull
is true and f
returns null,
the key and value are also removed. If filterNull
is false, null values are kept and only undefined values are removed.
collectValues
guarantees that if no values are changed, then the root node is returned
unchanged.
The return value is a tuple of the new root node and size of the new HAMT.
Bulk Modification
function union<K, V>(
cfg: HashConfig<K>,
merge: (v1: V, v2: V, k: K) => V,
root1: Node<K, V> | null,
root2: Node<K, V> | null
): [Node<K, V> | null, number]
Returns a new HAMT which combines all entries in two HAMTs
union
produces a new HAMT which contains all the entries in both HAMT. If a
key appears in only one of the two maps, the value from the map is used. If a key appears
in both maps, the provided merge function is used to determine the value.
union
guarantees that if the resulting HAMT is equal to root1
, then the root1
object
instance is returned unchanged.
The return value is a tuple of the new root node and the size of the intersection (since
the algorithm can skip and not traverse sections of the tree that are not in both trees).
Thus, to compute the size after the union, the formula is root1size + root2size - intersectionSize
.
function intersection<K, V>(
cfg: HashConfig<K>,
merge: (v1: V, v2: V, k: K) => V,
root1: Node<K, V> | null,
root2: Node<K, V> | null
): [Node<K, V> | null, number]
Returns a new HAMT which contains only entries whose keys are in both HAMTs
intersection
produces a HAMT which contains all the entries which have keys in
both HAMTs. For each such entry, the merge function is used to determine the resulting value.
intersection
guarantees that if the resulting HAMT is equal to root1
, then root1
is returned
unchanged.
The return value is a tuple of the new HAMT and the size of the intersection, so the number of entries in the new HAMT.
function difference<K, V1, V2>(
cfg: HashConfig<K>,
root1: Node<K, V1> | null,
root2: Node<K, V2> | null
): readonly [Node<K, V1> | null, number]
Returns a new HAMT which contains only keys which appear in the first but not the second HAMT
difference
produces a HAMT which contains all the entries in root1
where the key does
not exist in root2
. Can think of this as root1 - root2
where the subtraction
is removing all the keys in root2
from root1
. The values of the root2
HashMap are ignored and
can be any value V2
.
The return value is a tuple of the HAMT root and the number of entries removed from root1
.
difference
guarantees that if no entries are removed from root1
, then root1
object
is returned unchanged.
function adjust<K, V1, V2>(
cfg: HashConfig<K>,
f: (v1: V1 | undefined, v2: V2, k: K) => V1 | undefined,
root1: Node<K, V1> | null,
root2: Node<K, V2> | null
): readonly [Node<K, V1> | null, number]
Return a HAMT which adjusts all the provided keys with a specified modification function.
adjust
is passed two HAMTs: root1
is the HAMT to modify and root2
is the keys to adjust associated to helper
values of type V2
(the type V2
can be anything and does not need to be related V1
).
For each key in root2
to modify, adjust
looks up the key in root1
and then calls the function f
with the current existing value in root1
(or undefined
if the key does not exist) and the helper value from root2
associated with the key. The return value from f
is set as the new value for the key, or removed if the return value is undefined
.
The return value is a tuple of the HAMT root and the number of keys removed. Note the number of keys removed can be negative if
nodes were added to the HAMT. adjust
guarantees that if nothing was added, removed, or changed, then root1
is returned.