Sbitsets
Sparse bitsets library: an alternative to bitsets for
representing finite sets of natural numbers.
Introduction
In the ordinary bitsets library, sets of natural numbers are encoded
as bignums. This is perfectly fine when the set elements are relatively small,
but it is not an efficient way to deal with sets of large numbers. For
instance, trying to insert a number like 2^30 into an ordinary bitset will
probably kill your Lisp.
Sparse bitsets are a more forgiving alternative. They can handle very large
set elements, but still achieve bitset-like efficiencies in many cases.
Sparse bitsets are represented as ordered lists of (offset . block)
pairs. Loosely speaking, each such pair in the set X means that:
offset*blocksize + 0 \in X iff (logbitp 0 block)
offset*blocksize + 1 \in X iff (logbitp 1 block)
...
offset*blocksize + (blocksize-1) \in X iff (logbitp (blocksize-1) block)
The blocksize is a some constant determined by *sbitset-block-size*. We'll assume it is 60 (which makes each block a
fixnum on 64-bit CCL).
Efficiency Considerations
Sparse bitsets are essentially a hybrid of ordered sets and bitsets, and
their efficiency characteristics are very data dependent.
Sparse bitsets perform best on sets that have "clusters" of nearby
members, perhaps separated by wide gaps. Here, sparse bitsets give you some of
the same benefits of bitsets, viz. word-at-a-time operations like union and the
space efficiency of using bit masks to represent the set members.
For completely dense sets, e.g., all integers in [0, 1000], sparse
bitsets are:
- Better than ordered sets, because they get to compress nearby elements into
bitmasks and use word-at-a-time operations.
- Somewhat worse than ordinary bitsets. They are less space efficient due to
the conses and offset numbers. They are also more expensive for membership
testing: regular bitsets can basically use O(1) array indexing via logbitp, while sparse bitsets must use an O(n) search for the right
pair.
For "cluster-free" sets where the elements are so far apart that they
never fall into the same block, sparse bitsets are:
- Better than bitsets, which perform badly here since they can have large
segments of 0 bits that waste space and take time to process during set
operations.
- Somewhat worse than ordered sets. For instance, a set like {0, 60, 120}
would be encoded as ((0 . 1) (1 . 1) (2 . 1)), which is similar to its
ordered set representation except for the additional overhead of using (1
. 1) to represent 60, etc.
Sparse Bitset Operations
Convention:
- a, b, ... denote set elements
- X, Y, ... denote sets.
Valid sparse bitsets are recognized by sbitsetp, and there is a
standard fixing function, sbitset-fix.
We intend to eventually implement sparse analogues for all of the functions
in the bitsets library, and analogues for the bitset reasoning
techniques. For now we have only finished a few of these operations:
Sparse Bitset Construtors
- (sbitset-singleton a)
- Constructs the set {a}
- (sbitset-union X Y ...)
- Constructs the set X U Y U ...
- (sbitset-intersect X Y ...)
- Constructs the set X \intersect Y \intersect ...
- (sbitset-difference X Y)
- Constructs the set X - Y
Inspecting Sparse Bitsets
- Todo
Enumerating Sparse Bitset Members
- (sbitset-members X)
- Constructs an ordinary ordered set with the elements of X.
- Important in reasoning about sparse bitsets.
Subtopics
- Sbitset-members
- (sbitset-members x) converts a bitset into an ordinary, ordered
set.
- Sbitset-pairp
- (sbitset-pairp x) recognizes a valid (offset . block) pair
for sparse bitsets.
- Sbitsetp
- (sbitsetp x) determines whether X is a well-formed sparse
bitset.
- Sbitset-intersect
- (sbitset-intersect X Y ...) constructs the set X \intersect Y
\intersect ....
- Sbitset-difference
- (sbitset-difference X Y) constructs the set X - Y.
- Sbitset-union
- (sbitset-union X Y ...) constructs the set X U Y U ....
- Sbitset-singleton
- (sbitset-singleton a) constructs the singleton set {a}.
- Sbitset-fix
- (sbitset-fix x) is a basic fixing function for sparse bitsets.