Best data structure for storing a multivector

Question

I'm working on a library for doing computations with multivectors and I'm trying to figure out how to properly store the data.

Very quickly, a multivector is a sum of blades, each of which is a scaled product of zero or more basis vectors. For example, V = 1 + 2x + 3yz - 4xz is a multivector.

In Python, the structure I settled on was a mapping from tuples of basis indices to scalars; in type syntax that's `TermDict = Dict[Tuple[int, ...], float]`. V above would be stored as `{(): 1.0, (0,): 2.0, (1, 2): 3.0, (0, 2): -4.0}`.

However, I'm now attempting to port this to C++ and am getting uncomfortable. Currently, what I have is:

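```cpp
#include <cstddef>
#include <set>
#include <unordered_map>

// Adapted from https://stackoverflow.com/a/53283994/6605349
// Mixes each index into the running hash with a boost::hash_combine-style step.
struct SetHasher {
    // Hashers for std::unordered_map must return std::size_t.
    std::size_t operator()(const std::set<unsigned int>& V) const {
        std::size_t hash = V.size();
        for (unsigned int i : V) {
            // Unsigned arithmetic wraps on overflow instead of being undefined like int.
            hash ^= i + 0x9e3779b9 + (hash << 6) + (hash >> 2);
        }
        return hash;
    }
};
typedef std::unordered_map<std::set<unsigned int>, double, SetHasher> TermDict;
```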

which I believe just emulates the Python behavior. However, with apologies to Chirag, I have a lot less faith in a random SO answer than in the CPython `tuple.__hash__` implementation, so I'm wondering if there's a better way to go about this that avoids the need to hash a collection of indices at all. If so, I could possibly apply it to the Python implementation too. What options are there? Or have I already got the best idea?

Must the bases be ordered by value, with no duplicates allowed? Unless the answer is an emphatic yes, a set is the wrong container. — Sam Varshavchik, Dec 21 '22 at 15:03
There cannot be duplicate bases (because *xx* = 1); the indices will always be processed as if they are in ascending order (*zy* is normalized to -*yz* due to anticommutativity) but they can theoretically be stored in any order. — AbyxDev, Dec 21 '22 at 15:09
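The normalization described in the comment above (sorting indices with a sign flip for every swap of two distinct basis vectors, and cancelling repeated ones) can be sketched as follows. This is a minimal sketch that assumes every basis vector squares to +1, as in the *xx* = 1 remark; `Blade`, `SignedBlade`, `sort_with_sign`, `cancel_pairs` and `normalize` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical names for illustration only.
using Blade = std::vector<unsigned>;        // basis indices of a product of basis vectors
using SignedBlade = std::pair<Blade, int>;  // canonical indices plus a sign of +1 or -1

// Insertion sort that counts adjacent swaps. Each swap exchanges two
// distinct basis vectors, which anticommute, so it flips the sign.
// Equal indices are never swapped (strict comparison).
SignedBlade sort_with_sign(Blade b) {
    int sign = 1;
    for (std::size_t i = 1; i < b.size(); ++i) {
        for (std::size_t j = i; j > 0 && b[j - 1] > b[j]; --j) {
            std::swap(b[j - 1], b[j]);
            sign = -sign;
        }
    }
    return {b, sign};
}

// Removes adjacent equal pairs from a sorted blade, assuming each
// basis vector squares to +1 (so xx = 1 and xxx = x).
Blade cancel_pairs(const Blade& sorted) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    Blade out;
    for (unsigned idx : sorted) {
        if (!out.empty() && out.back() == idx) {
            out.pop_back();  // idx * idx = 1
        } else {
            out.push_back(idx);
        }
    }
    return out;
}

// Brings an arbitrary product of basis vectors into ascending, duplicate-free form.
SignedBlade normalize(const Blade& b) {
    SignedBlade s = sort_with_sign(b);
    return {cancel_pairs(s.first), s.second};
}
```

For example, `normalize({2, 1})` yields `{1, 2}` with sign -1, matching *zy* = -*yz*, and `normalize({0, 1, 0})` yields `{1}` with sign -1, since *xyx* = -*xxy* = -*y*.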
Then `std::set` is wrong, and a vector looks appropriate. Whether an unordered map or an ordered map is more appropriate depends on the complexity requirements for accessing the container. That's something only you can figure out, based on your specific application's requirements. If you conclude that an unordered map's complexity is better suited, then you have no alternative to implementing a hash function. — Sam Varshavchik, Dec 21 '22 at 15:12
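If the ordered-map route fits the access pattern, `std::map` with a `std::vector` key needs no hash function at all, because `std::vector` already provides a lexicographic `operator<`. A minimal sketch, assuming keys are always stored in canonical ascending order; `Blade`, `Multivector`, `add` and `coefficient` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical names for illustration only.
using Blade = std::vector<unsigned>;          // ascending basis indices
using Multivector = std::map<Blade, double>;  // ordered by std::vector's operator<

// Sum of two multivectors: coefficients of matching blades are added,
// and blades whose coefficients cancel exactly to zero are dropped.
Multivector add(const Multivector& a, const Multivector& b) {
    Multivector result = a;
    for (const auto& term : b) {
        double& c = result[term.first];
        c += term.second;
        if (c == 0.0) {
            result.erase(term.first);
        }
    }
    return result;
}

// Coefficient of a blade, or 0 if absent. Keys must be canonical
// (ascending) so that {1, 2} and {2, 1} are not treated as different blades.
double coefficient(const Multivector& m, const Blade& blade) {
    assert(std::is_sorted(blade.begin(), blade.end()));
    auto it = m.find(blade);
    return it == m.end() ? 0.0 : it->second;
}
```

V from the question would then be `Multivector V = {{{}, 1.0}, {{0}, 2.0}, {{1, 2}, 3.0}, {{0, 2}, -4.0}};`. Because `std::map` keeps its keys sorted, iteration visits blades in a deterministic order; the trade-off is O(log n) lookups instead of the average O(1) of a hash map.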
@Ranoiaetep I considered using a bitfield; however, the number of bases is unbounded (there could be a basis vector for the 193rd dimension involved). A fixed-width bitfield would limit me to the width of the field, and a variable-width bitfield would waste space for large dimensions (the 193rd dimension would require 192 preceding 0 bits). — AbyxDev, Dec 21 '22 at 15:20
Looks like a simple array to me (can be realised as std::vector, std::array or a plain C-style array). But if you have a vector space of 193 dimensions, the corresponding geometric algebra has 2^193 dimensions. Seems a bit impractical. — n. m. could be an AI, Dec 21 '22 at 15:23
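The 2^193 figure follows from counting blades: each blade corresponds to a subset of the n basis vectors, so there are C(n, k) blades of grade k and 2^n blades in total.

```latex
\dim \mathcal{G}(\mathbb{R}^n) = \sum_{k=0}^{n} \binom{n}{k} = 2^n,
\qquad n = 3:\; 1 + 3 + 3 + 1 = 8,
\qquad n = 193:\; 2^{193} \approx 1.26 \times 10^{58}.
```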
@SamVarshavchik Would an `unordered_set` still be worse than a plain `vector` as the key for the `unordered_map`? — AbyxDev, Dec 21 '22 at 15:26
@n.m. How so? How would the example vector from the question be stored in simple array form? — AbyxDev, Dec 21 '22 at 15:27
It depends on the hash function. It could be better, or worse. — Sam Varshavchik, Dec 21 '22 at 15:28
If you have a 3D space, the geometric algebra has 8 dimensions: `{}, {x}, {y}, {z}, {xy}, {yz}, {xz}, {xyz}`. If you look closely, these are just different spellings for the first 8 non-negative integers in base 2: `000, 001, 010, 100, 011, 110, 101, 111`. — n. m. could be an AI, Dec 21 '22 at 15:30
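For a small, fixed dimension, that encoding can be used directly: bit i of an index stands for the i-th basis vector (x = bit 0, y = bit 1, z = bit 2), so a 3D multivector is a dense array of 8 coefficients. A side benefit is that the product of two basis blades becomes the XOR of their masks (shared factors square to 1) together with a sign from counting the anticommuting swaps. This is a sketch assuming a Euclidean metric; `Multivector3`, `reorder_sign`, `blade_product` and `coefficient` are hypothetical names.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical names for illustration only.
// Bit i of an index marks basis vector i: x = 1, y = 2, z = 4.
using Multivector3 = std::array<double, 8>;

// Sign picked up when the product of basis blades a and b is reordered
// into ascending order: count the pairs (i in a, j in b) with i > j,
// each of which needs one anticommuting swap.
int reorder_sign(unsigned a, unsigned b) {
    a >>= 1;
    std::size_t swaps = 0;
    while (a != 0) {
        swaps += std::bitset<32>(a & b).count();
        a >>= 1;
    }
    return (swaps % 2 == 0) ? 1 : -1;
}

// Product of basis blades a and b, assuming every basis vector squares
// to +1: shared factors cancel (XOR), the rest are reordered.
std::pair<unsigned, int> blade_product(unsigned a, unsigned b) {
    return {a ^ b, reorder_sign(a, b)};
}

// Coefficient of the basis blade with the given bitmask.
double coefficient(const Multivector3& v, unsigned blade) {
    assert(blade < v.size());
    return v[blade];
}
```

With this layout, V from the question is the array `{1, 2, 0, 0, 0, -4, 3, 0}` (the nonzero entries sit at masks 0, 1, 5 and 6), and `blade_product(5, 6)` gives mask 3 with sign -1, i.e. (*xz*)(*yz*) = -*xy*. The catch, as the thread notes, is that the array length is 2^n.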
@n.m. That sounds like a rehash (heh) of the bitfield solution proposed by a now-deleted comment; it is precisely the fact that 193 dimensions would require a 193-bit number to index that led me to the mapping-from-collection-to-float route instead. — AbyxDev, Dec 21 '22 at 15:33
Abstractly it is still a 2^193-dimensional vector, indexed by 193-bit numbers. Of course it is impractical to represent it as a straight `std::vector`. Presumably it is extremely sparse, i.e. most of the entries are zeros. `std::unordered_map` could be a good solution. How to represent the keys is a separate question. For a few hundred bits, straight bitsets are not out of the question, but if you have thousands of bits, they become less practical. There are implementations of sparse bitsets; check them out. — n. m. could be an AI, Dec 21 '22 at 15:44
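Following the bitset suggestion: if an upper bound on the dimension can be fixed in advance, `std::bitset` keys work in a `std::unordered_map` without writing any hash function, since the standard library already specializes `std::hash` for `std::bitset<N>`. A sketch assuming a hypothetical bound `kMaxDims` of 256 (enough for the 193rd dimension, index 192); `Key`, `SparseMultivector` and `make_key` are also hypothetical. Every key costs 256 bits no matter how few basis vectors the blade uses, which is the space trade-off raised earlier in the thread.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <unordered_map>

// Hypothetical bound on the number of basis vectors.
constexpr std::size_t kMaxDims = 256;

using Key = std::bitset<kMaxDims>;                          // bit i set <=> basis vector i present
using SparseMultivector = std::unordered_map<Key, double>;  // uses std::hash<std::bitset<N>>

// Builds a key from distinct basis indices. Order is not recorded, so any
// sign from reordering (zy = -yz) must be applied to the coefficient separately.
Key make_key(std::initializer_list<std::size_t> indices) {
    Key key;
    for (std::size_t i : indices) {
        assert(i < kMaxDims);
        key[i] = true;
    }
    return key;
}
```

Only the nonzero blades are stored, so V from the question occupies four entries even though the key type could address 2^256 blades.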
*'I have a lot less faith in a random SO answer'* -- How about ***[Boost.ContainerHash](https://www.boost.org/doc/libs/1_81_0/libs/container_hash/)*** and *[Boost.Unordered](https://www.boost.org/doc/libs/1_81_0/libs/unordered)* — Ranoiaetep, Dec 21 '22 at 15:48
