
I need to use QList<QVariant> as a key in std::unordered_map. The purpose is to speed up searching a table of data by building an index over the unique key columns.

So I wrote this code. It's not complete, but it covers some basic data types that occur in the table key columns:

#include <QList>
#include <QString>
#include <QVariant>
#include <unordered_map>
#include <string>
//std::hash
#include <functional>
//std::size_t
#include <cstddef>
// Hashing method for QVariantList
namespace std {
    template <>
    struct hash<QList<QVariant>>
    {
        std::size_t operator()(const QList<QVariant>& k) const
        {
            using std::size_t;
            using std::hash;
            using std::string;
            size_t hash_num = 0;
            Q_FOREACH(const QVariant& var, k) {
                // Make hash of the primitive value of the QVariant
                switch(var.type()) {
                    case QVariant::String : {
                        // std::hash is a functor: construct it, then call it
                        hash_num = hash_num^hash<string>()(var.toString().toStdString());
                        break;
                    }
                    case QVariant::Char :
                    case QVariant::ULongLong :
                    case QVariant::UInt :
                    case QVariant::LongLong :
                    case QVariant::Int : {
                        hash_num = hash_num^hash<long long>()(var.toLongLong());
                        break;
                    }
                    case QVariant::Double : {
                        hash_num = hash_num^hash<double>()(var.toDouble());
                        break;
                    }
                    default :
                        // Other types are not handled yet and do not affect the hash
                        break;
                }
            }
            }
            return hash_num;
        }
    };
}

Obviously, I don't like the whole switch thing. The code is long and ugly, and it only accounts for the basic types. I'd rather hash the memory that holds the QVariant's internal data. Or, even better, use one of Qt's own hashing methods.

Is there a semi-reliable* way to hash any QVariant without converting it to a primitive type?

*I understand that complex objects might be hiding behind QVariant, but cases where this would lead to collision are rare enough so I don't have to care.

dtech
Tomáš Zato
  • Have you already tried Qt's methods like `QVariant::toHash()` ? – Jean-Emmanuel Dec 14 '16 at 14:41
  • @Jean-Emmanuel It [looks like](http://doc.qt.io/qt-5/qvariant.html#toHash) that converts the variant to a `QHash` and does not hash the underlying value. – NathanOliver Dec 14 '16 at 14:45
  • @Jean-Emmanuel `toHash` tries to convert `QVariant` to `QHash` which is a hash map, not a hash. I mean yes, I did do Ctrl+F over the docs and additionally I also read what it says. – Tomáš Zato Dec 14 '16 at 14:45
  • If you don't worry about performance: write `QVariant` to `QByteArray` using `QDataStream` and use `qHash(const QByteArray &)` function. I saw no one easy way to get a "raw" representation of `QVariant` yet. – ilotXXI Dec 14 '16 at 14:52
  • Well you could use a visitor like [this](http://stackoverflow.com/questions/38071414/qvariants-visitor-pattern-without-manual-type-testing-and-casting) and then pass the returned value to `qHash()`. – NathanOliver Dec 14 '16 at 14:53

1 Answer


Get yourself a QByteArray + QBuffer + QDataStream to basically serialize QVariants to the QByteArray.

Then simply hash the raw bytes in the byte array. Qt already implements a qHash function for QByteArray so you are all set.

You can maximize efficiency by reusing the same QByteArray with enough preallocated bytes to avoid reallocations. You can wrap the whole thing in a QVariantHasher class, simply seek(0) on the buffer before each new hashing, and hash only the first pos() bytes instead of the whole array.

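#include <QBuffer>
#include <QByteArray>
#include <QDataStream>
#include <QHash>
#include <QVariant>

class QVariantHasher {
  public:
    QVariantHasher() : buff(&bb), ds(&buff) {
      bb.reserve(1000); // preallocate so typical values cause no reallocation
      buff.open(QIODevice::WriteOnly);
    }
    uint hash(const QVariant & v) {
      buff.seek(0); // rewind so this value overwrites the previous one
      ds << v;
      // hash only the bytes written for this value, not the whole array
      return qHashBits(bb.constData(), buff.pos());
    }
  private:
    QByteArray bb; // declared first because buff and ds are built on top of it
    QBuffer buff;
    QDataStream ds;
};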

It is pretty fast, as mentioned in the comments, and it has the advantage of working with every type that supports QDataStream serialization. For custom types you only have to implement the serialization; there is no need to write and maintain a giant switch. If you already have the switch version implemented, a side-by-side comparison would be interesting. The switch itself is a lot of branching, while reusing the same byte array is very cache friendly, especially if you don't use too many bytes, that is, if you are not hashing variants that contain very long strings or arrays.

Also, it is better than semi-reliable, because the hash includes the variant type as well. Even when the actual data is binary identical, for example two bytes with value 255 versus a short with value 65535, the type is part of the hashed bytes, so the two hash inputs differ and a collision is unlikely.
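The byte-versus-short case can be illustrated without Qt. The sketch below uses a hypothetical `Value` struct as a stand-in for QVariant and a hand-written 64-bit FNV-1a as a stand-in for qHash. The tag values and the byte layout are made up for illustration and are not what QDataStream actually writes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical stand-in for QVariant: a type tag plus a small payload.
enum class Type : unsigned char { Byte = 1, Short = 2 };
struct Value {
    Type type;
    std::uint16_t data;
};

// Append only the payload bytes (what hashing raw memory would see).
void appendPayload(const Value& v, Bytes& out) {
    const std::size_t size = (v.type == Type::Byte) ? 1 : 2;
    for (std::size_t i = 0; i < size; ++i)
        out.push_back(static_cast<unsigned char>((v.data >> (8 * i)) & 0xFF));
}

// Append the type tag first, then the payload.
void appendTagged(const Value& v, Bytes& out) {
    out.push_back(static_cast<unsigned char>(v.type));
    appendPayload(v, out);
}

// 64-bit FNV-1a over a byte sequence (assumed stand-in for qHash).
std::uint64_t fnv1a(const Bytes& bytes) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char b : bytes) {
        h ^= b;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}
```

Two `Byte` values of 255 and one `Short` value of 65535 produce the same payload bytes, so any hash of the payload alone must collide. With the tags included, the byte sequences differ, and the hash function at least has a chance to tell them apart.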

dtech
  • In this particular case I do care about performance. Do you think what you propose would be slower than `switch` that I made? – Tomáš Zato Dec 14 '16 at 14:59
  • Well, there is no way to tell until you compare them side by side. How fast do you need it to be? On my system it hashes 4 ints 4 reals 4 strings and 4 QRects in 15213 nanoseconds. Also there is the convenience factor, you don't have to implement a gigantic switch, it works for all types which support serialization. – dtech Dec 14 '16 at 15:10
  • @TomášZato First, you can make your `switch` more Qt-friendly by using the `qHash` function. E.g. `var.toString().toStdString()` may be slower than `qHash(var.toString())` because you create a temporary object with heap allocations. Second: just try and compare. Nobody knows how `QDataStream` serializes `QVariant`. At the least, the additional `QByteArray` can have a significant allocation/deallocation time. – ilotXXI Dec 14 '16 at 15:12
  • @TomášZato You could handle the most common types specially using code like in your question, and then use this answer's method as a fall-back and as future-proofing for types you don't want to bother handling specially. – hyde Dec 14 '16 at 18:42