
I am trying to use std::string as a key in an stxxl::map. Insertion works for a small number of strings (about 10-100), but when I try to insert a large number of strings (about 100000), I get a segmentation fault.

The code is as follows:

struct CompareGreaterString {
    bool operator () (const std::string& a, const std::string& b) const {
       return a > b;
    }
    static std::string max_value() {
       return "";
    } 
};

// template parameter <KeyType, DataType, CompareType, RawNodeSize, RawLeafSize, PDAllocStrategy (optional)>
typedef stxxl::map<std::string, unsigned int, CompareGreaterString, DATA_NODE_BLOCK_SIZE, DATA_LEAF_BLOCK_SIZE> name_map;
name_map strMap((name_map::node_block_type::raw_size)*3, (name_map::leaf_block_type::raw_size)*3);
for (unsigned int i = 0; i < 1000000; i++) { /// Inserting 1 million strings
    std::stringstream strStream;
    strStream << (i);
    Console::println("Inserting: " + strStream.str());
    strMap[strStream.str()]=i;
}

I cannot work out why I am unable to insert more strings. The segmentation fault occurs exactly while inserting "1377". I can, however, insert any number of integers as keys, so I suspect that the variable size of the strings is causing the trouble.

I also do not understand what max_value should return for a string key. I simply returned an empty string.

sooper
Sriram Mahavadi
  • Hard to tell. You might need to provide the exact line where it segfaults and a bit of surrounding code, I presume from the stxxl library. – Yirkha Apr 17 '14 at 13:40

3 Answers


According to the documentation:

CompareType must also provide a static max_value method, that returns a value of type KeyType that is larger than any key stored in map

Because the empty string compares as smaller than any other string, it breaks this precondition and may thus cause unspecified behaviour.

Here's a max_value that should work. MAX_KEY_LEN is just an integer that is greater than or equal to the length of the longest possible string key that the map can have.

struct CompareGreaterString {
    // ...
    static std::string max_value() {
        return std::string(MAX_KEY_LEN, std::numeric_limits<unsigned char>::max());
    }
};
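
To make the ordering argument concrete, here is a minimal sketch that checks how the two candidate sentinels compare under the plain ascending operator< of std::string. The helper names sentinel_max and check_sentinel_ordering and the value 16 for MAX_KEY_LEN are assumptions made for illustration only; nothing here calls into stxxl.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical bound on the key length, chosen only for this sketch.
const std::size_t MAX_KEY_LEN = 16;

// Candidate sentinel: MAX_KEY_LEN copies of the largest byte value (0xFF).
inline std::string sentinel_max() {
    return std::string(MAX_KEY_LEN,
        static_cast<char>(std::numeric_limits<unsigned char>::max()));
}

// Checks the orderings discussed above using std::string's operator<,
// which compares characters as unsigned char via std::char_traits<char>.
inline void check_sentinel_ordering() {
    const std::string empty;
    const std::string top = sentinel_max();
    assert(empty < std::string("0"));    // "" sorts before every non-empty string
    assert(std::string("1377") < top);   // ordinary keys sort before the sentinel
    assert(std::string("zzzz") < top);
    assert(!(top < top));                // the sentinel is not less than itself
}
```

Note that these checks use ascending order. With a descending comparator such as CompareGreaterString, "larger" presumably has to be read in the comparator's own order, so the roles of the two sentinels would swap; treat that reading as an assumption to verify against the stxxl documentation.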
eerorika
  • I am getting an error while adding the first string "0" itself, as "Assertion `it != root_node_.end()' failed" for any MAX_KEY_LEN other than 0. I am getting the error while trying to insert at line "strMap[strStream.str()]=i;" – Sriram Mahavadi Apr 17 '14 at 14:00
  • As for comparison, [`char_traits::lt`](http://en.cppreference.com/w/cpp/string/char_traits/cmp) is very interesting - I think one should use `numeric_limits::max()`. – Martin Ba Apr 17 '14 at 14:06
  • Still no luck. I am still getting a segmentation fault while inserting the first string "0" itself. Do we need to make sure that the key is of fixed length? – Sriram Mahavadi Apr 17 '14 at 14:12
  • @MartinBa, ah, that is quite interesting. Perhaps max `unsigned char` (converted to `char`) is correct. – eerorika Apr 17 '14 at 14:18
  • @SriramMahavadi. It appears that stxxl does not support non-POD value types. I don't know much about stxxl, but I'm guessing that this limitation also extends to key types. You may have to use some workaround, such as using hashes of the strings as keys or some fixed-length string. – eerorika Apr 17 '14 at 20:33
  • Tried with a fixed-length string as key. A structure with char[] is POD, hence it should not be a problem. But I am getting errors while doing **map.find()**: no match for operator==. And then I am getting an **Assertion `it != root_node_.end()'** error when returning numeric_limits<>::max. The program runs, however, when I return numeric_limits::min instead. Don't know why... – Sriram Mahavadi Apr 18 '14 at 13:00

I have finally found the solution to my problem, with great help from Timo Bingmann, user2079303 and Martin Ba. Thank you.

I would like to share it with you.

Firstly, stxxl supports POD types only, which means it can store only fixed-size structures. Hence std::string cannot be a key. stxxl::map worked for about 100-1000 strings because they were all held in physical memory. When more strings are inserted, it has to write to disk, and that is where things break internally.

Hence we need to use a fixed-length string based on char[], as follows:
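
As a quick illustration of the fixed-size requirement, the following sketch checks at compile time that a struct wrapping a char array is trivially copyable while std::string is not. The struct name FixedKey and its size are hypothetical, and trivial copyability is used here as a stand-in for the property stxxl needs; the exact requirement should be checked in the stxxl documentation.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical fixed-size key: all bytes live inside the object itself.
struct FixedKey {
    char bytes[16];
};

// A plain char array wrapper can be copied byte by byte to and from disk.
static_assert(std::is_trivially_copyable<FixedKey>::value,
              "FixedKey can be copied as raw bytes");

// std::string manages its own storage through non-trivial copy operations,
// so a raw byte copy of the object does not preserve the text reliably.
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string cannot be copied as raw bytes");

// Returns the number of bytes a FixedKey occupies, independent of content.
inline unsigned long fixed_key_size() {
    return static_cast<unsigned long>(sizeof(FixedKey));
}
```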

#include <algorithm>   // std::lexicographical_compare, std::equal, std::fill
#include <functional>  // std::less

static const int MAX_KEY_LEN = 16;

// Fixed-size (POD) key: the characters are stored inside the object itself.
class FixedString {
public:
    char charStr[MAX_KEY_LEN];

    // Lexicographic comparison over all MAX_KEY_LEN bytes.
    bool operator<(const FixedString& fixedString) const {
        return std::lexicographical_compare(charStr, charStr+MAX_KEY_LEN,
            fixedString.charStr, fixedString.charStr+MAX_KEY_LEN);
    }

    bool operator==(const FixedString& fixedString) const {
        return std::equal(charStr, charStr+MAX_KEY_LEN, fixedString.charStr);
    }

    bool operator!=(const FixedString& fixedString) const {
        return !(*this == fixedString);
    }
};

// Comparator for stxxl::map: std::less supplies operator() via operator<.
struct comp_type : public std::less<FixedString> {
    // Sentinel key filled with 0x7f in every position.
    static FixedString max_value()
    {
        FixedString s;
        std::fill(s.charStr, s.charStr+MAX_KEY_LEN, 0x7f);
        return s;
    }
};

Please note that all of these operators (the comparator's operator(), plus == and !=) need to be defined for all the stxxl::map functions to work. Now we can define fixed_name_map as follows:

typedef stxxl::map<FixedString, unsigned int, comp_type, DATA_NODE_BLOCK_SIZE, DATA_LEAF_BLOCK_SIZE> fixed_name_map;
fixed_name_map myFixedMap((fixed_name_map::node_block_type::raw_size)*5, (fixed_name_map::leaf_block_type::raw_size)*5);

Now the program compiles fine and accepts about 10^8 strings without any problem. We can also use myFixedMap just like a std::map (for example: myFixedMap[fixedString] = 10).
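
One step the snippets above leave open is how to turn a std::string into a FixedString. The sketch below shows one possible way, assuming zero padding for short strings and silent truncation for long ones. The helper make_fixed_string and the function lookup_demo are hypothetical, the FixedString definition is repeated in minimal form so the sketch compiles on its own, and std::map stands in for stxxl::map so that no stxxl installation is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

static const int MAX_KEY_LEN = 16;

// Minimal copy of the FixedString key, repeated so this sketch is standalone.
struct FixedString {
    char charStr[MAX_KEY_LEN];

    bool operator<(const FixedString& other) const {
        return std::lexicographical_compare(charStr, charStr + MAX_KEY_LEN,
            other.charStr, other.charStr + MAX_KEY_LEN);
    }
};

// Hypothetical helper: copies at most MAX_KEY_LEN characters and pads the
// remainder with '\0'. Longer strings are truncated (an assumption of this sketch).
inline FixedString make_fixed_string(const std::string& s) {
    FixedString key;
    std::fill(key.charStr, key.charStr + MAX_KEY_LEN, '\0');
    const std::size_t n = std::min<std::size_t>(s.size(), MAX_KEY_LEN);
    std::copy(s.data(), s.data() + n, key.charStr);
    return key;
}

// Inserts the decimal strings "0".."1999" and looks one of them up again.
// std::map is used here only as a stand-in for stxxl::map.
inline unsigned int lookup_demo() {
    std::map<FixedString, unsigned int> m;
    for (unsigned int i = 0; i < 2000; ++i) {
        m[make_fixed_string(std::to_string(i))] = i;
    }
    return m[make_fixed_string("1377")];
}
```

Because of the truncation, two strings that share their first MAX_KEY_LEN characters map to the same key, so MAX_KEY_LEN has to be at least as large as the longest key that must stay distinct.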

Sriram Mahavadi

If you are using C++11, then as an alternative to the FixedString class you could use std::array<char, MAX_KEY_LEN>. It is an STL layer on top of an ordinary fixed-size C array, implementing comparisons and iterators as you are used to from std::string, but it's a POD type, so STXXL should support it.
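
A minimal sketch of the std::array variant follows, assuming C++11. The alias ArrayKey, the helper make_array_key and the value 16 for MAX_KEY_LEN are hypothetical names chosen for illustration; whether stxxl accepts the type as a key still has to be verified against stxxl itself.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical bound on the key length, chosen only for this sketch.
const std::size_t MAX_KEY_LEN = 16;

typedef std::array<char, MAX_KEY_LEN> ArrayKey;

// The array stores its characters inline, so it can be copied as raw bytes.
static_assert(std::is_trivially_copyable<ArrayKey>::value,
              "ArrayKey can be copied as raw bytes");

// Hypothetical helper: zero-pads short strings and truncates long ones.
inline ArrayKey make_array_key(const std::string& s) {
    ArrayKey key;
    key.fill('\0');
    const std::size_t n = std::min(s.size(), MAX_KEY_LEN);
    std::copy(s.data(), s.data() + n, key.begin());
    return key;
}
```

std::array already provides operator<, operator== and operator!=, which compare the elements lexicographically, so no hand-written comparison operators are needed for the key type.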

Alternatively, you can use serialization_sort in TPIE. It can sort elements of type std::pair<std::string, unsigned int> just fine, so if all you need is to insert everything in bulk and then access it in bulk, this will be sufficient for your case (and probably faster depending on the exact case).

Mathias Rav