5

Given the following code:

struct Item
{
    std::string name;
    int someInt;
    std::string someString;
    Item(const std::string& aName):name(aName){}
};
std::unordered_map<std::string, Item*> items;
Item* item = new Item("testitem");
items.insert(std::make_pair(item->name, item));

The item name will be stored in memory two times - once as part of the Item struct and once as the key of the map entry. Is it possible to avoid the duplication? With some 100M records this overhead becomes huge.

Note: I need to have the name inside the Item structure because I use the hashmap as index to another container of Item-s, and there I don't have access to the map's key values.

Alexander Vassilev
  • I think [Boost.MultiIndex](http://www.boost.org/doc/libs/1_52_0/libs/multi_index/doc/index.html) provides this kind of feature. – Luc Touraille Nov 29 '12 at 10:02
  • Where are your items stored in the first place? – Luc Touraille Nov 29 '12 at 10:06
  • @Luc Touraille: Yes, I thought about multiindex, but I don't use Boost. Is there a way I can use multiindex standalone, without having to build/link the whole boost lib? – Alexander Vassilev Nov 29 '12 at 10:39
  • Boost.MultiIndex is a header-only library. You only link instances of the containers that you actually use and a bunch of helper templates (like the hash functions). The Boost distribution contains a tool called `bcp` that can extract the dependencies of a given module if you want to include only the relevant part in your version control. – Jan Hudec Nov 29 '12 at 10:53

5 Answers

4

OK, since you say you are using pointers as values, I hereby bring my answer back to life.

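A bit hacky, but it should work. Basically, you use a pointer to the name as the key, together with a custom hash function and a custom equality predicate (otherwise keys would be compared by address, not by content):

struct Item
{
    std::string name;
    int someInt;
    std::string someString;
    Item(const std::string& aName):name(aName){}

    // Hash the string the key points to, not the pointer value.
    struct name_hash
    {
       size_t operator() (const std::string* name) const
       {
           std::hash<std::string> h;
           return h(*name);
       }
    };

    // Compare the pointed-to strings; the default std::equal_to would compare addresses.
    struct name_equal
    {
       bool operator() (const std::string* a, const std::string* b) const
       {
           return *a == *b;
       }
    };
};
std::unordered_map<const std::string*, Item*, Item::name_hash, Item::name_equal> items;
Item* item = new Item("testitem");
items.insert(std::make_pair(&(item->name), item));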
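
To illustrate how lookups work with such a pointer-keyed index, here is a minimal, self-contained sketch. The names `NamePtrHash`, `NamePtrEqual`, `ItemIndex`, `addItem` and `findByName` are made up for this example and are not part of the original answer; it assumes the map never outlives the items it points to.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

struct Item {
    std::string name;
    int someInt;
    explicit Item(const std::string& aName, int value = 0)
        : name(aName), someInt(value) {}
};

// Hash the pointed-to string, not the pointer value.
struct NamePtrHash {
    std::size_t operator()(const std::string* s) const {
        return std::hash<std::string>()(*s);
    }
};

// Compare the pointed-to strings, so a lookup key may live anywhere.
struct NamePtrEqual {
    bool operator()(const std::string* a, const std::string* b) const {
        return *a == *b;
    }
};

typedef std::unordered_map<const std::string*, Item*,
                           NamePtrHash, NamePtrEqual> ItemIndex;

// The key points into the item itself, so the name is stored only once.
void addItem(ItemIndex& index, Item* item) {
    assert(item != nullptr);
    const std::string* key = &item->name;
    index.insert(std::make_pair(key, item));
}

// Returns the indexed item with the given name, or a null pointer.
Item* findByName(const ItemIndex& index, const std::string& name) {
    ItemIndex::const_iterator it = index.find(&name);
    return it == index.end() ? nullptr : it->second;
}
```

Because `findByName` takes the name by const reference, callers can pass a literal or a temporary string. Entries must be erased from the index before the corresponding `Item` is destroyed, since the key dangles otherwise.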
2

Assuming the structure you use to store your items in the first place is a simple list, you could replace it with a multi-indexed container.

Something along those lines (untested) should fulfill your requirements:

typedef multi_index_container<
    Item,
    indexed_by<
        sequenced<>,
        hashed_unique<member<Item, std::string, &Item::name> >
    >
> itemContainer;

itemContainer items;

Now you can access items either in their order of insertion, or look them up by name:

itemContainer::nth_index<0>::type & sequentialItems = items.get<0>();
// use sequentialItems as a regular std::list

itemContainer::nth_index<1>::type & associativeItems = items.get<1>();
// use associativeItems as a regular std::unordered_set

Depending on your needs, you can use other indexings as well.

Luc Touraille
1

Don't store the std::string name field in your struct. When you perform a lookup, you already know the name anyway.

Denis Ermolin
  • I use the hashmap as an index. In fact it stores pointers to items, not the actual items, but I wanted to have a simple example. The items are also referenced from another structure, where I don't have access to the map's keys. – Alexander Vassilev Nov 29 '12 at 09:42
  • @Steve Jessop: Yes, I'm sorry for the confusion, I have updated the question. The string* will solve it. Thanks a lot – Alexander Vassilev Nov 29 '12 at 10:18
1

No, there isn't. You can:

  • Not store the name in Item and pass it around separately.
  • Create a type ItemData that has the same fields as Item except the name, and either
    • derive Item from std::pair<std::string, ItemData> (= the value_type of the map) or
    • make it convertible to and from that type.
  • Use a reference to the string as the key. You should be able to use std::reference_wrapper<const std::string> as the key type, passing std::cref(value.name) when inserting and std::cref(key), where key is a std::string holding the name you are looking for, when searching. You may have to specialize std::hash<std::reference_wrapper<const std::string>> (and possibly supply an equality predicate), but it should be easy.
  • Use std::unordered_set, but it has the disadvantage that each lookup has to create a dummy Item.
    • When you actually have Item * as the value type, you can move the name to a base class and use polymorphism to avoid that disadvantage.
  • Create a custom hash map, e.g. with Boost.Intrusive.
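
The reference_wrapper option from the list above could be sketched roughly as follows. Instead of specializing std::hash, this version passes explicit hash and equality functors; `NameRef`, `NameRefHash`, `NameRefEqual`, `ItemIndex`, `addItem` and `findByName` are names invented for the example. It assumes every indexed Item outlives its map entry.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct Item {
    std::string name;
    int someInt;
    explicit Item(const std::string& aName, int value = 0)
        : name(aName), someInt(value) {}
};

// The key is a reference to a name stored elsewhere (inside an Item).
typedef std::reference_wrapper<const std::string> NameRef;

struct NameRefHash {
    std::size_t operator()(NameRef r) const {
        return std::hash<std::string>()(r.get());
    }
};

struct NameRefEqual {
    bool operator()(NameRef a, NameRef b) const {
        return a.get() == b.get();
    }
};

typedef std::unordered_map<NameRef, Item*, NameRefHash, NameRefEqual> ItemIndex;

// The key refers to item.name, so no second copy of the string is made.
void addItem(ItemIndex& index, Item& item) {
    index.insert(ItemIndex::value_type(std::cref(item.name), &item));
}

// 'name' is an lvalue inside this function, so std::cref accepts it
// even when the caller passed a temporary or a string literal.
Item* findByName(const ItemIndex& index, const std::string& name) {
    ItemIndex::const_iterator it = index.find(std::cref(name));
    return it == index.end() ? nullptr : it->second;
}
```

A reference_wrapper is the size of a pointer, so the per-entry cost should match the raw `std::string*` approach.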
Jan Hudec
  • I need to have the name inside the Item structure, I have explained it in the comment to Denis Ermolin's answer. Deriving Item from the value_type of the map is a great idea, hadn't thought about that, I think it will solve my question. – Alexander Vassilev Nov 29 '12 at 09:43
  • @AlexanderVassilev: The value/pointer distinction however changes EVERYTHING! Please, MODIFY THE QUESTION! – Jan Hudec Nov 29 '12 at 09:46
  • I've added the `reference_wrapper` option, which should work for you (it will effectively use pointer to the name member as key). And the `unordered_set` option also works a bit better with pointers. – Jan Hudec Nov 29 '12 at 10:04
  • Well, in fact deriving Item from the value_type of the map will not work in my real code (as I initially thought) - I would need to store iterators to the elements of the hashmap, and hashmaps may invalidate iterators on insertion. – Alexander Vassilev Nov 29 '12 at 10:05
  • The reference_wrapper class is very interesting, I didn't know about it. It should solve it - but I prefer to minimize the use of C++11 at this point, so I'd rather use a std::string pointer to the Item::name member instead. Do you think that a reference wrapper will have no additional overhead compared to a raw pointer? – Alexander Vassilev Nov 29 '12 at 10:16
  • @AlexanderVassilev: `reference_wrapper` contains only a pointer to the target type and a bunch of inline methods and operators. The problem with raw pointer is that it's not allowed to take pointer to temporary, but it's allowed to pass temporary to a function taking const lvalue reference and that can take pointer, so the `cref` helper will work with temporaries. – Jan Hudec Nov 29 '12 at 10:32
  • @Jan Hudec: "it's not allowed to take pointer to temporary," - I didn't get this. Do you mean the parameter to the hash function? If so, I will implement a hash function for string* – Alexander Vassilev Nov 29 '12 at 10:47
  • @AlexanderVassilev: No, the parameter to the `operator[]`. You can't write `collection[&std::string("key")]`, but you can write `collection[std::cref(std::string("key"))]`. Of course you can trivially write `template <typename T> T const *cptr(const T &x) { return &x; }` and use it as `collection[cptr(std::string("key"))]`. – Jan Hudec Nov 29 '12 at 10:56
  • This seems to work on GCC, with specialized hash: `int main() { string k = "test"; m.insert(make_pair(&k,"test one")); printf("%s\n", m[&k].c_str()); }` – Alexander Vassilev Nov 29 '12 at 11:27
  • @AlexanderVassilev: That works. Because `k` isn't a TEMPORARY. – Jan Hudec Nov 29 '12 at 11:46
  • Ah, I got your idea, you are right, it's cumbersome to use with literals. In this case I won't use the [] operator with literals, so I won't need to construct temporaries. – Alexander Vassilev Nov 29 '12 at 13:18
1

TL;DR If you are using libstdc++ (coming with gcc) you are already fine.

There are 3 ways, 2 are "simple":

  • split your object in two, Key/Value, and stop duplicating the Key in the Value
  • store your object in an unordered_set instead
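
As a rough illustration of the unordered_set option, the set can hash and compare items by name only. `ItemNameHash`, `ItemNameEqual`, `ItemSet` and `findByName` are hypothetical names for this sketch. Note that a lookup has to build a throwaway Item, and that set elements are const, so fields other than the name cannot be modified in place unless they are declared mutable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct Item {
    std::string name;
    int someInt;
    explicit Item(const std::string& aName, int value = 0)
        : name(aName), someInt(value) {}
};

// Only the name takes part in hashing and comparison.
struct ItemNameHash {
    std::size_t operator()(const Item& item) const {
        return std::hash<std::string>()(item.name);
    }
};

struct ItemNameEqual {
    bool operator()(const Item& a, const Item& b) const {
        return a.name == b.name;
    }
};

typedef std::unordered_set<Item, ItemNameHash, ItemNameEqual> ItemSet;

// Looks an item up by name; a dummy Item is constructed just for the search.
const Item* findByName(const ItemSet& items, const std::string& name) {
    ItemSet::const_iterator it = items.find(Item(name));
    return it == items.end() ? nullptr : &*it;
}
```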

The 3rd one is more complicated, unless provided by your compiler:

  • use an implementation of std::string that is reference counted (such as libstdc++'s)

In this case, when you copy a std::string into another, the reference counter of the internal buffer is incremented... and that's all. The actual copy is deferred until one of the owners requests a modification: Copy On Write.
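
Whether std::string is reference counted depends on the library implementation and version (newer libstdc++ releases no longer do this by default), so it cannot be relied on portably. A portable variant of the same sharing idea, and a different technique from copy on write, is to hold the name in a std::shared_ptr used by both the Item and the map key, as the comments below suggest. The sketch assumes the names are immutable; `SharedName`, `SharedNameHash`, `SharedNameEqual`, `ItemIndex`, `addItem` and `findByName` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

typedef std::shared_ptr<const std::string> SharedName;

struct Item {
    SharedName name;  // shared with the map key; the characters exist once
    int someInt;
    explicit Item(const std::string& aName, int value = 0)
        : name(std::make_shared<std::string>(aName)), someInt(value) {}
};

// Hash and compare the shared string's contents, not the pointer.
struct SharedNameHash {
    std::size_t operator()(const SharedName& n) const {
        return std::hash<std::string>()(*n);
    }
};

struct SharedNameEqual {
    bool operator()(const SharedName& a, const SharedName& b) const {
        return *a == *b;
    }
};

typedef std::unordered_map<SharedName, Item*,
                           SharedNameHash, SharedNameEqual> ItemIndex;

// Copying the shared_ptr only bumps a reference count.
void addItem(ItemIndex& index, Item& item) {
    index.insert(ItemIndex::value_type(item.name, &item));
}

Item* findByName(const ItemIndex& index, const std::string& name) {
    // Aliasing constructor: a non-owning shared_ptr that points at 'name',
    // so the lookup itself does not allocate.
    SharedName noOwner;
    SharedName probe(noOwner, &name);
    ItemIndex::const_iterator it = index.find(probe);
    return it == index.end() ? nullptr : it->second;
}
```

As noted in the comments, this only pays off when names are longer than the bookkeeping a shared_ptr adds per string.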

Matthieu M.
  • Related to the third one: use `std::shared_ptr` both in `Item` and for the key. Of course if the names are smaller than the overhead of `shared_ptr` this doesn't help, but likewise if the names are shorter than the overhead of a refcounted string then a refcounted string doesn't help. I expect the latter has lower overhead. – Steve Jessop Nov 29 '12 at 09:40
  • @SteveJessop: that would be portable, though at that point it might cost less to recreate a `string` class around `shared_ptr` rather than incur the triple/quadruple allocation. – Matthieu M. Nov 29 '12 at 09:56
  • Are you counting one for the `Item`? I make it double/triple allocation (one for the string data, one for the string, plus one for the control block if you don't use `make_shared`). But agreed, if you move the refcounting inside the class then you save an allocation, because it's the string data you count rather than the string. – Steve Jessop Nov 29 '12 at 10:00
  • Wouldn't it be more efficient to use a raw std::string pointer, provided that I will take care of the lifetime of the string (i.e. first delete the hashmap entry and only after that delete the Item object, containing the string). I really need efficiency here, I will handle hundreds of millions of items. – Alexander Vassilev Nov 29 '12 at 10:22
  • @AlexanderVassilev: it would work yes, however it's more difficult to get right, the `string` move during the `insert`... – Matthieu M. Nov 29 '12 at 11:05