I'm looking to implement a multi-map that maintains insertion order of the entries, and allows in-place insertion/replacement without affecting the order. Guava's LinkedListMultimap is almost perfect, but doesn't allow the type of replacement I'm looking for. LinkedListMultimap is implemented as a hash map and multiple linked lists; it looks like this:

                             ________________
                            /                \
(A,1) -> (B,2) -> (A,3) -> (C,4) -> (B,5) -> (C,6) -> (A,7)
 \________\_______/\________________/_________________/
           \_______________________/

Internally, every node has a pointer to the next node in the sequence, as well as the next node with the same key, and a hash table maintains a mapping from keys to the first node with that key.
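
To make the layout above concrete, here is a minimal C++ sketch of that node structure. It is not Guava's code: the names `Node`, `KeyEnds` and `LinkedMultimapSketch` are made up for illustration, and the per-key `last` pointer is an added assumption so that appending stays O(1).

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <unordered_map>
#include <vector>

// One entry of the multimap (hypothetical layout, not Guava's actual code).
struct Node {
    std::string key;
    int value;
    Node* next = nullptr;           // next node in insertion order
    Node* next_same_key = nullptr;  // next node carrying the same key
};

// First and last node for one key; `last` is an assumption made here so that
// push_back does not have to walk the per-key chain.
struct KeyEnds {
    Node* first = nullptr;
    Node* last = nullptr;
};

class LinkedMultimapSketch {
public:
    // Append (key, value) at the end of both the global and the per-key chain.
    void push_back(const std::string& key, int value) {
        nodes_.push_back(Node{key, value});  // deque keeps node addresses stable
        Node* n = &nodes_.back();
        if (tail_) tail_->next = n; else head_ = n;
        tail_ = n;
        KeyEnds& ends = by_key_[key];
        if (ends.last) ends.last->next_same_key = n; else ends.first = n;
        ends.last = n;
    }

    // Values stored under `key`, in insertion order.
    std::vector<int> values_for(const std::string& key) const {
        std::vector<int> out;
        auto it = by_key_.find(key);
        if (it == by_key_.end()) return out;
        for (const Node* n = it->second.first; n; n = n->next_same_key)
            out.push_back(n->value);
        return out;
    }

    // All values, in insertion order.
    std::vector<int> values_in_order() const {
        std::vector<int> out;
        for (const Node* n = head_; n; n = n->next) out.push_back(n->value);
        return out;
    }

private:
    std::deque<Node> nodes_;
    Node* head_ = nullptr;
    Node* tail_ = nullptr;
    std::unordered_map<std::string, KeyEnds> by_key_;
};
```

With only forward "same key" pointers, changing the key of a node in the middle means finding its predecessor in the new key's chain, which is exactly the walk described next.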

Unfortunately, this doesn't allow for efficient in-place insertions or replacements. For example, to replace (C,4) with (B,8), I'd have to walk backwards an arbitrarily long way to find (B,2) in order to update its "next of same key" pointer.

The best idea I have so far is to associate each element with a sequence number, and keep a sorted set for each key. But to insert in the middle of the sequence, I would need infinitely divisible sequence numbers.
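
A small sketch of why fixed-width sequence numbers run out, assuming 64-bit integer sequence numbers and midpoint insertion. The helper names `has_gap`, `midpoint` and `insertions_until_exhausted` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if at least one unused sequence number lies strictly between the two.
bool has_gap(std::uint64_t before, std::uint64_t after) {
    return before < after && after - before >= 2;
}

// Sequence number for an element inserted between two neighbours.
std::uint64_t midpoint(std::uint64_t before, std::uint64_t after) {
    assert(has_gap(before, after));
    return before + (after - before) / 2;
}

// Counts how many times an element can be inserted directly after `before`
// until the gap is used up and the sequence would need renumbering.
int insertions_until_exhausted(std::uint64_t before, std::uint64_t after) {
    int count = 0;
    while (has_gap(before, after)) {
        after = midpoint(before, after);
        ++count;
    }
    return count;
}
```

Each insertion at the same spot halves the remaining gap, so a gap of 2^k admits only k such insertions before everything after it would have to be renumbered.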

(By the way, I'm implementing this in C++, but I'm just looking for a description of a data structure that would work. If there's a pre-existing library that would work that would be great, but even boost::multi_index_container doesn't seem up to the task.)

Tavian Barnes

3 Answers

Answer #1

Why is Boost.MultiIndex not helping you here?

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/member.hpp>

using namespace boost::multi_index;

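#include <ostream> // operator<< on std::ostream needs the full definition
#include <tuple>   // std::make_tuple, used for the composite-key lookup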

template<typename T,typename Q>
struct pair_
{
  T first;
  Q second;
};

template<typename T,typename Q>
std::ostream& operator<<(std::ostream& os,const pair_<T,Q>& p)
{
  return os<<"("<<p.first<<","<<p.second<<")";
}

template<typename T,typename Q>
using list_multimap=multi_index_container<
  pair_<T,Q>,
  indexed_by<
    sequenced<>,
    ordered_non_unique<
      composite_key<
        pair_<T,Q>,
        member<pair_<T,Q>,T,&pair_<T,Q>::first>,
        member<pair_<T,Q>,Q,&pair_<T,Q>::second>
      >
    >
  >
>;

template<typename T,typename Q>
std::ostream& operator<<(std::ostream& os,const list_multimap<T,Q>& lmm)
{
   for(const auto& p:lmm)os<<p<<" ";
   return os;
}

#include <string>
#include <iostream>

int main()
{
  list_multimap<std::string,int> lmm{{"A",1},{"B",2},{"A",3},{"C",4},{"B",5},{"C",6},{"A",7}};
  auto&                          mm=lmm.get<1>();

  std::cout<<lmm<<"\n";

  // List values with key "A"

  auto r=mm.equal_range("A");
  while(r.first!=r.second)std::cout<<*(r.first)++<<" ";
  std::cout<<"\n";

  // replace (C,4) with (B,8)

  mm.replace(mm.find(std::make_tuple("C",4)),{"B",8});
  std::cout<<lmm<<"\n";
}
Joaquín M López Muñoz

Answer #2

My first answer can be refined to get what you're after, I think:

#include <algorithm>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/random_access_index.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <functional>

using namespace boost::multi_index;

#include <ostream> // operator<< on std::ostream needs the full definition

template<typename T,typename Q>
struct pair_
{
  T first;
  Q second;

  using compare=std::function<bool(const pair_&,const pair_&)>;
  mutable compare* subcmp;

  pair_(const T& first,const Q& second,compare* subcmp=nullptr):
    first(first),second(second),subcmp(subcmp){}
};

namespace std{

template<typename T,typename Q>
struct less<pair_<T,Q>>
{
  bool operator()(const pair_<T,Q>& x,const pair_<T,Q>& y)const
  {
     if(x.first<y.first)return true;
     if(y.first<x.first)return false;
     if(x.subcmp)       return (*x.subcmp)(x,y);
     if(y.subcmp)       return (*y.subcmp)(x,y);
     return false;
  }

  template<typename R>
  bool operator()(const R& x,const pair_<T,Q>& y)const
  {
     return x<y.first;
  }

  template<typename R>
  bool operator()(const pair_<T,Q>& x,const R& y)const
  {
     return x.first<y;
  }
};

} // namespace std

template<typename T,typename Q>
std::ostream& operator<<(std::ostream& os,const pair_<T,Q>& p)
{
  return os<<"("<<p.first<<","<<p.second<<")";
}

template<typename T,typename Q>
using list_multimap=multi_index_container<
  pair_<T,Q>,
  indexed_by<
    random_access<>,
    ordered_non_unique<identity<pair_<T,Q>>>
  >
>;

template<typename T,typename Q>
std::ostream& operator<<(std::ostream& os,const list_multimap<T,Q>& lmm)
{
   for(const auto& p:lmm)os<<p<<" ";
   return os;
}

#include <string>
#include <iostream>

int main()
{
  list_multimap<std::string,int> lmm{{"A",1},{"B",2},{"A",3},{"C",4},{"B",5},{"C",6},{"A",7}};
  auto&                          mm=lmm.get<1>();

  std::cout<<lmm<<"\n";

  // list values with key "A"

  auto r=mm.equal_range("A");
  while(r.first!=r.second)std::cout<<*(r.first)++<<" ";
  std::cout<<"\n";

  // replace (C,4) with (B,8)

  pair_<std::string,int>::compare subcmp=[&](const auto&x, const auto& y){
    auto itx=lmm.iterator_to(x);
    auto ity=lmm.iterator_to(y);
    return itx<ity;
  };

  r=mm.equal_range("C");
  auto it=std::find_if(r.first,r.second,[](const auto& x){return x.second==4;});
  mm.modify(it,[&](auto&x){x={"B",8,&subcmp};});
  it->subcmp=nullptr;
  std::cout<<lmm<<"\n";

  // list values with key "B"

  r=mm.equal_range("B");
  while(r.first!=r.second)std::cout<<*(r.first)++<<" ";
  std::cout<<"\n";  
}

The key ideas are:

  • Use a random-access index instead of a sequenced one.
  • Let elements with equal keys be subsorted by a user-provided comparison function stored in subcmp. The function is optional: when no subcmp is set, elements with equal keys compare as equivalent.
  • When replacing values, use modify (so as to change the element in place) and provide a subcomparer that simply respects the order of the elements in the random-access index. After modification is done, set subcmp to nullptr as it is no longer needed.
Joaquín M López Muñoz

Answer #3

My second answer can be further refined to place the subcomparer within the less<pair_<T,Q>> object itself:

#include <algorithm>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/random_access_index.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <functional>

using namespace boost::multi_index;

#include <ostream> // operator<< on std::ostream needs the full definition

template<typename T,typename Q>
struct pair_
{
  T first;
  Q second;
};

namespace std{

template<typename T,typename Q>
struct less<pair_<T,Q>>
{
  using subcompare=std::function<bool(const pair_<T,Q>&,const pair_<T,Q>&)>;
  subcompare subcmp;

  bool operator()(const pair_<T,Q>& x,const pair_<T,Q>& y)const
  {
     if(x.first<y.first)return true;
     if(y.first<x.first)return false;
     if(subcmp)         return subcmp(x,y);
     return false;
  }

  template<typename R>
  bool operator()(const R& x,const pair_<T,Q>& y)const
  {
     return x<y.first;
  }

  template<typename R>
  bool operator()(const pair_<T,Q>& x,const R& y)const
  {
     return x.first<y;
  }
};

} // namespace std

template<typename T,typename Q>
std::ostream& operator<<(std::ostream& os,const pair_<T,Q>& p)
{
  return os<<"("<<p.first<<","<<p.second<<")";
}

template<typename T,typename Q>
using list_multimap=multi_index_container<
  pair_<T,Q>,
  indexed_by<
    random_access<>,
    ordered_non_unique<
      identity<pair_<T,Q>>,
      std::reference_wrapper<const std::less<pair_<T,Q>>>>
  >
>;

template<typename T,typename Q>
std::ostream& operator<<(std::ostream& os,const list_multimap<T,Q>& lmm)
{
   for(const auto& p:lmm)os<<p<<" ";
   return os;
}

#include <string>
#include <iostream>

int main()
{
  std::less<pair_<std::string,int>> less;
  list_multimap<std::string,int>    lmm{boost::make_tuple(
                                      boost::make_tuple(),
                                      boost::make_tuple(
                                        identity<pair_<std::string,int>>{},
                                        std::cref(less)
                                      )
                                    )};
  auto&                             mm=lmm.get<1>();

  lmm={{"A",1},{"B",2},{"A",3},{"C",4},{"B",5},{"C",6},{"A",7}};
  std::cout<<lmm<<"\n";

  // list values with key "A"

  auto r=mm.equal_range("A");
  std::for_each(r.first,r.second,[](const auto& x){std::cout<<x<<" ";});
  std::cout<<"\n";

  // replace (C,4) with (B,8)

  std::less<pair_<std::string,int>>::subcompare subcmp=
  [&](const auto&x, const auto& y){
    return lmm.iterator_to(x)<lmm.iterator_to(y);
  };

  r=mm.equal_range("C");
  auto it=std::find_if(r.first,r.second,[](const auto& x){return x.second==4;});
  less.subcmp=subcmp;
  mm.modify(it,[](auto& x){x={"B",8};});
  less.subcmp=nullptr;
  std::cout<<lmm<<"\n";

  // list values with key "B"

  r=mm.equal_range("B");
  std::for_each(r.first,r.second,[](const auto& x){std::cout<<x<<" ";});
  std::cout<<"\n";  
}

This significantly reduces memory usage, since the elements no longer need to carry an additional pointer for subcmp. The general strategy remains exactly the same.

Joaquín M López Muñoz
  • I see what you've done, very nice! Sadly insertion into the middle of a `random_access` index is `O(n)`, but since it doesn't involve copies it's probably fast in practice for my case. I am curious if it can be done in sublinear time though. – Tavian Barnes Feb 28 '15 at 16:38
  • (Off Boost.MultiIndex realm.) To do sublinear replacement you need to know, given two elements `x` and `y`, which comes first in the sequence. A vector-like sequence gives you that in constant time but, as you point out, mid insertion is O(n). An alternative would be to use an order statistics tree (http://en.wikipedia.org/wiki/Order_statistic_tree) as your base sequence: relative ordering check can be done in O(log n), as well as mid insertion. – Joaquín M López Muñoz Feb 28 '15 at 17:31
  • Ah an order statistic tree is perfect! Can't believe I didn't think of it myself actually. If you put that as an answer I'll gladly accept it. – Tavian Barnes Feb 28 '15 at 17:40
  • You will have to measure, but I bet the vector-like sequence as sported in my example with Boost.MultiIndex will beat the order statistic tree in practice. Please let me know if you do the exercise. – Joaquín M López Muñoz Feb 28 '15 at 17:44
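
For reference, the order statistic tree idea from the comments could look roughly like the sketch below. It is only one possible realization, not code from the thread: it assumes an implicit treap with parent pointers and subtree sizes, and the names `Node`, `Sequence`, `rank_of` and `precedes` are invented for illustration. Mid-sequence insertion and the "which comes first" check both take expected O(log n) time.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <random>

// One element of the sequence; `size` counts the nodes in its subtree.
struct Node {
    int value;
    unsigned priority;
    std::size_t size = 1;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

std::size_t size_of(const Node* n) { return n ? n->size : 0; }

// Recompute the subtree size of `n` and point its children back at it.
void update(Node* n) {
    n->size = 1 + size_of(n->left) + size_of(n->right);
    if (n->left) n->left->parent = n;
    if (n->right) n->right->parent = n;
}

// Concatenate two treaps: every element of `a` precedes every element of `b`.
Node* merge(Node* a, Node* b) {
    if (!a) return b;
    if (!b) return a;
    if (a->priority > b->priority) {
        a->right = merge(a->right, b);
        update(a);
        return a;
    }
    b->left = merge(a, b->left);
    update(b);
    return b;
}

// Split `t` into its first `k` elements (`a`) and the remaining ones (`b`).
void split(Node* t, std::size_t k, Node*& a, Node*& b) {
    if (!t) { a = b = nullptr; return; }
    if (size_of(t->left) < k) {
        split(t->right, k - size_of(t->left) - 1, t->right, b);
        a = t;
    } else {
        split(t->left, k, a, t->left);
        b = t;
    }
    update(t);
}

// Zero-based position of `n`, found by walking up to the root.
std::size_t rank_of(const Node* n) {
    std::size_t r = size_of(n->left);
    for (; n->parent; n = n->parent)
        if (n == n->parent->right) r += size_of(n->parent->left) + 1;
    return r;
}

class Sequence {
public:
    // Insert `value` so that it becomes the element at index `pos`.
    Node* insert(std::size_t pos, int value) {
        assert(pos <= size_of(root_));
        pool_.push_back(Node{value, static_cast<unsigned>(rng_())});
        Node* n = &pool_.back();  // deque keeps node addresses stable
        Node* a = nullptr;
        Node* b = nullptr;
        split(root_, pos, a, b);
        root_ = merge(merge(a, n), b);
        root_->parent = nullptr;  // the root may carry a stale parent pointer
        return n;
    }

    // True if `x` comes before `y` in the sequence.
    bool precedes(const Node* x, const Node* y) const {
        return rank_of(x) < rank_of(y);
    }

private:
    std::deque<Node> pool_;
    std::mt19937 rng_{12345};  // fixed seed, arbitrary choice for the sketch
    Node* root_ = nullptr;
};
```

A check like `precedes` could play the role of the subcomparer in the answers above, replacing the iterator comparison on the random-access index. Whether it beats the vector-like sequence in practice would have to be measured.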