0

I need to make an union of two unordered multisets passed from input, by using the "multiplicity" definition: The multiplicity of an element, also called absolute frequency, is the number of occurrences of the element 'x' in the unordered multiset 's'. In the multiSetUnion the multiplicity of an element is the Max of its multiplicity in the two multisets.

I already implemented correctly the int multiplicity(const Elem e, const MultiSet& s) function (returns number of occurrences in the multiset).

The multisets are singly linked lists.

Here is the algorithm I came up with:

for as long as the first list isn't empty
   if elem of the first list (multiset) is not in the second list (multiset)
      add elem in unionlist
   if elem of the first list (multiset) is in the second list (multiset)
      if multiplicity of elem is bigger in the first list than in the second one
         add elem in unionlist as many times as its multiplicity in list1
      if multiplicity of elem is bigger in the second list than in the first one
         add elem in unionlist as many times as its multiplicity in list2  
analyze the second element of the first list 

Here is my implementation of my algorithm, but it gives me errors when neither of the two lists are empty and I have no idea why:

MultiSet multiset::multiSetUnion(const MultiSet& s1, const MultiSet& s2)
{
    if (isEmpty(s1) && isEmpty(s2))
        return emptySet;
    if (isEmpty(s1) && !isEmpty(s2))
        return s2;
    if (!isEmpty(s1) && isEmpty(s2))
        return s1;
    MultiSet s3 = emptySet;    
    MultiSet aux2 = s2;            //THE FUNCTION DOESN'T WORK FROM HERE ON
    for (MultiSet aux1 = s1; !isEmpty(aux1); aux1 = aux1->next) { 
        if (!isIn(aux1->elem, aux2))
            insertElemS(aux1->elem, s3);
        if (isIn(aux1->elem, aux2)) {
            if (multiplicity(aux1->elem, aux1) > multiplicity(aux1->elem, aux2)) {
                for (int n = 0; n < multiplicity(aux1->elem, aux1); ++n)
                    insertElemS(aux1->elem, s3);
            }
            else {
                for (int m = 0; m < multiplicity(aux1->elem, aux2); ++m)
                    insertElemS(aux1->elem, s3);
            }
        }
    }
    return s3;
}

Could anybody please point out where am I doing wrong? Did I forget something in the algorithm or is this an implementation problem?

Edit: Here is how I have implemented the functions IsIn(const Elem x, MultiSet& s) and multiplicity(const Elem e, MultiSet& s):

bool isIn(const Elem x, MultiSet& s) {
    if (s->elem == x) return true;
    while (!isEmpty(s)) {
        if (s->elem!=x)
            s = s->next;
        else    return true;
    }
    return false;
}

int multiset::multiplicity(const Elem e, const MultiSet& s)
{
    if (isEmpty(s))    return 0;
    int count = 0;
    MultiSet aux = s;
    while (!isEmpty(aux)) {
        if (aux->elem==e) {
            ++count;
        }
        aux = aux->next;
    }
    return count;
}

Unfortunately I cannot use the vector library (or any STL library for the matter). The algorithm I proposed is the intentionally half of the solution (the part I'm having problems with). I am not getting any specific errors but the program simply stalls (it should instead print the first, the second and the union of the two multisets - the print function is correct and is called directly in the main; as for now I only get the correct output when one or both of the multisets is empty) and returns this: "Process returned -1073741819" (I am currently debugging on Windows).

timrau
  • 22,578
  • 4
  • 51
  • 64
Anna
  • 83
  • 1
  • 7
  • 1
    How would you calculate multiplicity? On the other hand, I'd go differently: Have a normal (non-multi) *map* (`std::map>`), iterate over both sets and increment the respective counter in the map for each element. Then you'd create a new set, iterate over the map and add as many objects as the maximum of the pair values (at the same time, you get an algorithm for intersection, just that you'd use minimum...). – Aconcagua Jun 14 '19 at 11:46
  • By the way: If you need *copies* of the sets inside anyway, why don't you just accept them by value? (But that gets obsolete if you opt for my proposed algorithm...) – Aconcagua Jun 14 '19 at 11:49
  • 2
    Please have a look at [ask] and [mcve]. Without knowing about how `isIn` or `multiplicity` are defined, there's little chance to guess what's going wrong. We even don't know what `->elem` results in... – Aconcagua Jun 14 '19 at 11:53
  • `for (;!isEmpty(aux1);) { }` – when would `aux1` actually get empty? I do not see that you remove anything from... – Aconcagua Jun 14 '19 at 11:56
  • 1
    @Aconcagua asked, "By the way: If you need *copies* of the sets inside anyway, why don't you just accept them by value?" In general, that is the right mode writing code, but the reason here is optimization. The OP is checking for empty conditions first and *then* making the copies. If one of them is not empty, but the other is, copying would be a waste and could be quite expensive. – metal Jun 14 '19 at 12:16
  • @metal Well, a copy of an empty map shouldn't come expensive, should it? – Aconcagua Jun 14 '19 at 12:28
  • @Aconcagua: Right, but the checking includes one empty and one not, and copying a non-empty collection when you don't need to could be bad for performance. Of course, as written, it's just returning the non-empty multiset, so if we can assume move support and RVO (not clear that we can here), it is probably a non-issue. – metal Jun 14 '19 at 12:35
  • @metal Hm, you should be right – RVO is out, as we'd need to construct the *parameter* at the location of the target object. But which one then??? Moving, though, should be possible, if implemented in the `MultiSet` class. – Aconcagua Jun 14 '19 at 12:42
  • 1
    What errors do you get? Blue Screen of Death? To make it easier/reasonable for someone else to debug *your* code, you should include the error message(s) in addition to the [mcve] requested by Aconcagua. – JaMiT Jun 14 '19 at 19:08
  • 1
    Is your algorithm supposed to be complete? Or is it intentionally half the solution to simplify things? I ask because your algorithm never tries to add elements from the second list to the union. (If the first list is empty, then the algorithm returns an empty list.) If it is intentionally simplified, you should acknowledge that in the question. – JaMiT Jun 14 '19 at 19:17
  • 1
    _Gratuitous tip:_ The three checks at the start of your function could be simplified to two: `if ( isEmpty(s1) ) return s2; if ( isEmpty(s2) ) return s1;` If both parameters are empty, then returning a copy of the second parameter is just as correct as returning a newly-constructed empty multiset, no? – JaMiT Jun 14 '19 at 19:20
  • @Aconcagua asked, "for (;!isEmpty(aux1);) { } – when would aux1 actually get empty? I do not see that you remove anything from... ": aux would be empty once I've reached the end of the first multiset (basically meaning "for as long as the first list isn't empty") - In your opinion this is what could be causing the malfunction? – Anna Jun 16 '19 at 09:35
  • 1
    @Anna Sorry, forget about that comment. The code you now additionally pasted makes clear that you implemented the set as linked list – actually, directly exposing the nodes of (questionable design...). But did you consider that, if you have a duplicate in the first set, you will iterate over that one twice (adding twice the multiplicity, too)? – Aconcagua Jun 16 '19 at 15:14

1 Answers1

3

Consider the following example:

MultiSet s1({7, 7});
MultiSet s2({5});

If you now iterate over s1:

1st iteration:        7    7
                      ^
                     aux1

2nd iteration:        7    7
                           ^
                          aux1

If you have multiple equal elements in s1, you will discover them more than once, finally resulting in adding the square of multiplicity (or product of both multiplicities, if the one of s2 is greater).

On the other hand, as 5 is not contained in s1, you won't try to look it up in s2 either – still, it is there...

To fix the first problem, you need to check, if the current element is already contained in s3 and if so, just skip it.

To fix the second problem, you need to iterate over s2, too, adding all those elements that are not yet contained in s3.

As is, the final result will be of pretty poor performance, though (should be somewhere in between O(n²) and O(n³), rather the latter). Unfortunately, you chose a data structure (a simple singly linked list – apparently unsorted!) that offers poor support for the operations you intend – and especially for the algorithm you chose.

If you kept your two lists sorted, you could create an algorithm with linear run-time. It would work similarly as the merging step in merge sort:

while(elements available in both lists):
    if(left element < right element):
        append left element to s3
        advance left
    else
        append right element to s3
        if(left element == right element):
            advance left // (too! -> avoid inserting sum of multiplicities)
        advance right
append all elements remaining in left
append all elements remaining in right
// actually, only in one of left and right, there can be elements left
// but you don't know in which one...

Keeping your list sorted during insertions is pretty simple:

while(current element < new element):
    advance
insert before current element // (or at end, if no current left any more)

However, as you expose the nodes of the list directly, you are always in danger that insertion will not start at the head element – and your sorting order might get broken.

You should encapsulate appropriately:

  • Rename your current MultiSet to e. g. 'Node' and create a new class MultiSet.
  • Make the node class a nested class of the new set class.
  • All modifiers of the list should be members of the set class only.
  • You might expose the node class, but user must not be able to modify it. It would then just serve as kind of iterator.
Aconcagua
  • 24,880
  • 4
  • 34
  • 59
  • Thank you for the crystal clear explanation, Since the "sorted-ness" of the structure wasn't specified, I tried your solution with the now-ordered multisets and it went smoothly. – Anna Jun 18 '19 at 20:47