2

While doing a college project I came upon the following problem: I have two maps (Kmer1 and Kmer2) which are composed by a string(key) and a int(value). I have to calculate the distance which follows this formula

[1-(I/U)]*100

Where...
     ...U = the sum of all int values inside Kmer1 U Kmer2
     ...I = the sum of all int values inside Kmer1 ∩ Kmer2

Consider that...
             ... The U and ∩ are made evaluating the keys (strings)
             ... When an element is in both maps:
                 - At the Union we add the one with higher int value
                 - At the Intersection we add the one with lower int value

Example:

Kmer1 = AAB¹ AAC¹ AAG³
Kmer2 = AAG¹ AAT² ABB¹

Union = AAB¹ AAC¹ AAG³ AAT² ABB¹   U= 8
Intersection = AAG¹                I= 1
Distance = 87.5

Code time! I've been trying to solve it but all solutions are like.. partially right, not all cases are covered. So when I tried to cover them, I ended in infinite loops, exceptions rising, long long nests of if-else (which were awfull..) anyway, here is the the least worse and non working try:

The setup:

Species::Kmer Kmer1, Kmer2;        //The two following lines get the Kmer from another
Kmer1 = esp1->second.query_kmer(); //object.
Kmer2 = esp2->second.query_kmer(); 

Species::Kmer::const_iterator it1, it2, last1, last2;
it1 = Kmer1.cbegin();           //Both Kmer are maps, therefore they are ordered and
it2 = Kmer2.cbegin();           //whitout duplicates.
last1 = --Kmer1.cend();
last2 = --Kmer2.cend();

double U, I;
U = I = 0;

The loop where the formula is applied:

while (it1 != Kmer1.cend() and it2 != Kmer2.cend()){
    if (it1->first == it2->first) {         
        if (it1->second > it2->second) {
            U += it1->second;
            I += it2->second;
        } else {
            U += it2->second;
            I += it1->second;
        }
        ++it1;
        ++it2;

    } else if (it1->first < it2->first) {
        U += it1->second;
        ++it1;
    } else {
        U += it2->second;
        ++it2;
    }
}

Note that instead of first creating the Union and the intersection and then doing the total sum of each, I jumped directly to the sum of the values. I know maybe it's not that hard but I've been trying to solve it but I'm pretty much stuck...


I've uploaded the whole code at Github: (Maybe it helps)
    - There is a makefile to build the code
    - There is a file called input.txt with a sample for this specific problem
    - Also inside the input.txt, after line13 (fin) I've added the expected output
    - Executing ./program.exe < input.txt should be enough to test it.

https://github.com/PauGalopa/Cpp-Micro-Projects/tree/master/Release


IMPORTANT Yes! I'm aware of almost all the STL functionality that could do this in a few lines BUT... Since this is a college project i'm bound to the limitations of the sillabus so consider that I'm only allowed to use "map" "string" "vector" and a few more. No, I can't use "algorithm" (I really wish I could) I'll clarify any doubt about which things I can do or use in the coments.

Pauete Galopa
  • 153
  • 1
  • 1
  • 9
  • I don't think you handle correctly the case when one iterator reaches the end and the other one does not in the while loop. When the loop finishes there might be still some elements left in either of the maps which could be included in `U`. – ciamej May 06 '20 at 12:11
  • Indeed, I've thought of evaluating """it1 != last1 and it2!= last2""" in the while loop, then resolving the remaining elements, but this depends on which ended first – Pauete Galopa May 06 '20 at 12:13
  • What is your problem with current code? Does it bring wrong results? Does it crash? – ciamej May 06 '20 at 12:14
  • I will edit the loop as I find errors, but for now it just give wrong results – Pauete Galopa May 06 '20 at 12:17

4 Answers4

2

Add these two loops just after your main while loop.

while (it1 != Kmer1.cend()){
    U += it1->second;
    it1++;
}
while (it2 != Kmer2.cend()){
    U += it2->second;
    it2++;
}
ciamej
  • 6,918
  • 2
  • 29
  • 39
  • I've considered that before. In the case where the last element of Kmer1 is equal to the last element of Kmer2 (or viceversa) the last element won't be compared and therefore it won't be valid – Pauete Galopa May 06 '20 at 12:20
  • @PaueteGalopa I really can't see this happening with how your loop is designed right now. Can you give an example? – ciamej May 06 '20 at 12:25
  • 1
    Sorry for doubting, it actually worked! I can't belive it was that simple. I've tried that before but now that I see the code I used previously I see why didn't worked. I guess I was so ofuscated that I did't saw it... Thanks again – Pauete Galopa May 06 '20 at 12:34
2

Here is a rather simple solution, using only some propoerties of std::map, with no iterator. I hope you are allowed to use this kind of solution.

#include <iostream>
#include <map>
#include <string>

int main () {
    std::map <std::string, int> A = {{"AAB", 1}, {"AAC", 1}, {"AAG", 3}};
    std::map <std::string, int> B = {{"AAG", 1}, {"AAT", 2}, {"ABB", 1}};

    std::map <std::string, int> Union;
    int sum_A = 0, sum_B = 0, sum_Union = 0, sum_Inter = 0;;

    for (auto &x: A) {
        Union[x.first] = std::max (Union[x.first], x.second);
        sum_A += x.second;
    }
    for (auto &x: B) {
        Union[x.first] = std::max (Union[x.first], x.second);
        sum_B += x.second;
    }   
    for (auto &x: Union) {
        sum_Union += x.second;
    }
    sum_Inter = sum_A + sum_B - sum_Union;
    double distance = 100.0 * (1.0 - double(sum_Inter)/sum_Union);

    std::cout << "sum_Union = " << sum_Union << " sum_Inter = " << sum_Inter << "\n";
    std::cout << "Distance = " << distance << "\n";
}
Damien
  • 4,809
  • 4
  • 15
  • 20
  • Requires extra map though., and is O(n log n) whereas OP has O(n) solution – Jarod42 May 06 '20 at 15:09
  • @Jarod42 Effectively. It should stay efficient except for rather large n values. Taking into account the context (college), I tried to find the simplest solution – Damien May 06 '20 at 15:14
1

A slightly cleaner approach for unordered_mapping, but which would still work with mapping would be to add all the elements of Kmer1 to U, and shared elements to I. Then add all the unshared elements of Kmer2 to U:

for(it1 = Kmer1.cbegin(); it1 != Kmer1.cend(); it1++) {
    auto other = Kmer2.find(it1->first);
    if(other == Kmer2.cend()) {
        U += it1->second;
    } else {
        U += max(it1->second, other->second);
        I += min(it1->second, other->second);
    }
}
for(it2 = Kmer2.cbegin(); it2 != Kmer2.cend(); it2++) {
    if(Kmer1.count(it2->first) == 0) {
        U += it2->second
    }
}

For a properly implemented unordered_mapping (hash table), the find operation will be O(1), not O(log(n), making it a bit faster.

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
  • Does your code take into account the requirement that when a key is present in both the maps, the higher value is added to `U`? – ciamej May 06 '20 at 12:36
  • 1
    @ciamej. Very clearly, yes. The `else` in the first loop is your region of interest. – Mad Physicist May 06 '20 at 12:37
  • 1
    @PaueteGalopa it's whatever type you need to make the expression work, to be determined by the compiler. Basically, a lazy hack on my part. Feel free to replace it with the correct iterator type. – Mad Physicist May 06 '20 at 12:39
  • Ah, ok, now I see it ;) – ciamej May 06 '20 at 12:40
  • Shouldn't it be `U += it2->second` in the second loop though? – ciamej May 06 '20 at 12:40
  • @ciamej. Yes it should. Thanks for the catch. Fixed – Mad Physicist May 06 '20 at 12:41
  • 1
    @PaueteGalopa. This site is excellent: https://en.cppreference.com/w/cpp/language/auto, and so is http://www.cplusplus.com – Mad Physicist May 06 '20 at 12:44
  • @MadPhysicist This is far better than what I was asking for! I just comlicated my self in my aproach. Also, when I try to implement it max and min are highlighted by de IDE saying: no instance of "min" matches the argument list -- argument types are: const int, std::_Rb_tree_iterator> – Pauete Galopa May 06 '20 at 12:45
  • @PaueteGalopa. Your approach is actually better algorithmically since it relies on the sortedness of the data. The second argument should have been `other->second`, not `other`. Fixed now. – Mad Physicist May 06 '20 at 12:48
  • @PaueteGalopa it all depends whether your data need to be sorted or not. If not really, then you can redefine `Kmer` from `map` to `unordered_map` and use Mad Physicist's solution. – ciamej May 06 '20 at 12:52
  • Actually Kmer is the only type which doesn't need to be ordered I think! I wanted to ask... (but maybe the answer it's to big for the comment section, maybe a reference to a site?) How can I determine the complexity of a piece of code?? – Pauete Galopa May 06 '20 at 12:58
  • @PaueteGalopa. Practically, a good way is to run it with different large inputs. Theoretically, you would analyze the code. Things like mappings are pretty well analyzed. Ordered mappings usually have log(N) lookup and insertion. Unordered usually have O(1) lookup and insertion. – Mad Physicist May 06 '20 at 14:34
1

This loop should work:

while ( true ){
    bool end1 = it1 == Kmer1.cend();
    bool end2 = it2 == Kmer2.cend();
    if( end1 and end2 )
        break;

    if( end2 or it1->first < it2->first ) {
        U += (it1++)->second;
        continue;
    }
    if( end1 or it2->first < it1->first ) {
        U += (it2++)->second;
        continue;
    }
    auto p = std::minmax( (it1++)->second, (it2++)->second );
    I += p.first;
    U += p.second;
}
Slava
  • 43,454
  • 1
  • 47
  • 90