0

I need to output a group of letters that are that are out of order with respect to the number of inversions of each other.

For example, the sequence “AACEDGG” has only 1 inversion (E and D) while the sequence “ZWQM” has 6 inversions. I don't actually have to sort it out but I have to output them based on the number of inversions they have.

Ex:

Input: AACATGAAGG TTTTGGCCAA TTTGGCCAAA GATCAGATTT CCCGGGGGGA ATCGATGCAT

Output: CCCGGGGGGA AACATGAAGG GATCAGATTT ATCGATGCAT TTTTGGCCAA TTTGGCCAAA

I am trying to use insertion sort as a template as required by my teacher.

void inversionChecker(string dna[], int n)
{
    int j,k,m;
    int tempCount;
    for(int i=0; i < n; i++){
        int count=0;
        for(j=0;j < n; j++){
            for(k=j+1; k <= n; k++){
                if(dna[i][j] > dna[i][k]){
                    count++;
                    tempCount = count;
                }
            }
        }
        if(i != 0 && tempCount > count)
            dna[i].swap(dna[i-1]);
    }
}

I am having issues because I am not too familiar using 2D arrays to compare the letters in each string. When I try to output the array it ends up being blank, seg faults, or errors resulting from my use trying to swap the positions of the strings in the array.

Any help would be appreciated

Ted Lyngmo
  • 93,841
  • 5
  • 60
  • 108
101001
  • 113
  • 9
  • 1
    There are various issues in the code. `tempCount` seems to always be 0. And the swapping is not surrounded by `{}`. I would suggest making a map of each string to the inversion count, and then using that to do the sort. – cigien Jul 22 '20 at 23:11
  • What does `n` represent? You seem to be using it for both the length of the array `dna` _and_ the length of the strings in that array. The last iteration of the k-loop will use the index `n`, which looks like it could potentially be out of bounds. You never initialize `tempCount`. Maybe choose a better name for that variable. "Temp" implies _temporary_, which is odd because its lifetime is longer than that of `count`. – paddy Jul 23 '20 at 01:52
  • If I understand this correctly, `AACATGAAGG` and `GATCAGATTT` both have 10 inversions, right? Can they be printed in any order or what determines that `AACATGAAGG` should be printed before `GATCAGATTT`? – Ted Lyngmo Jul 23 '20 at 07:27

1 Answers1

1

Here you access the dna array out-of-bounds:

for(j=0;j < n; j++){
    for(k=j+1; k <= n; k++){        // when k == n you have undefined behavior
        if(dna[i][j] > dna[i][k])

it should be:

for(j=0;j < n-1; j++){
    for(k=j+1; k < n; k++){
        if(dna[i][j] > dna[i][k])

An alternative approach using misc. standard classes and algorithms, like std::vector and std::sort.

#include <algorithm> // copy, sort
#include <cstddef>   // size_t
#include <iterator>  // istream_iterator, back_inserter
#include <sstream>   // istringstream
#include <string>    // string
#include <tuple>     // tie
#include <utility>   // swap
#include <vector>    // vector

#include <iostream>

// count inversions in a sequence
unsigned count_inversions(std::string sequence) {
    unsigned res = 0;

    // assuming "inversions" are defined as the number of swaps needed in bubblesort
    for(size_t i = 0; i < sequence.size() - 1; ++i) {
        for(size_t j = i + 1; j < sequence.size(); ++j) {
            if(sequence[j] < sequence[i]) {
                std::swap(sequence[i], sequence[j]);
                ++res;
            }
        }
    }
    return res;
}

// a class to store a sequence and its inversion count
struct sequence_t {
    sequence_t() = default;

    explicit sequence_t(const std::string& Seq) :
        seq(Seq), inversions(count_inversions(seq)) {}

    // "less than" operator to compare two "sequence_t"s (used in std::sort)
    bool operator<(const sequence_t& rhs) const {
        // assuming lexicographical order if inversions are equal
        return std::tie(inversions, seq) < std::tie(rhs.inversions, rhs.seq);
    }

    std::string seq;
    unsigned inversions;
};

// read one sequence_t from an istream
std::istream& operator>>(std::istream& is, sequence_t& s) {
    std::string tmp;
    if(is >> tmp) s = sequence_t(tmp);
    return is;
}

// read "sequence_t"s from an istream and put in a vector<sequence_t>
auto read_sequences(std::istream& is) {
    std::vector<sequence_t> rv;

    std::copy(std::istream_iterator<sequence_t>(is),
              std::istream_iterator<sequence_t>{}, std::back_inserter(rv));
    return rv;
}

int main() {
    std::istringstream input(
        "AACATGAAGG TTTTGGCCAA TTTGGCCAAA GATCAGATTT CCCGGGGGGA ATCGATGCAT");

    auto sequences = read_sequences(input);

    std::sort(sequences.begin(), sequences.end());

    // print result
    for(const auto& [seq, inversions] : sequences) {
        std::cout << seq << '(' << inversions << ')' << ' ';
    }
    std::cout << '\n';
}

Output (including the inversions):

CCCGGGGGGA(2) AACATGAAGG(10) GATCAGATTT(10) ATCGATGCAT(11) TTTTGGCCAA(12) TTTGGCCAAA(15)
Ted Lyngmo
  • 93,841
  • 5
  • 60
  • 108