0

I have a list of tuple list which have overlapping elements.

val tupLis:Seq[(List[(Integer,Char)],Int)] = null//data

I am trying to merge overlapping elements in the tuple list. Here is a code that i am working on which uses foldleft to merger overlapping tuple list from the list.The merging doesn't seam to be working as it misses out some elements of the tuple list.Each tuple list contains 4 tuples present in them. Each tuple list in the list often overlapping as they are generated from a bigger list using sliding function.

val alLis:Seq[(List[(Integer,Char)],Int)] = snGrMap.map(_._2).flatten.toList.sortBy(_._1.head._1)
val res = alLis.foldLeft(mutable.HashMap.empty[Int,(List[Integer],List[(Integer,Char)],Int)]) { (map, value) =>
  if(map.size<=0){
    map.put(0,(value._1.map(_._1),value._1,value._2))
  }else{
    val cads = map.filter(p=>value._1.intersect(p._2._2).size>=3)
    if(cads.size>=1) {
      cads.foreach { i =>
        val cmnPos = i._2._1.intersect(value._1.map(_._1))
        val cmnBase = i._2._2.filter(p=>cmnPos.contains(p._1)).intersect(value._1.filter(p=>cmnPos.contains(p._1)))
        println(cmnBase.size,cmnPos.size,value._1, i._2._2)
        if(cmnBase.size == cmnPos.size)
          map.put(i._1,((i._2._1++value._1.map(_._1)).distinct,(i._2._2++value._1).distinct,i._2._3+value._2))
        else
          map.put(map.size,(value._1.map(_._1),value._1,value._2))
      }
    }else{
      map.put(map.size,(value._1.map(_._1),value._1,value._2))
    }
  }
  map
}

here is the example data which i am using:

(List((306,c), (328,g), (336,a), (346,g)),282)
(List((306,g), (328,c), (336,g), (346,a)),22)
(List((306,c), (328,c), (336,g), (346,a)),4)
(List((328,g), (336,a), (346,g), (348,t)),164)
(List((328,g), (336,a), (346,g), (348,c)),161)
(List((328,c), (336,g), (346,a), (348,c)),28)
(List((336,a), (346,g), (348,t), (358,a)),168)
(List((336,a), (346,g), (348,c), (358,a)),154)
(List((336,g), (346,a), (348,c), (358,g)),30)
(List((346,g), (348,t), (358,a), (361,c)),178)
(List((346,g), (348,c), (358,a), (361,c)),166)
(List((346,a), (348,c), (358,g), (361,g)),34)

The merged list look like:

List((306,c), (328,g), (336,a), (346,g), (348,t), (358,a), (361,c),792)
List((306,c), (328,g), (336,a), (346,g), (348,c), (358,a), (361,c) ),763)
List((306,g), (328,c), (336,g), (346,a), (348,c),  (358,g), (361,g) ),96)

Update 1:

Overlap: If two list of tuples have the 3 or more exact tuples present in both list,then they are supposed to be overlapping list of tuples.But there should not be any difference when two list is merged.If one of the tuple values in both list have same integer but different char,then they shall not be merged. Merge: Combining two or more list of tuple list when they overlap.

Update 2: I have come up with a small solution,but not sure how efficient is it.

val alLisWithIndex = alLis.zipWithIndex
    val interGrps = new ListBuffer[(Int,Int)]()
    alLisWithIndex.foreach{i=>
      val cads = alLisWithIndex.filter(p=>p._1._1.take(3).intersect(i._1._1.takeRight(3)).size>=3)
      cads.foreach(p=>interGrps.append((i._2,p._2)))
    }
println(interGrps.sortBy(_._1))

so when i print the above code, i get tuple list grouped in this manner. i have printed only the index of the each tuple group that should be merged.

Result generated: ListBuffer((0,2), (0,3), (1,4), (2,5), (3,6), (4,7), (5,8), (6,9), (7,10))

here is the list of tuples with their index that was used

List(((List((306,c), (328,g), (336,a), (346,g)),282),0),
((List((306,g), (328,c), (336,g), (346,a)),22),1),
((List((328,g), (336,a), (346,g), (348,t)),164),2),
((List((328,g), (336,a), (346,g), (348,c)),161),3),
((List((328,c), (336,g), (346,a), (348,c)),28),4),
((List((336,a), (346,g), (348,t), (358,a)),168),5),
((List((336,a), (346,g), (348,c), (358,a)),154),6),
((List((336,g), (346,a), (348,c), (358,g)),30),7),
((List((346,g), (348,t), (358,a), (361,c)),178),8),
((List((346,g), (348,c), (358,a), (361,c)),166),9),
((List((346,a), (348,c), (358,g), (361,g)),34),10))

So now all i had to do was use the interGrps ,link the groups based on second value and finally replace the indexes with list of tuples..

Balaram26
  • 1,349
  • 3
  • 15
  • 32
  • The definition of your merged list (and what you mean by "overlap") is very unclear, especially as you don't give the totals. – The Archetypal Paul Jun 08 '16 at 05:45
  • Sorry, still unclear. Do they "overlap" when there is a difference in char with the same integer value? It's a bit confusing because you use "merge" there. Is it correct to say "two lists overlap if they share 3 or more tuples, and any pair of tuples with the same number do not have different characters?" – The Archetypal Paul Jun 08 '16 at 08:10
  • Yes,"two lists overlap if they share 3 or more tuples, and any pair of tuples with the same number do not have different characters" – Balaram26 Jun 08 '16 at 08:18
  • But that produces more output elements, not less, as now less things overlap. See my updated question - which still won't give the right answer – The Archetypal Paul Jun 08 '16 at 08:19
  • i have updated the question with answer i ended up with. – Balaram26 Jun 08 '16 at 08:28
  • What' the `take(3)` and `takeRight(3)` about? That's a very different version of overlap from your previous ones? And this `p._1._1.take(3).intersect(i._1._1.takeRight(3)).size>=3` can only be 3 if both lists are identical (since you're taking 3 elements each side of the intersect). And you have no check for same-number, different character - so if that was important, I think your current code works "by accident" on your current data set, rather than being an actual implementation of the algorithm you describe – The Archetypal Paul Jun 08 '16 at 08:30
  • As i mentioned in the question that this list of tuples was generated from sliding (length of 4)through a bigger list.so it means all the list of tuples will have only 4 tuples in them. And that's the reason that they overlap,so i check if they overlap by just taking the suffix of one tuple list against the prefix of all the tuple list. – Balaram26 Jun 08 '16 at 08:38
  • That's not the same as "sharing three or more tuples, according to that since they could share first, third and forth. I give up - more information about your data set is needed to get the right result, and obviously you have that, but I don't, based on the question. And you're still not checking the different character, so I don't know what all that was about. If you can get the defintion of `isOverlap` to match what you need, my code should still work. – The Archetypal Paul Jun 08 '16 at 08:41
  • Let's not, I have work to do. – The Archetypal Paul Jun 08 '16 at 08:46

1 Answers1

2

The following code follows, I think, the description of your algorithm. However, it does not give the same output so there's something still to clear up about wha you want

First, the test data

var xs = List(
(List((306,"c"), (328,"g"), (336,"a"), (346,"g")),282),
(List((306,"g"), (328,"c"), (336,"g"), (346,"a")),22),
(List((306,"c"), (328,"c"), (336,"g"), (346,"a")),4),
(List((328,"g"), (336,"a"), (346,"g"), (348,"t")),164),
(List((328,"g"), (336,"a"), (346,"g"), (348,"c")),161),
(List((328,"c"), (336,"g"), (346,"a"), (348,"c")),28),
(List((336,"a"), (346,"g"), (348,"t"), (358,"a")),168),
(List((336,"a"), (346,"g"), (348,"c"), (358,"a")),154),
(List((336,"g"), (346,"a"), (348,"c"), (358,"g")),30),
(List((346,"g"), (348,"t"), (358,"a"), (361,"c")),178),
(List((346,"g"), (348,"c"), (358,"a"), (361,"c")),166),
(List((346,"a"), (348,"c"), (358,"g"), (361,"g")),34))

Now a method to implement " If two list of tuples have the 3 or more exact tuples present in both list,then they are supposed to be overlapping list of tuples. "

def isOverlap[A](a:(List[A],Int),b:(List[A],Int)) = (a._1 intersect b._1).size >= 3

Then, using something I wrote over here that groups elements that "match" according to the predicate

def groupWith[A](xs: List[A], f: (A, A) => Boolean) = {
  // helper function to add "e" to any list with a member that matches the predicate
  // otherwise add it to a list of its own
  def addtoGroup(gs: List[List[A]], e: A): List[List[A]] = {
    val (before, after) = gs.span(_.exists(!f(_, e)))
    if (after.isEmpty)
      List(e) :: gs
    else
      before ::: (e :: after.head) :: after.tail
  }
  // now a simple foldLeft adding each element to the appropriate list
  xs.foldLeft(Nil: List[List[A]])(addtoGroup)
}  

we can get a list of lists of overlapping elements

List(List((List((346,g), (348,c), (358,a), (361,c)),166), 
          (List((346,g), (348,t), (358,a), (361,c)),178)),
     List((List((346,a), (348,c), (358,g), (361,g)),34),
          (List((336,g), (346,a), (348,c), (358,g)),30)), 
     List((List((336,a), (346,g), (348,c), (358,a)),154),
          (List((336,a), (346,g), (348,t), (358,a)),168)), 
     List((List((328,c), (336,g), (346,a), (348,c)),28),
          (List((306,c), (328,c), (336,g), (346,a)),4),
          (List((306,g), (328,c), (336,g), (346,a)),22)),
     List((List((328,g), (336,a), (346,g), (348,c)),161),
          (List((328,g), (336,a), (346,g), (348,t)),164),
          (List((306,c), (328,g), (336,a), (346,g)),282)))

Then we write a function to merge a list of overlapping tuples:

def merge(ys: List[(List[(Int, String)], Int)]) = 
   ys.foldLeft((Nil:List[(Int, String)], 0))
  {(acc, e) => ((acc._1 ++ (e._1 diff acc._1)).sorted, acc._2 + e._2)}

(doing a union of the tuples by adding any that aren't already in the accumulated result, and adding up the ints. The .sorted is just to make it easier to visually review the result)

Then merge the overlapping entries

ms.map(merge)

Giving this, but that isn't the output you have?

List((List((346,g), (348,c), (348, t), (358,a), (361,c)),344), 
     (List((336,g), (346,a), (348,c), (358,g), (361, g)),64),
     (List((336,a), (346,g), (348,c), (348,t), (358,a)),322),
     (List((306,c), (306,g), (328,c), (336,g), (346,a), (348,c)),54),
     (List((306,c), (328,g), (336,a), (346,g), (348,c), (348,t)),607))

EDIT: Following the comments, here's an updated isOverlap. However, it means less overlaps than the original, so more elements in the final merged output, so it's still not right:

def isOverlap(a:(List[(Int, String)],Int),b:(List[(Int, String)],Int)) =
  // combine the tuples by Int, and check that we don't get two entries
  // for any Int (i.e. if we do, they have different Strings so it's not an overlap)
 !((a._1++b._1).groupBy(_._2).exists(_._2.length > 1)) &&
  // check there are at least 2 matching tuples
  (a._1 intersect b._1).size >= 3  
Community
  • 1
  • 1
The Archetypal Paul
  • 41,321
  • 20
  • 104
  • 134
  • Thank you for the reply. I see the problem.The fault is on my side,sorry,as i missed one condition in the explanation. When i merge the list of tuples,they should have minimum 3 overlaps,but with no difference.(ie) in your output,the first list of tuple has '(348,c), (348,t)' .I shall update the question, – Balaram26 Jun 08 '16 at 07:52
  • Its okey,thank you.When two tuples are merged,they are two conditions- 1. They should have more than 3 intersections as per given in description. 2. If any tuple in both list has same integer but a different string/char as its value,then they cannot be grouped together. – Balaram26 Jun 08 '16 at 08:01
  • Oh, cool, so it's just a matter of changing isOverlap. However, that will give more overlapping groups, so a longer final merged list. But my code already produces a list longer than your output in the question, so that doesn't seem right. – The Archetypal Paul Jun 08 '16 at 08:02
  • I am not sure,because when i tested with the conditions i mentioned, i always ended up with less sequences than there should be. So I had come up different solution without making it soo complex. I have just posted it above. not sure how efficient it though. – Balaram26 Jun 08 '16 at 08:09
  • See my comment to your question. – The Archetypal Paul Jun 08 '16 at 08:10