Scala - get unique values from List with a twist

Question

I have a list like this:

val l= List(("Agent", "PASS"), ("Agent", "FAIL"), ("Agent 1", "FAIL"), ("Agent", "PASS"), ("Agent 2", "PASS") )

and I need to end up with a list like this:

val filteredList= List(("Agent", "FAIL"), ("Agent 1", "FAIL"), ("Agent 2", "PASS") )

What happened?

("Agent", "PASS"), ("Agent", "FAIL")

becomes

("Agent", "FAIL")

(because if there is at least one FAIL, I need to keep that entry)

The entries for Agent 1 and Agent 2 stay the same because there is just one entry for each.

The closest answer I found is How in Scala to find unique items in List but I cannot tell how to keep the entries with FAIL.

I hope the question is clear; if not, I can give you a better example.

Thanks

Kevin Wright · Answer 1 · 2010-12-16T09:42:47.157

Preamble

It occurred to me that the status could be seen as having a priority, and if given a sequence of (agent,status) pairs then the task is to select only the highest priority status for each agent. Unfortunately, status isn't strongly typed with an explicit ordering so defined, but... as it's a string with only two values we can safely use string ordering as having a 1:1 correspondence to the priority.

Both my answers take advantage of two useful facts:

In natural string ordering, "FAIL" < "PASS", so:

List("PASS", "FAIL", "PASS").sorted.head = "FAIL"

For two tuples (x,a) and (x,b), (x,a) > (x, b) if (a > b)
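
As a quick check of the two facts above, here is a minimal sketch (the value names are only for illustration and are not part of the answer):

```scala
// String ordering is lexicographic, so "FAIL" sorts before "PASS".
val statuses = List("PASS", "FAIL", "PASS")
val lowest = statuses.sorted.head   // smallest status after sorting
val alsoLowest = statuses.min       // same element, found without sorting

// Tuples are compared element by element: first the agent, then the status.
val pairs = List(("Agent", "PASS"), ("Agent", "FAIL"))
val smallestPair = pairs.min        // the pair with the smallest status
```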

UPDATED REPLY

val solution = l.sorted.reverse.toMap

When converting a Seq[(A,B)] to a Map[A,B] via the .toMap method, each "key" in the original sequence of tuples can only appear in the resulting Map once. As it happens, the conversion uses the last such occurrence.

l.sorted.reverse = List(
  (Agent 2,PASS),  // <-- Last "Agent 2"
  (Agent 1,FAIL),  // <-- Last "Agent 1"
  (Agent,PASS),
  (Agent,PASS),
  (Agent,FAIL))    // <-- Last "Agent"

l.sorted.reverse.toMap = Map(
  Agent 2 -> PASS,
  Agent 1 -> FAIL,
  Agent -> FAIL)
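
For anyone who wants to paste this into a REPL, a self-contained sketch of the same idea follows. The `lastWins` value is only an illustration of the "last occurrence wins" behaviour of toMap described above.

```scala
val l = List(("Agent", "PASS"), ("Agent", "FAIL"), ("Agent 1", "FAIL"),
             ("Agent", "PASS"), ("Agent 2", "PASS"))

// toMap keeps the last pair seen for each key.
val lastWins = List(("k", "first"), ("k", "second")).toMap

// After sorted.reverse, a FAIL pair (if the agent has one) comes last
// among that agent's pairs, so it is the one toMap keeps.
val solution = l.sorted.reverse.toMap
```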

ORIGINAL REPLY

Starting with the answer...

val oldSolution = (l groupBy (_._1)) mapValues {_.sorted.head._2}

...and then showing my working :)

//group
l groupBy (_._1) = Map(
  Agent 2 -> List((Agent 2,PASS)),
  Agent 1 -> List((Agent 1,FAIL)),
  Agent -> List((Agent,PASS), (Agent,FAIL), (Agent,PASS))
)

//extract values
(l groupBy (_._1)) mapValues {_.map(_._2)} = Map(
  Agent 2 -> List(PASS),
  Agent 1 -> List(FAIL),
  Agent -> List(PASS, FAIL, PASS))

//sort
(l groupBy (_._1)) mapValues {_.map(_._2).sorted} = Map(
  Agent 2 -> List(PASS),
  Agent 1 -> List(FAIL),
  Agent -> List(FAIL, PASS, PASS))

//head
(l groupBy (_._1)) mapValues {_.map(_._2).sorted.head} = Map(
  Agent 2 -> PASS,
  Agent 1 -> FAIL,
  Agent -> FAIL)

However, you can directly sort the agent -> status pairs without needing to first extract _2:

//group & sort
(l groupBy (_._1)) mapValues {_.sorted} = Map(
  Agent 2 -> List((Agent 2,PASS)),
  Agent 1 -> List((Agent 1,FAIL)),
  Agent -> List((Agent,FAIL), (Agent,PASS), (Agent,PASS)))

//extract values
(l groupBy (_._1)) mapValues {_.sorted.head._2} = Map(
  Agent 2 -> PASS,
  Agent 1 -> FAIL,
  Agent -> FAIL)

In either case, feel free to convert back to a List of Pairs if you wish:

l.sorted.reverse.toMap.toList = List(
  (Agent 2, PASS),
  (Agent 1, FAIL),
  (Agent, FAIL))

The `l.sorted.reverse.toMap` is pretty slick, though I'm sure I'd never remember why that worked in the future. ;-) — Steve, Dec 15 '10 at 14:28
Sorting is O(nlogn) (best case) though, while this problem can clearly be solved in no more than O(n). Wouldn't it be better to just use `max` or `min`, which also depend on ordering but not on sorting? — Daniel C. Sobral, Dec 16 '10 at 13:41
@Daniel Oh, absolutely if the list was big enough that this impacted performance. I just couldn't resist the elegance of solving it in four tokens :) — Kevin Wright, Dec 16 '10 at 14:17
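
The `min` idea from the comments could look roughly like the sketch below. It is not code from either commenter; it makes a single pass over the list and still relies on "FAIL" < "PASS" in string ordering.

```scala
val l = List(("Agent", "PASS"), ("Agent", "FAIL"), ("Agent 1", "FAIL"),
             ("Agent", "PASS"), ("Agent 2", "PASS"))

// One pass: for each agent keep the smallest status seen so far.
val worstByAgent = l.foldLeft(Map.empty[String, String]) {
  case (acc, (agent, status)) =>
    val current = acc.getOrElse(agent, status)
    acc.updated(agent, Ordering[String].min(current, status))
}
```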

Synesso · Accepted Answer · 2010-12-15T06:14:28.650

Is this what you want?

jem@Respect:~$ scala
Welcome to Scala version 2.8.0.final (Java HotSpot(TM) Client VM, Java 1.6.0_21).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val l= List(("Agent", "PASS"), ("Agent", "FAIL"), ("Agent 1", "FAIL"), ("Agent", "PASS"), ("Agent 2", "PASS") )
l: List[(java.lang.String, java.lang.String)] = List((Agent,PASS), (Agent,FAIL), (Agent 1,FAIL), (Agent,PASS), (Agent 2,PASS))

scala> l.foldLeft(Map.empty[String, String]){(map,next) =>
     |   val (agent, result) = next
     |   if ("FAIL" == result) map.updated(agent, result)
     |   else {           
     |     val maybeExistingResult = map.get(agent)
     |     if (maybeExistingResult.map(_ == "FAIL").getOrElse(false)) map
     |     else map.updated(agent, result)
     |   }
     | }
res0: scala.collection.immutable.Map[String,String] = Map((Agent,FAIL), (Agent 1,FAIL), (Agent 2,PASS))

scala> res0.toList
res1: List[(String, String)] = List((Agent 2,PASS), (Agent 1,FAIL), (Agent,FAIL))

Or here is a shorter and more obscure solution:

scala> l.groupBy(_._1).map(pair => (pair._1, pair._2.reduceLeft((a,b) => if ("FAIL" == a._2 || "FAIL" == b._2) (a._1, "FAIL") else a))).map(_._2).toList
res2: List[(java.lang.String, java.lang.String)] = List((Agent 2,PASS), (Agent 1,FAIL), (Agent,FAIL))

Wow... In a rare twist, that actually manages to be more verbose and less comprehensible than equivalent imperative Java code. — Kevin Wright, Dec 15 '10 at 11:51
Assuming the second String value is limited to "PASS" or "FAIL", then the type should be Boolean. That would go a long way to simplifying the solution. — Synesso, Dec 15 '10 at 22:03
If there's any possibility of other statuses, then the status field should be a sealed trait or an enum. Failing that, I agree 100% that Boolean is the best approach. — Kevin Wright, Dec 16 '10 at 09:41
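
To illustrate the Boolean suggestion from the comments, here is a hedged sketch. It assumes that true stands for PASS and false for FAIL, and the names `passed` and `overall` are made up for this example.

```scala
val l = List(("Agent", "PASS"), ("Agent", "FAIL"), ("Agent 1", "FAIL"),
             ("Agent", "PASS"), ("Agent 2", "PASS"))

// Assumption: true means PASS, false means FAIL.
val passed: List[(String, Boolean)] =
  l.map { case (agent, status) => agent -> (status == "PASS") }

// An agent passes overall only if every one of its results passed.
val overall: Map[String, Boolean] =
  passed.groupBy(_._1).map { case (agent, results) => agent -> results.forall(_._2) }
```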

Daniel C. Sobral · Answer 3 · 2010-12-16T13:52:27.397

Plenty of good solutions, but here is mine anyway. :-)

l
.groupBy(_._1) // group by key
.map { 
    case (key, list) => 
        if (list.exists(_._2 == "FAIL")) (key, "FAIL") 
        else (key, "PASS")
}

Here's another one that just came to me in a sudden epiphany:

def booleanToString(b: Boolean) = if (b) "PASS" else "FAIL"
l
.groupBy(_._1)
.map {
    case (key, list) => key -> booleanToString(list.forall(_._2 == "PASS"))
}

edited Dec 16 '10 at 13:52

answered Dec 15 '10 at 15:18

Daniel C. Sobral


It's the second cleanest solution I've seen so far, but can be made a bit smaller by using `find` to remove the duplication of "FAIL": `l groupBy (_._1) map {case (k,xs) => xs.find(_._2 == "FAIL").getOrElse(k->"PASS")}`. Interestingly, this is the chain of thinking that led me to the "eureka" moment once I realised that "PASS" and "FAIL" are both Strings, and so sortable. – Kevin Wright Dec 15 '10 at 16:32
@Kevin I don't like the sorting solution much because it works by coincidence, not because the ordering of PASS and FAIL has anything to do with what the code is supposed to do. – Daniel C. Sobral Dec 15 '10 at 18:09
In a more rigorous solution, PASS and FAIL would be subclasses of a sealed trait (e.g. `Status`), or members of an enum. In that case, the problem is to filter a seq of `String -> Status` to find the highest-priority status for each agent string; the ordering of Status objects would then be explicitly defined in this priority order. It's just a lucky coincidence that the string representations "PASS" and "FAIL" happen to possess this same ordering. – Kevin Wright Dec 15 '10 at 18:52
Incidentally, if there had been more than two strings, I would have explicitly defined a dedicated function to use in `.sortBy()` instead of using `.sorted`... The solution would still be elegant. – Kevin Wright Dec 15 '10 at 18:57
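
A minimal sketch of the sealed-trait idea from these comments follows. Everything in it (`Status`, `Fail`, `Pass`, `priority`, `statusOrdering`) is hypothetical naming chosen for illustration; the point is only that the priority order is stated explicitly instead of being borrowed from string ordering.

```scala
sealed trait Status
case object Fail extends Status
case object Pass extends Status

// Hypothetical priority: a lower number wins, so Fail beats Pass.
def priority(s: Status): Int = s match {
  case Fail => 0
  case Pass => 1
}
implicit val statusOrdering: Ordering[Status] = Ordering.by[Status, Int](priority)

val typed: List[(String, Status)] = List(
  ("Agent", Pass), ("Agent", Fail), ("Agent 1", Fail), ("Agent", Pass), ("Agent 2", Pass))

// For each agent, pick the highest-priority (smallest) status.
val worst: Map[String, Status] =
  typed.groupBy(_._1).map { case (agent, xs) => agent -> xs.map(_._2).min }
```

Adding a third status would then only require a new case object and one more line in `priority`.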

score 2 · Answer 4 · answered Dec 15 '10 at 07:03

Here is my take. First a functional solution:

l.map(_._1).toSet.map({n: String => (n, if (l.contains((n, "FAIL"))) "FAIL" else "PASS")})

First we isolate the names, uniquely (toSet), then we map each name to a tuple with itself as first element, and either "FAIL" as second element if a fail is contained in l, or otherwise it must obviously be a "PASS".

The result is a set. Of course you can do toList at the end of the call chain if you really need a list.

Here is an imperative solution:

var l = List(("Agent", "PASS"), ("Agent", "FAIL"), ("Agent 1", "FAIL"), ("Agent", "PASS"), ("Agent 2", "PASS"))
l.foreach(t=>if(t._2=="FAIL") l=l.filterNot(_ == (t._1,"PASS")))
l=l.toSet.toList

I don't like it as much because it is imperative, but hey. In some sense, it reflects better what you would actually do when you'd solve this by hand. For each "FAIL" you see, you remove all corresponding "PASS"es. After that, you ensure uniqueness (.toSet.toList).

Note that l is a var in the imperative solution, which is necessary because it gets reassigned.

score 1 · Answer 5 · edited May 23 '17 at 12:09

Look at Aggregate list values in Scala

In your case you'd group by Agent and aggregate by folding PASS+PASS=>PASS and ANY+FAIL=>FAIL.
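
A rough sketch of that group-and-fold approach is shown here. The `combine` helper is a name invented for this example, and the sketch assumes the only statuses are "PASS" and "FAIL".

```scala
val l = List(("Agent", "PASS"), ("Agent", "FAIL"), ("Agent 1", "FAIL"),
             ("Agent", "PASS"), ("Agent 2", "PASS"))

// Combining rule: PASS + PASS => PASS, anything + FAIL => FAIL.
def combine(a: String, b: String): String =
  if (a == "FAIL" || b == "FAIL") "FAIL" else "PASS"

// Group by agent, then fold each group's statuses starting from "PASS".
val aggregated: Map[String, String] =
  l.groupBy(_._1).map { case (agent, results) =>
    agent -> results.map(_._2).foldLeft("PASS")(combine)
  }
```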

answered Dec 15 '10 at 05:42

Ben Jackson


score 1 · Answer 6 · answered Dec 15 '10 at 08:09

Perhaps more efficient to group first, then find the disjunction of PASS/FAIL:

l.filter(_._2 == "PASS").toSet -- l.filter(_._2 == "FAIL").map(x => (x._1, "PASS"))

This is based on your output of ("Agent", "PASS") but if you just want the agents:

l.filter(_._2 == "PASS").map(x => x._1).toSet -- l.filter(_._2 == "FAIL").map(x => x._1)

Somehow I expected that second one to be shorter.

score 1 · Answer 7 · edited May 23 '17 at 12:18

So as I understand it, you want to:

Group the tuples by their first entry ("key")
For each key, check all tuple second entries for the value "FAIL"
Produce (key, "FAIL") if you find "FAIL" or (key, "PASS") otherwise

Since I still find foldLeft, reduceLeft, etc. hard to read, here's a direct translation of the steps above into for comprehensions:

scala> for ((key, keyValues) <- l.groupBy{case (key, value) => key}) yield {
     |   val hasFail = keyValues.exists{case (key, value) => value == "FAIL"}
     |   (key, if (hasFail) "FAIL" else "PASS")                              
     | }
res0: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((Agent 2,PASS), (Agent 1,FAIL), (Agent,FAIL))

You can call .toList at the end there if you really want a List.

Edit: slightly modified to use the exists idiom suggested by Daniel C. Sobral.

score 0 · Answer 8 · answered Dec 15 '10 at 17:27

Do you need to preserve the original order? If not, the shortest solution I know of (also quite straightforward) is:

{
  val fail = l.filter(_._2 == "FAIL").toMap        // Find all the fails
  l.filter(x => !fail.contains(x._1)) ::: fail.toList // All nonfails, plus the fails
}

but this won't remove extra passes. If you want that, then you need an extra map:

{
  val fail = l.filter(_._2 == "FAIL").toMap
  l.toMap.filter(x => !fail.contains(x._1)).toList ::: fail.toList
}

On the other hand, you might want to take the elements in the same order you originally found them. This is trickier because you need to keep track of when the first interesting item appeared:

{
  val fail = l.filter(_._2 == "FAIL").toMap
  val taken = new scala.collection.mutable.HashMap[String,String]
  val good = (List[Boolean]() /: l)((b,x) => {
    val okay = (!taken.contains(x._1) && (!fail.contains(x._1) || x._2=="FAIL"))
    if (okay) taken += x
    okay :: b
  }).reverse
  (l zip good).collect{ case (x,true) => x }
}
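
If order matters, a shorter alternative may be worth considering. This sketch is a different technique from the code above: it places each agent at the position where the agent first appears (rather than where the kept entry appears), and it assumes the only statuses are "PASS" and "FAIL".

```scala
val l = List(("Agent", "PASS"), ("Agent", "FAIL"), ("Agent 1", "FAIL"),
             ("Agent", "PASS"), ("Agent 2", "PASS"))

// Agents that have at least one FAIL.
val failed: Set[String] = l.collect { case (agent, "FAIL") => agent }.toSet

// distinct keeps the first occurrence of each agent, preserving original order.
val ordered: List[(String, String)] =
  l.map(_._1).distinct.map(agent => agent -> (if (failed(agent)) "FAIL" else "PASS"))
```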
