4

I want to extract a list of IDs from a string with the following pattern: {(2),(4),(5),(100)}

Note: no leading or trailing spaces.

The list can contain up to 1000 IDs.

I want to use rich string pattern matching to do this, but after 20 frustrating minutes of trying I still can't get it to work.

Could anyone help me come up with the correct pattern? Much appreciated!

om-nom-nom
jerry
  • You want to extract a `List` of `String` like List("2","4","5","100")? – Brian Jan 04 '13 at 20:33
  • I take it http://www.scala-lang.org/api/current/index.html#scala.util.matching.Regex has been read? Try `findAllMatchIn` with something like `"""\(\d+\)"""`, then map each Match (and its String captures) to Int. –  Jan 04 '13 at 20:40
  • Also, be sure to show *what* has been tried and *how* it does not work as expected. –  Jan 04 '13 at 20:46

3 Answers

4

Here's brute force string manipulation.

scala> "{(2),(4),(5),(100)}".replaceAll("\\(", "").replaceAll("\\)", "").replaceAll("\\{","").replaceAll("\\}","").split(",")

res0: Array[java.lang.String] = Array(2, 4, 5, 100)

Here's a regex, as @pst noted in the comments. If you don't want the parentheses, change the regular expression to """\d+""".r.

scala> val num = """\(\d+\)""".r

scala> num findAllIn "{(2),(4),(5),(100)}"
res33: scala.util.matching.Regex.MatchIterator = non-empty iterator

scala> res33.toList
res34: List[String] = List((2), (4), (5), (100))
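
To get the IDs as Int values without the parentheses, one option is a capture group combined with `findAllMatchIn`, as suggested in the comments on the question. This is only a minimal sketch, assuming every ID fits in an Int; the names `Id`, `input` and `ids` are made up for illustration.

```scala
import scala.util.matching.Regex

// Group 1 captures the digits between the parentheses.
val Id: Regex = """\((\d+)\)""".r

val input = "{(2),(4),(5),(100)}"

// For each match, take the captured digits and convert them to Int.
// Assumption: the digits fit in an Int, otherwise toInt throws.
val ids: List[Int] = Id.findAllMatchIn(input).map(_.group(1).toInt).toList
// ids == List(2, 4, 5, 100)
```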
Brian
  • IMO, what he meant is to define a regex and then use the regex unapply method to extract those tokens, e.g. `case regex(a,b,c) => ...`. Why don't you remove (, {, } and ) with `.replaceAll("\\{|\\}|\\(|\\)","")` and then split on commas? – om-nom-nom Jan 04 '13 at 20:56
  • Changed to remove then split. I like that approach better. Thanks. – Brian Jan 04 '13 at 21:05
2
"{(2),(4),(5),(100)}".split("[^0-9]").filter(_.nonEmpty).map(_.toInt)

Split on every character that is not a digit, then convert only the non-empty results.

The character class could be modified to allow dots or minus signs as well.
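
As a rough sketch of that modification, assuming the values may be negative or decimal (the sample input and the names `input` and `values` are made up for illustration):

```scala
// Hypothetical input containing a negative and a decimal value.
val input = "{(-2),(4.5),(100)}"

// Split on runs of characters that are not digits, dots or minus signs,
// drop the empty leading piece, then parse each remaining piece.
// Assumption: every remaining piece is a well-formed number.
val values: Array[Double] =
  input.split("""[^0-9.\-]+""").filter(_.nonEmpty).map(_.toDouble)
// values contains -2.0, 4.5, 100.0
```

Note that this does no validation: a piece such as "1-2" or "..." would still reach `toDouble` and throw.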

user unknown
0

Use an extractor object:

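object MyList {
  // Build the string form, e.g. List("1", "2") => "{(1),(2)}"
  def apply(l: List[String]): String =
    if (l.nonEmpty) "{(" + l.mkString("),(") + ")}"
    else "{}"

  // Cut out the text between the first "(" and the last ")", then split
  // on the "),(" separators, tolerating whitespace around them
  def unapply(str: String): Some[List[String]] =
    Some(
      if (str.indexOf("(") > 0)
        str.substring(str.indexOf("(") + 1, str.lastIndexOf(")"))
          .split("\\p{Space}*\\)\\p{Space}*,\\p{Space}*\\(\\p{Space}*")
          .toList
      else Nil
    )
}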

// test
"{(1),(2)}" match { case MyList(l) => l }
// res23: List[String] = List(1, 2)
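
The extractor above always returns Some, so it never rejects malformed input. If stricter matching is wanted, a variant could validate each token with a regex extractor and return None otherwise. The following is only a sketch under assumptions: `IdList` and `Item` are hypothetical names, no whitespace is allowed (as stated in the question), and every ID is assumed to fit in an Int.

```scala
object IdList {
  // Matches exactly one "(digits)" token; group 1 is the digits.
  private val Item = """\((\d+)\)""".r

  def unapply(s: String): Option[List[Int]] =
    if (s.length < 2 || !s.startsWith("{") || !s.endsWith("}")) None
    else {
      val inner = s.substring(1, s.length - 1)
      if (inner.isEmpty) Some(Nil)
      else {
        // limit -1 keeps trailing empty tokens so "{(1),}" is rejected
        val tokens = inner.split(",", -1).toList
        // A Regex pattern only matches when the whole token matches.
        val ids = tokens.collect { case Item(n) => n.toInt }
        if (ids.length == tokens.length) Some(ids) else None
      }
    }
}

// Hypothetical helper showing the extractor in a pattern match.
def parseIds(s: String): Option[List[Int]] = s match {
  case IdList(ids) => Some(ids)
  case _           => None
}
```

With this sketch, `parseIds("{(2),(4),(5),(100)}")` yields `Some(List(2, 4, 5, 100))`, while a string such as `"{(2),4}"` yields `None`.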
idonnie