
I am using the `pipe` function on a Spark RDD like this:

val custPrdRep = custPrdGrp.
pipe("sed s/CompactBuffer//g").
pipe("sed 's/\\], \\[//g'")

Everything works except for the last `pipe` call.

I don't get any results back when I run

custPrdRep.collect

In the shell, however, this does work:

$ echo "], [a" | sed  's/\\], \\[//g'
a

If I try it this way instead:

pipe("sed 's/\], \[//g'")

I get this error:

scala> val custPrdRep = custPrdGrp.pipe("sed s/CompactBuffer//g").pipe("sed s/|,/|/g").pipe("sed 's/\], \[//g'")
<console>:1: error: invalid escape character
       val custPrdRep = custPrdGrp.pipe("sed s/CompactBuffer//g").pipe("sed s/|,/|/g").pipe("sed 's/\], \[//g'")
                                                                                                     ^
<console>:1: error: invalid escape character
       val custPrdRep = custPrdGrp.pipe("sed s/CompactBuffer//g").pipe("sed s/|,/|/g").pipe("sed 's/\], \[//g'")

Am I escaping the right characters in the right way?
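A likely explanation, assuming the Scala API behaves as in the Spark 1.x source: `RDD.pipe(command: String)` does not go through a shell. It splits the string on whitespace (with `java.util.StringTokenizer`, in `PipedRDD.tokenize`) and runs the resulting argument list directly, so the single quotes reach `sed` literally and the script is cut in two at the space in `], [`. The `sed s/CompactBuffer//g` call works because its argument contains neither spaces nor quotes. PySpark's `pipe`, by contrast, splits the command with Python's `shlex.split`, which does honor quotes. The sketch below reproduces the whitespace splitting in plain Scala; `TokenizeDemo` and `tokenize` are hypothetical names, not Spark API.

```scala
import java.util.StringTokenizer
import scala.collection.mutable.ListBuffer

object TokenizeDemo {
  // Hypothetical helper mirroring whitespace-only splitting: no shell is involved,
  // so single quotes are not treated as quoting and stay inside the tokens.
  def tokenize(command: String): List[String] = {
    val tokens = ListBuffer[String]()
    val tok = new StringTokenizer(command) // default delimiters: space, \t, \n, \r, \f
    while (tok.hasMoreTokens) tokens += tok.nextToken()
    tokens.toList
  }

  def main(args: Array[String]): Unit = {
    // The escaped Scala literal itself is fine: it evaluates to  sed 's/\], \[//g'
    val cmd = "sed 's/\\], \\[//g'"
    println(cmd)
    // Splitting on whitespace yields three arguments: sed then gets 's/\],
    // (leading quote included) as its script and \[//g' as an input file name.
    tokenize(cmd).foreach(println)
  }
}
```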

lightweight
  • Why do you need `pipe` in the first place? If I remember your previous question, you can simply create an output string in Scala without even touching regular expressions. – zero323 Aug 18 '15 at 23:17
  • Something like `custPrdGrp.map{case (k, vals) => {val valsString = vals.mkString(", "); s"{$k:, {$valsString}}" }}` or whatever format you wanted. – zero323 Aug 18 '15 at 23:34
  • @zero323 that works great! Can you also use that to format each element in the value of the pair? For example, if I had `val custPrd = accts.map(a => (a(0), ((a(1)), (a(2), a(3), a(4), a(5), a(6), a(7), a(8)))))`, can a(2), a(3), etc. be formatted differently? – lightweight Aug 19 '15 at 02:02
    I am not sure if I understand desired output but I am pretty sure you can. If you provide an example input (something that can be simply copied and pasted) and expected output I'll be happy to help, but I think it deserves a separate question. – zero323 Aug 19 '15 at 10:34
  • @zero323 thanks, I asked a related question here: http://stackoverflow.com/questions/32095742/format-a-k-v-w-pair-in-spark-rdd Can you post your response as the answer there as well? – lightweight Aug 19 '15 at 12:38
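zero323's suggestion avoids `pipe` and the regex clean-up entirely: build each output line in Scala from the grouped values, so no `CompactBuffer` or `], [` text is produced in the first place. The sketch below applies the function from the comment to a small local `Seq` of hypothetical `(customer, products)` pairs standing in for `custPrdGrp` (presumably the result of a `groupByKey`, given the `CompactBuffer` text in its output); on the RDD the same function would go into `custPrdGrp.map(...)`. The format string is copied from the comment and can be changed to whatever layout is needed.

```scala
object FormatDemo {
  // Hypothetical stand-in for custPrdGrp's (key, grouped values) pairs.
  val grouped: Seq[(String, Iterable[String])] =
    Seq(("cust1", Seq("prodA", "prodB")), ("cust2", Seq("prodC")))

  // Formatting function from the comment: join the values with ", " and
  // interpolate them into the output line directly.
  def formatPair(kv: (String, Iterable[String])): String = kv match {
    case (k, vals) =>
      val valsString = vals.mkString(", ")
      s"{$k:, {$valsString}}"
  }

  def main(args: Array[String]): Unit =
    grouped.map(formatPair).foreach(println)
}
```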

1 Answer


Try:

val custPrdRep = custPrdGrp.pipe("sed s/CompactBuffer//g").pipe("sed s/|,/|/g").pipe("sed 's/\\], \\[//g'")

(note the additional \ before [ and ])

Also, I don't know which shell you're using, but in bash:

echo "], [a" | sed  's/\\], \\[//g'

will not work.

echo "], [a" | sed  "s/\\], \\[//g"

will work.
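The difference is bash quoting: inside single quotes `\\` stays as two characters, while inside double quotes bash collapses `\\` to `\`, so `sed` receives `s/\], \[//g`. As a rough check that this is the script `sed` needs, the sketch below runs `sed` directly from Scala with `scala.sys.process`, passing the script as a single argument so no shell quoting is involved. It assumes a `sed` binary (e.g. GNU sed) on the PATH; `SedArgDemo` and `runSed` are hypothetical names.

```scala
import java.io.ByteArrayInputStream
import scala.sys.process._

object SedArgDemo {
  // Run sed with an explicit argument list (no shell), feeding `input` on stdin.
  // `!!` returns stdout and throws if sed exits with a non-zero status.
  def runSed(script: String, input: String): String =
    (Seq("sed", script) #< new ByteArrayInputStream(input.getBytes("UTF-8"))).!!

  def main(args: Array[String]): Unit = {
    // "\\]" in a Scala literal is the two characters \] , so sed receives
    // s/\], \[//g, the same script bash passes for the double-quoted version.
    print(runSed("s/\\], \\[//g", "], [a\n"))
  }
}
```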

Bacon
  • I tried `pipe("sed 's/\\], \\[//g'")` but still got nothing back. I think I was already trying it that way, though. – lightweight Aug 19 '15 at 12:39
  • The `sed` part looks OK, and it works just fine with PySpark, but there is still something wrong on the Scala side. BTW, you can use raw strings (`raw"""foo"""`) to avoid escaping. – zero323 Aug 19 '15 at 13:28
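Following the raw-string suggestion: with Scala's `raw` interpolator (or a plain triple-quoted string) backslashes are kept exactly as typed, so the `sed` script can be written the way `sed` should receive it. Putting the script in its own element of an argument list also removes the need for shell-style quotes. A small plain-Scala sketch; `RawStringDemo` and its members are illustrative names.

```scala
object RawStringDemo {
  // raw"..." and """...""" do not process backslash escapes, so \] and \[ stay as typed.
  val rawScript: String = raw"s/\], \[//g"
  val tripleQuoted: String = """s/\], \[//g"""
  // The equivalent ordinary literal needs each backslash doubled.
  val escapedScript: String = "s/\\], \\[//g"

  // One element per argument: the space inside the script is not a separator
  // here, and no quote characters are needed around the script.
  val sedCommand: Seq[String] = Seq("sed", rawScript)

  def main(args: Array[String]): Unit = {
    println(rawScript == escapedScript && tripleQuoted == escapedScript)
    println(sedCommand.length)
  }
}
```

In Spark, the same `Seq` could presumably be handed to the `Seq[String]` overload of `RDD.pipe`, e.g. `custPrdGrp.pipe(Seq("sed", raw"s/\], \[//g"))`, which takes the arguments as given instead of splitting a command string (untested against the original data).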