Write List of Map data into csv

Question

val rdd = df.rdd.map(line => Row.fromSeq(
        scala.xml.XML.loadString("<?xml version='1.0' encoding='utf-8'?>" + line(1)).child
        .filter(elem =>
               elem.label == "name1"
            || elem.label == "name2"
            || elem.label == "name3"
            || elem.label == "name4"
        ).map(elem => (elem.label -> elem.text)).toList)
    )

The result is an RDD[Row]. When I run rdd.take(10).foreach(println), the output looks something like this:

[(name1, value1), (name2, value2), (name3, value3)]
[(name1, value11), (name2, value22), (name3, value33)]
[(name1, value111), (name2, value222), (name4, value444)]

I want to save this as CSV, with name1..name4 as the CSV header. Can anyone please help me implement this with Apache Spark 2.4.0?

name1    | name2    | name3   | name4
value1   | value2   | value3  | null
value11  | value22  | value33 | null
value111 | value222 | null    | value444

pme · Accepted Answer · 2019-03-25T11:04:12.833

I adjusted your example and added some intermediate values to make each step easier to follow:

  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql.Row
  import scala.collection.immutable

  // define the labels you want, in the column order of the CSV:
  val labels = Seq("name1", "name2", "name3", "name4")
  // start from the original DataFrame (df in the question)
  val result: RDD[Row] = df.rdd.map { line =>
    // your raw data: the (label, text) pairs found in the XML of column 1
    val tuples: immutable.Seq[(String, String)] =
      scala.xml.XML.loadString("<?xml version='1.0' encoding='utf-8'?>" + line(1)).child
        .filter(elem => labels.contains(elem.label)) // you can use the label list to filter
        .map(elem => (elem.label -> elem.text)).toList // no change here
    val values: Seq[String] =
      labels.map(l =>
        // take the value if the record has this label
        tuples.find { case (k, _) => k == l }.map(_._2)
          // or just add an empty String
          .getOrElse(""))
    // create a Row with one field per label, always in the same order
    Row.fromSeq(values)
  }
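To see what the label alignment does without Spark or XML in the way, here is a minimal plain-Scala sketch. The object name AlignLabels, the align helper and its missing parameter are made up for illustration; only the label names come from the question.

```scala
object AlignLabels {
  // Column order for the CSV; the label names are taken from the question.
  val labels: Seq[String] = Seq("name1", "name2", "name3", "name4")

  // Turn the (label, text) pairs of one record into a fixed-width sequence:
  // one slot per label, filled with `missing` when the record lacks that label.
  // Note: if a label occurs twice in `tuples`, toMap keeps the last value.
  def align(tuples: Seq[(String, String)], missing: String = ""): Seq[String] = {
    val byLabel = tuples.toMap
    labels.map(l => byLabel.getOrElse(l, missing))
  }
}
```

For the third sample record, AlignLabels.align(Seq("name1" -> "value111", "name2" -> "value222", "name4" -> "value444")) yields Seq("value111", "value222", "", "value444"), so every row has the same four positions regardless of which tags were present in the XML. Passing missing = null instead gives a null in the empty slot.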

I am not sure about the last step, but in essence you have to insert the header Row as the first row:

[name1, name2, name3, name4]
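As an alternative to inserting a header Row by hand, one option is to attach a schema to the rows and let Spark's CSV writer emit the header. The following is only a sketch under assumptions: the object name WriteWithHeader, the helper names and the path parameter are hypothetical, and the input is assumed to be an RDD[Row] whose Rows hold exactly one String field per label, in label order.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object WriteWithHeader {
  // Column order for the CSV; the label names are taken from the question.
  val labels: Seq[String] = Seq("name1", "name2", "name3", "name4")

  // One nullable string column per label, matching the order of the Row values.
  val schema: StructType =
    StructType(labels.map(l => StructField(l, StringType, nullable = true)))

  // `rows` is assumed to contain Rows with exactly labels.size String fields.
  def toDataFrame(spark: SparkSession, rows: RDD[Row]): DataFrame =
    spark.createDataFrame(rows, schema)

  // `path` is a hypothetical output directory; Spark writes part files into it.
  def writeCsv(df: DataFrame, path: String): Unit =
    df.write
      .option("header", "true") // write name1..name4 as the first line of each part file
      .csv(path)
}
```

With this approach the header comes from the schema rather than from the data, so it stays consistent even when a given tag is missing from every record.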

edited Mar 25 '19 at 11:04

answered Mar 24 '19 at 09:16

pme


That is not right: when name1, for example, is missing from the XML, how are we going to handle it, and how will the header and the data rows stay consistent? Can you please provide your full solution? – tree em Mar 25 '19 at 00:13
See my new answer - hope that gets you closer ;) – pme Mar 25 '19 at 11:04
Thanks for your response, I really appreciate it. However, I found a solution by transforming this to a JSON dataset and then writing that to CSV, so the table header works accordingly. – tree em Mar 26 '19 at 04:07
