I have another question about Spark and Scala. I want to use this technology to fetch data and generate an XML file. I would therefore like to know whether it is possible to create the nodes ourselves (rather than having them generated automatically), and which library we can use. I searched but found nothing very useful (I'm new to this technology, so I don't know many of the keywords). Is there something in Spark like the code below? (I wrote it in Scala. It works locally, but I can't use new File() in Spark.)

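import java.io.File
import javax.xml.parsers.{DocumentBuilder, DocumentBuilderFactory}
import javax.xml.transform.{Transformer, TransformerFactory}
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import org.w3c.dom.{Attr, Element}

val docBuilder: DocumentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
val document = docBuilder.newDocument()

// Build the root element and attach two attributes to it
// (the strings in angle brackets are placeholders for real names and values)
val root: Element = document.createElement("<tag name>")
var attr: Attr = document.createAttribute("<attr1>")
attr.setValue("<value attr1>")
root.setAttributeNode(attr)
attr = document.createAttribute("<attr2>")
attr.setValue("<value attr2>")
root.setAttributeNode(attr)
document.appendChild(root)
document.setXmlStandalone(true)

// Serialize the DOM tree to the destination file
val transformerFactory: TransformerFactory = TransformerFactory.newInstance()
val transformer: Transformer = transformerFactory.newTransformer()
val domSource: DOMSource = new DOMSource(document)
val streamResult: StreamResult = new StreamResult(new File(destination))
transformer.transform(domSource, streamResult)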

I want to know if it's possible to do that with Spark.

Thanks for your answer and have a good day.

NNK
THIBAULT Nicolas

1 Answer

Not exactly, but you can do something similar by using the Spark XML API or the XStream API on Spark.

First, try the Spark XML API, which is the most useful option for reading and writing XML files with Spark. However, at the time of writing, Spark XML has the following limitations:

 1) Adding attributes to the root element is not supported.
 2) It does not support the following structure, where you have header and footer elements:

  <parent>
      <header></header>
      <dataset>
          <data attr="1">supports xml tags and data here</data>
          <data attr="2">value2</data>
      </dataset>
      <footer></footer>
  </parent>

If you have a single root element followed by the data rows, then Spark XML is the go-to API.

Alternatively, you can look at the XStream API. Below are the steps to create custom XML structures with it.

1) First, create a Scala class that mirrors the structure you want in the XML.

case class XMLData(name:String, value:String, attr:String) 

2) Create an instance of this class

val data = XMLData("bookName","AnyValue", "AttributeValue")

3) Convert the data object to XML using the XStream API. If you already have the data in a DataFrame, use a map transformation to convert each row to an XML string and store it back in the DataFrame. If you do so, you can skip step #4.

val xstream = new XStream(new DomDriver)
val xmlString = xstream.toXML(data)

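4) Now convert xmlString to a DataFrame. toDF is not defined on a plain String, so wrap the value in a Seq and import spark.implicits._ first

import spark.implicits._
val df = Seq(xmlString).toDF()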

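5) Finally, write the result out. Spark creates a directory of part files at the given path (the path below is a placeholder)

df.write.text("file:///path/to/output")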

Here is a full example using the XStream API:

import com.thoughtworks.xstream.XStream
import com.thoughtworks.xstream.io.xml.DomDriver
import org.apache.spark.sql.SparkSession

case class Animal(cri: String, taille: Int)

object SparkXMLUsingXStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("sparkbyexamples.com")
      .getOrCreate()

    // Serialize the object to XML, renaming the class and the "cri" field
    val animal = Animal("Rugissement", 150)
    val xstream1 = new XStream(new DomDriver())
    xstream1.alias("testAni", classOf[Animal])
    xstream1.aliasField("cricri", classOf[Animal], "cri")
    // toDF needs a collection, so wrap the XML string in a Seq
    val xmlString = Seq(xstream1.toXML(animal))

    import spark.implicits._
    val newDf = xmlString.toDF()
    newDf.show(false)
  }
}
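
If you would rather keep the DOM code from the question than switch to XStream, one option is to send the Transformer output to a StringWriter instead of a File, so the function returns the XML as a String. The sketch below uses only JDK classes. The object name DomToXmlString and the helper elementToXml are hypothetical names made up for this illustration, and omitting the XML declaration is an assumption about what you want in each row.

```scala
import java.io.StringWriter
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.{OutputKeys, TransformerFactory}
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult

object DomToXmlString {
  // Hypothetical helper: builds one element carrying the given attributes
  // and returns it serialized as a String (no File involved).
  def elementToXml(tag: String, attrs: Seq[(String, String)]): String = {
    val document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument()
    val root = document.createElement(tag)
    attrs.foreach { case (name, value) => root.setAttribute(name, value) }
    document.appendChild(root)

    val transformer = TransformerFactory.newInstance().newTransformer()
    // Assumption: each row should hold a bare element, without <?xml ...?>
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")
    val writer = new StringWriter()
    transformer.transform(new DOMSource(document), new StreamResult(writer))
    writer.toString
  }
}
```

Because the factories are created inside the function, it should be callable from a map over a Dataset in the same way as the XStream conversion in step 3, for example ds.map(a => DomToXmlString.elementToXml("testAni", Seq("cricri" -> a.cri, "taille" -> a.taille.toString))). I have not measured this; creating the factories on every call is simple but not cheap, so mapPartitions may be a better fit for large data.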

Hope this helps !!

Thanks

NNK
  • Hello Naveen, thanks for your answer. Do you think I can get a similar result with this technology: http://x-stream.github.io/alias-tutorial.html ? – THIBAULT Nicolas Jan 17 '19 at 10:20
  • Yes, as long as your XML follows XML syntax and semantics. Remember that the XStream API is a Java/Scala API, not one created for Spark. However, we can use it in Spark to convert an object to an XML string. – NNK Jan 17 '19 at 15:55
  • Hello, for me the line: val df = xmlString.toDF() – THIBAULT Nicolas Jan 18 '19 at 11:05
  • Excuse me, I pressed Enter before finishing my comment. – THIBAULT Nicolas Jan 18 '19 at 11:06
  • Again. How can we insert a line break without posting the comment? The line `val df = xmlString.toDF()` didn't work. I get the following error: `:36: error: value toDF is not a member of String val df = xmlString.toDF()`. Do you know why? (I can print xmlString but can't convert it.) Thanks for your help. – THIBAULT Nicolas Jan 18 '19 at 11:09
  • Also, with this library, some tags that I don't want are added to the String: ` Rugissement 150 <_-outer> <_-outer> <_-iw reference="../.."/> <_-outer> <_-iw reference="../.."/> ` I don't want the outer and reference elements, only cricri and taille. (I'm continuing to search on my side.) – THIBAULT Nicolas Jan 18 '19 at 11:12
  • Please add your complete new program on this question or on another question, I will correct it and send across to you. I’ve done this in my project and it works well!!! – NNK Jan 18 '19 at 15:18
  • `class Animal(cCrie:String,cTaille:Int) { var cri:String = cCrie var taille:Int = cTaille } var animal:Animal = new Animal("Rugissement",150) val xstream1 = new XStream(new DomDriver()) xstream1.alias("testAni",classOf[Animal]) xstream1.aliasField("cricri",classOf[Animal],"cri") val xmlString = xstream1.toXML(animal) val newDf = xmlString.toDF()` – THIBAULT Nicolas Jan 18 '19 at 15:58
  • No, I asked on Stack Overflow :D. I still have the two problems I mentioned before. – THIBAULT Nicolas Jan 18 '19 at 16:23
  • On Stack Overflow, indentation works only in questions and answers, not in comments. I've updated my answer with more info and an example. – NNK Jan 18 '19 at 16:39
  • Were you able to progress with the solution? – NNK Jan 20 '19 at 22:49