My goal is to analyze an HTML string. In the end I'd like to extract the text nodes and data nodes of the string and store them in separate lists.

In my first approach I tried to walk through the HTML string recursively.

import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import java.util.Iterator
import org.jsoup.nodes
import org.jsoup.nodes.Node

object TextAnalyzer {

    def processNode(node: Node) {
        if (node.isInstanceOf[TextNode]) println(node.toString())
        node.childNodes() foreach processNode
    }

    def main(args: Array[String]) {
        val myHtml = "<html> <head> <title>Welcome</title>    </head>    <body>        <div>            <p>Foo</p>        </div>    </body></html>";

        val doc = Jsoup.parse(myHtml);
        processNode(doc);

    }
}

It ends with the following error message:

scalac MyModule.scala

MyModule.scala:23: error: value childs is not a member of org.jsoup.nodes.Node
        node.childNodes() foreach processNode
        ^
one error found

Can you get me started so that I can extract the data nodes and text nodes of an HTML string, recursively?

Thanks in advance for your help!

Regards,

Ansgar

Ansgar Helfrich

1 Answer

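I don't fully understand your question, but the following compiles. The wildcard import org.jsoup.nodes._ brings TextNode into scope, and scala.collection.JavaConversions._ lets foreach be called on the java.util.List that childNodes() returns. Is it what you were aiming for?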

import org.jsoup.Jsoup
// Wildcard import brings Node and TextNode into scope.
import org.jsoup.nodes._
// Implicit Java-to-Scala collection conversions, needed for foreach below.
// (Deprecated in Scala 2.12 and removed in 2.13.)
import scala.collection.JavaConversions._

object TextAnalyzer extends App {

    // Print every text node, then recurse into the children.
    def processNode(node: Node): Unit = {
        if (node.isInstanceOf[TextNode]) println(node.toString())
        node.childNodes() foreach processNode
    }

    val myHtml = "<html> <head> <title>Welcome</title>    </head>    <body>        <div>            <p>Foo</p>        </div>    </body></html>"

    val doc = Jsoup.parse(myHtml)
    processNode(doc)

}
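
If the aim is to collect the text nodes and data nodes into separate lists rather than print them, something along these lines might work. It is only a sketch under a few assumptions: NodeCollector and collect are made-up names, the script element in the sample HTML is added here so that there is a data node to find, and it uses scala.jdk.CollectionConverters, which needs Scala 2.13 or later (on older versions scala.collection.JavaConverters._ provides the same asScala call). Whitespace-only text nodes are skipped, which may or may not be what you want.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.{DataNode, Node, TextNode}
import scala.collection.mutable.ListBuffer
import scala.jdk.CollectionConverters._

object NodeCollector {

  // Hypothetical helper: walks the tree depth-first and returns
  // (contents of text nodes, contents of data nodes).
  def collect(root: Node): (List[String], List[String]) = {
    val texts = ListBuffer.empty[String]
    val data  = ListBuffer.empty[String]

    def walk(node: Node): Unit = {
      node match {
        // Skip text nodes that contain only whitespace.
        case t: TextNode if !t.isBlank() => texts += t.text()
        // jsoup stores the contents of script and style elements as DataNode.
        case d: DataNode                 => data += d.getWholeData()
        case _                           => ()
      }
      // childNodes() returns a java.util.List, so convert it explicitly.
      node.childNodes().asScala.foreach(walk)
    }

    walk(root)
    (texts.toList, data.toList)
  }

  def main(args: Array[String]): Unit = {
    // Sample input; the script element is an assumption added for illustration.
    val html = "<html><head><title>Welcome</title><script>var x = 1;</script></head>" +
      "<body><div><p>Foo</p></div></body></html>"
    val (texts, data) = collect(Jsoup.parse(html))
    println(texts)
    println(data)
  }
}
```

Pattern matching on the node type replaces the isInstanceOf check, and the explicit asScala call avoids relying on implicit conversions.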
selig
  • Hi selig! That's exactly what I was looking for. Thanks a lot! I think I should concentrate on some good how-tos, including the right objects to inherit from (recommendations?)... Cheers and have a nice time! – Ansgar Helfrich Jun 09 '13 at 22:14