
How can I get the text from a web page (given its URL) into a TextView or a String in Android/Java?

<html>
<body>
<div class="text" id="editorText" itemprop="text">I want to get this Text</div>
</body>
</html>

"I want to get this Text" into a String:

String TextFromWebsiteHere

EDIT:

Exception after trying the jsoup answer below:

01-30 05:39:17.460: I/dalvikvm(1013): Could not find method org.jsoup.Jsoup.connect, referenced from method com.example.putstring.MainActivity.onClick2
01-30 05:39:17.460: W/dalvikvm(1013): VFY: unable to resolve static method 5293: Lorg/jsoup/Jsoup;.connect (Ljava/lang/String;)Lorg/jsoup/Connection;
01-30 05:39:17.460: D/dalvikvm(1013): VFY: replacing opcode 0x71 at 0x0002
01-30 05:39:18.450: D/dalvikvm(1013): GC_FOR_ALLOC freed 122K, 6% free 3260K/3452K, paused 33ms, total 36ms
01-30 05:39:18.800: D/gralloc_goldfish(1013): Emulator without GPU emulation detected.
01-30 05:39:35.150: D/AndroidRuntime(1013): Shutting down VM
01-30 05:39:35.150: W/dalvikvm(1013): threadid=1: thread exiting with uncaught exception (group=0xb1a97b90)
01-30 05:39:35.160: E/AndroidRuntime(1013): FATAL EXCEPTION: main
01-30 05:39:35.160: E/AndroidRuntime(1013): Process: com.example.putstring, PID: 1013
01-30 05:39:35.160: E/AndroidRuntime(1013): java.lang.NoClassDefFoundError: org.jsoup.Jsoup
01-30 05:39:35.160: E/AndroidRuntime(1013):     at com.example.putstring.MainActivity.onClick2(MainActivity.java:133)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at com.example.putstring.MainActivity.access$8(MainActivity.java:131)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at com.example.putstring.MainActivity$1.onClick(MainActivity.java:52)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at android.view.View.performClick(View.java:4424)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at android.view.View$PerformClick.run(View.java:18383)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at android.os.Handler.handleCallback(Handler.java:733)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at android.os.Handler.dispatchMessage(Handler.java:95)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at android.os.Looper.loop(Looper.java:137)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at android.app.ActivityThread.main(ActivityThread.java:4998)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at java.lang.reflect.Method.invokeNative(Native Method)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at java.lang.reflect.Method.invoke(Method.java:515)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:777)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:593)
01-30 05:39:35.160: E/AndroidRuntime(1013):     at dalvik.system.NativeStart.main(Native Method)
01-30 05:39:42.580: I/Process(1013): Sending signal. PID: 1013 SIG: 9
StoopidDonut
thankyou

3 Answers


Use this:

String simpleString = Html.fromHtml("your_html_string").toString();

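It strips all the HTML markup and returns the content as a plain string. Note that it only converts an HTML string you already have; it does not download the page. On API level 24 and later, Html.fromHtml(String) is deprecated in favor of Html.fromHtml(String, int), for example with Html.FROM_HTML_MODE_LEGACY.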

However, if you need to distinguish between tags or classes and extract the text of one specific element, you will need a more capable solution such as jsoup.
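Html.fromHtml is part of the Android SDK, so it is not available in a plain Java program. As a rough stand-in outside Android, here is a hedged plain-Java sketch that strips tags with a regular expression. The HtmlText class and stripTags method are hypothetical names; the sketch replaces each tag with a space, decodes only a handful of common entities, and is not a real HTML parser, so it only suits simple snippets like the one in the question.

```java
import java.util.regex.Pattern;

public class HtmlText {
    // Matches any tag such as <div ...> or </div>; assumes no '>' inside attribute values
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Naive plain-Java approximation of Html.fromHtml(...).toString():
    // replaces each tag with a space, decodes a few entities, collapses whitespace.
    public static String stripTags(String html) {
        String text = TAG.matcher(html).replaceAll(" ");
        text = text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&nbsp;", " ")
                   .replace("&amp;", "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><div class=\"text\" id=\"editorText\" itemprop=\"text\">"
                + "I want to get this Text</div></body></html>";
        System.out.println(stripTags(html)); // prints: I want to get this Text
    }
}
```

Because each tag becomes a space, "a<b>b</b>c" yields "a b c", whereas Html.fromHtml would give "abc"; for extracting one specific element, a parser such as jsoup remains the better choice.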

waqaslam

You can use the jsoup library for this. Here is a simple example that reads the paragraphs from the HTML of the web page below:

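import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

try {
    // download and parse the page (a blocking network call)
    Document doc = Jsoup.connect("http://popofibo.com/pop/swaying-views-of-our-past/").get();
    // select every <p> element and print its text
    Elements paragraphs = doc.select("p");
    for (Element p : paragraphs) {
        System.out.println(p.text());
    }
} catch (IOException e) {
    e.printStackTrace();
}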

Output:

It is indeed difficult to argue over the mainstream ideas of evolution of human civilizations...
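Jsoup.connect(...).get() blocks until the page has downloaded. In an Android app targeting API level 11 or later, running it on the main thread (for example directly inside onClick) throws NetworkOnMainThreadException. That is a separate problem from the NoClassDefFoundError in the question's log, which usually means the jsoup classes never made it into the APK. Once the jar is packaged, the call still has to move off the UI thread. Below is a minimal plain-Java sketch of that idea using an ExecutorService; the BackgroundFetch class, its ResultCallback interface and the fetchInBackground method are hypothetical names, not part of jsoup or Android.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackgroundFetch {

    // Hypothetical callback for delivering the outcome of the blocking task
    public interface ResultCallback {
        void onResult(String text);
        void onError(Exception e);
    }

    // A single daemon worker thread, so pending work never keeps the JVM alive
    private static final ExecutorService EXECUTOR = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "background-fetch");
        t.setDaemon(true);
        return t;
    });

    // Runs a blocking task (e.g. a jsoup download) on the worker thread instead of
    // the calling thread. The callback also runs on the worker thread, so on Android
    // switch back to the UI thread (e.g. runOnUiThread) before touching any View.
    public static void fetchInBackground(Callable<String> task, ResultCallback callback) {
        EXECUTOR.execute(() -> {
            String result;
            try {
                result = task.call();
            } catch (Exception e) {
                callback.onError(e);
                return;
            }
            callback.onResult(result);
        });
    }
}
```

In an Activity you might call BackgroundFetch.fetchInBackground(() -> Jsoup.connect(url).get().select("div#editorText").text(), callback) from onClick, and inside onResult call runOnUiThread(() -> textView.setText(text)); url, callback and textView stand in for your own variables.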

EDIT:

To demonstrate with your HTML as static content, you can select the div by its id:

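import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String... args) {
    // parse the HTML string directly; no network access is needed
    Document doc = Jsoup.parse("<html><body><div class=\"text\" id=\"editorText\" itemprop=\"text\">I want to get this Text</div></body></html>");
    // "div#editorText" selects the <div> whose id is editorText
    Elements divs = doc.select("div#editorText");
    for (Element d : divs) {
        System.out.println(d.text());
    }
}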

Output:

I want to get this Text
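If you only ever deal with well-formed markup such as the static snippet in the question, the JDK's own XML parser and XPath can do the same id lookup without jsoup. This is a swapped-in technique, not jsoup: a hedged sketch with a hypothetical XPathDivText class. It throws on the unclosed tags, unquoted attributes, or entities such as &nbsp; that real web pages usually contain, which is exactly why a lenient parser like jsoup is the usual choice.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathDivText {

    // Returns the text inside the <div> with the given id, or "" if no such div exists.
    // Assumes the markup is well-formed XML (every tag closed, attributes quoted)
    // and that the id contains no single quote.
    public static String textById(String xhtml, String id) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        // XPath counterpart of the jsoup selector div#editorText
        return XPathFactory.newInstance().newXPath()
                .evaluate("//div[@id='" + id + "']", doc);
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><div class=\"text\" id=\"editorText\" itemprop=\"text\">"
                + "I want to get this Text</div></body></html>";
        System.out.println(textById(html, "editorText")); // prints: I want to get this Text
    }
}
```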

StoopidDonut
  • That sounds great. But how do I add the jsoup library, and how can I select the DIV tag so that only the text inside it goes into the String? – thankyou Jan 30 '14 at 09:31
  • @PopFibo I tried your first example and I get no errors, but my Android app crashes. Does that only work in Java? – thankyou Jan 30 '14 at 10:11
  • I tried your first example in plain Java and got no errors, but the Java program also crashes. – thankyou Jan 30 '14 at 10:16
  • @thankyou Does the second example work? If yes, then probably look at your network settings. – StoopidDonut Jan 30 '14 at 10:45
  • Yes, but I don't know what to do now. It crashes. Can I send you the log? Your code runs when I click a button; the app works until I click it, then it crashes. – thankyou Jan 30 '14 at 10:47
  • Yes, I added the jsoup jar, cleaned the project and relaunched it, but the app still crashes. But it's OK, thank you. – thankyou Jan 30 '14 at 12:04

What about this solution (without using an external library):

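import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

import android.text.Html;

public static String getContentFromHtmlPage(String page) {
    StringBuilder sb = new StringBuilder();

    try {
        URLConnection connection = new URL(page).openConnection();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // keep the line break so words on adjacent lines don't run together
                sb.append(line).append('\n');
            }
        }
    } catch (IOException e) {
        // handle exception
    }

    // strip the markup and return only the text
    return Html.fromHtml(sb.toString()).toString();
}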
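The method above returns the text of the entire page, not just the editorText div. If you want to stay library-free and only need that one element, a rough option is to cut the div out of the downloaded HTML with a regular expression and pass only that fragment to Html.fromHtml. This is a hedged sketch with hypothetical names (DivExtractor, innerHtmlOfDiv), not a parser: it assumes the target div contains no nested div and that its id attribute is quoted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivExtractor {

    // Returns the inner HTML of the first <div> whose id attribute equals the given id,
    // or null if none is found. Tag names match case-insensitively; the id does not.
    // Limitations: the lazy (.*?) stops at the first </div>, so nested divs break it,
    // and \bid also matches attributes such as data-id.
    public static String innerHtmlOfDiv(String html, String id) {
        Pattern pattern = Pattern.compile(
                "(?i:<div)\\b[^>]*\\bid\\s*=\\s*[\"']" + Pattern.quote(id) + "[\"'][^>]*>"
                        + "(.*?)(?i:</div>)",
                Pattern.DOTALL); // DOTALL lets the content span several lines
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<html><body><div class=\"text\" id=\"editorText\" itemprop=\"text\">"
                + "I want to get this Text</div></body></html>";
        System.out.println(innerHtmlOfDiv(html, "editorText")); // prints: I want to get this Text
    }
}
```

On Android, the returned fragment may still contain inline tags, so you could pass it through Html.fromHtml(...).toString() as in the answer above to get plain text.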
bobbel