
I have noticed that on Android 4.4 handsets, saving a WebView with:

webview.saveWebArchive(name);

and then reading it back with WebArchiveReader (see code below) throws an exception that looks like an encoding problem:

11-08 15:10:31.976: W/System.err(2240): org.xml.sax.SAXParseException: Unexpected end of document
11-08 15:10:31.976: W/System.err(2240): at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:125)

The method I use to read the stored XML file worked perfectly fine up to Android 4.3. Here it is (note that I tried to parse the file in two different ways):

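import android.util.Base64;

import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Method of WebArchiveReader; myDoc, urlList and urlNodes are fields of that class.
public boolean readWebArchive(InputStream is) {
    DocumentBuilderFactory builderFactory =
            DocumentBuilderFactory.newInstance();
    DocumentBuilder builder;
    myDoc = null;
    try {
        builder = builderFactory.newDocumentBuilder();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
        return false; // without a builder there is nothing to parse with
    }
    try {
        // New attempt: wrap the stream so that UTF-8 is forced
        InputSource input = new InputSource(is);
        input.setEncoding("UTF-8");
        myDoc = builder.parse(input);

        // This is how it used to work on
        // Android 4.3 and below without trouble:
        // myDoc = builder.parse(is);

        NodeList nl = myDoc.getElementsByTagName("url");
        for (int i = 0; i < nl.getLength(); i++) {
            Node nd = nl.item(i);
            if (nd instanceof Element) {
                Element el = (Element) nd;
                // siblings of el (url) are: mimeType, textEncoding, frameName, data
                NodeList nodes = el.getChildNodes();
                for (int j = 0; j < nodes.getLength(); j++) {
                    Node node = nodes.item(j);
                    if (node instanceof Text) {
                        // The URL text is stored Base64-encoded
                        String dt = ((Text) node).getData();
                        byte[] b = Base64.decode(dt, Base64.DEFAULT);
                        dt = new String(b, "UTF-8"); // explicit charset instead of the platform default
                        urlList.add(dt);
                        urlNodes.add((Element) el.getParentNode());
                    }
                }
            }
        }
    } catch (SAXParseException se) {
        // Some problems parsing the saved XML file
        se.printStackTrace();
        myDoc = null;
    } catch (Exception e) {
        e.printStackTrace();
        myDoc = null;
    }
    return myDoc != null;
}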

I've played a bit with the way the builder is invoked. Instead of passing it the FileInputStream directly, I first wrap it in an InputSource, as you can see, to force a given encoding. However, I had no success. Without the InputSource, the exception was instead:

org.xml.sax.SAXParseException: Unexpected token

I've read in previous posts that this may be an encoding issue (e.g. android-utf-8-file-parsing), but none of the proposed solutions worked for me.

Does anyone else have the same issue, or does anyone know what has changed in KitKat and, if so, how it could be avoided?

Many thanks in advance

Narseo

2 Answers


My WebArchiveReader code is not needed on Android 4.4 KitKat and newer to read back a saved web archive. If you save your page with the webview.saveWebArchive(name); method on KitKat, you get an MHTML-formatted file, as Dragon warrior points out in the other answer. To read this file back into the WebView, just use:

webView.loadUrl("file:///my_folder/mySavedPage.mht");

Just make sure to give your file the .mht or .mhtml extension so that WebView recognizes its contents; otherwise it may simply display the raw MHTML as text.
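Since the extension is what makes WebView treat the file as MHTML, it can help to build the save path and the load URL in one place. The following is a minimal sketch in plain Java; MhtPaths, withMhtExtension and fileUrl are hypothetical names, not part of the Android SDK or of WebArchiveReader, and the sketch assumes you already have an absolute path on the device.

```java
import java.util.Locale;

// Hypothetical helper (not part of the Android SDK or of WebArchiveReader).
final class MhtPaths {

    // Appends ".mht" unless the name already ends in .mht or .mhtml,
    // so that WebView treats the saved file as MHTML rather than plain text.
    static String withMhtExtension(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".mht") || lower.endsWith(".mhtml")) {
            return path;
        }
        return path + ".mht";
    }

    // Turns an absolute path into a file:// URL, e.g.
    // "/my_folder/mySavedPage.mht" -> "file:///my_folder/mySavedPage.mht".
    static String fileUrl(String absolutePath) {
        return "file://" + absolutePath;
    }
}
```

On the device you would then pass withMhtExtension(path) to webview.saveWebArchive(...) and, later, fileUrl(...) of that same path to webView.loadUrl(...), so the two calls cannot drift apart.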

Greg

gregko
  • Hi, it looks like it still does not work completely in my case. I guess the problem is therefore with the MHTML/HTML file, which may be stored incorrectly. Thanks for the reply! – Narseo Mar 30 '14 at 03:35
  • What if the saved archive is a .txt file? – IgorGanapolsky Nov 02 '17 at 20:13

I have exactly the same problem as you do.

Apparently, Android 4.4 WebView saves web archives as MHTML. Therefore, you can't use WebArchiveReader anymore.
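If an app has to cope with archives saved both before and after 4.4, one option is to look at the first bytes of the file before handing it to the XML parser. This is a sketch under two assumptions: the older archive starts with XML markup ("<"), and the newer MHTML file starts with MIME header lines of the form "Name: value" (for example "MIME-Version: 1.0"). ArchiveSniffer and ArchiveFormat are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical format check: XML (old archive) vs. MHTML (MIME message).
final class ArchiveSniffer {

    enum ArchiveFormat { XML, MHTML, UNKNOWN }

    // Reads at most 512 bytes from the stream and classifies them.
    static ArchiveFormat detect(InputStream in) throws IOException {
        byte[] head = new byte[512];
        int n = 0;
        int r;
        while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
            n += r;
        }
        return detect(Arrays.copyOf(head, n));
    }

    // Classifies the first bytes of a saved archive.
    static ArchiveFormat detect(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so decoding never fails.
        String start = new String(head, Charset.forName("ISO-8859-1"));
        // Drop a UTF-8 byte order mark (EF BB BF) if present.
        if (start.startsWith("\u00EF\u00BB\u00BF")) {
            start = start.substring(3);
        }
        start = start.trim();
        if (start.startsWith("<")) {
            return ArchiveFormat.XML; // markup: the old XML archive format
        }
        String firstLine = start.split("\r?\n", 2)[0];
        if (firstLine.matches("[A-Za-z][A-Za-z0-9-]*:.*")) {
            return ArchiveFormat.MHTML; // a MIME header line such as "MIME-Version: 1.0"
        }
        return ArchiveFormat.UNKNOWN;
    }
}
```

With this check, XML archives can still go through WebArchiveReader, while MHTML files can be loaded with webView.loadUrl(...) as described in the other answer, instead of failing in DocumentBuilder.parse.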

You might want to parse MHTML files with some other third-party library. Good luck!
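To give an idea of what such parsing involves: an MHTML file is a MIME multipart/related message, so its resources are separated by a boundary string declared in the top-level Content-Type header. The following is a minimal sketch, not a library. It assumes the whole file has already been read into a String (for example as ISO-8859-1), it does not decode Content-Transfer-Encoding (quoted-printable or base64 bodies are returned raw), and MhtmlSplitter and Part are hypothetical names. A full MIME parser such as JavaMail's MimeMultipart would handle far more edge cases.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, minimal multipart/related splitter for MHTML text.
final class MhtmlSplitter {

    static final class Part {
        final Map<String, String> headers = new LinkedHashMap<>(); // lower-cased names
        String body = ""; // raw body, transfer encoding NOT decoded
    }

    private static final Pattern BOUNDARY =
            Pattern.compile("boundary=\"?([^\";\\r\\n]+)\"?", Pattern.CASE_INSENSITIVE);

    static List<Part> split(String mhtml) {
        Matcher m = BOUNDARY.matcher(mhtml);
        if (!m.find()) {
            throw new IllegalArgumentException("no multipart boundary found");
        }
        String delimiter = "--" + m.group(1);
        List<Part> parts = new ArrayList<>();
        String[] chunks = mhtml.split(Pattern.quote(delimiter));
        // chunks[0] holds the top-level message headers, so start at 1.
        for (int i = 1; i < chunks.length; i++) {
            String chunk = chunks[i];
            if (chunk.startsWith("--")) {
                break; // "--boundary--" closes the multipart body
            }
            chunk = chunk.replaceFirst("^\\r?\\n", "");
            String[] headAndBody = chunk.split("\\r?\\n\\r?\\n", 2);
            Part part = new Part();
            String lastName = null;
            for (String line : headAndBody[0].split("\\r?\\n")) {
                if ((line.startsWith(" ") || line.startsWith("\t")) && lastName != null) {
                    // Folded header line: continuation of the previous value.
                    part.headers.put(lastName, part.headers.get(lastName) + " " + line.trim());
                } else {
                    int colon = line.indexOf(':');
                    if (colon > 0) {
                        lastName = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                        part.headers.put(lastName, line.substring(colon + 1).trim());
                    }
                }
            }
            if (headAndBody.length > 1) {
                // The line break before the next delimiter belongs to the delimiter.
                part.body = headAndBody[1].replaceFirst("\\r?\\n\\z", "");
            }
            parts.add(part);
        }
        return parts;
    }
}
```

Each Part's "content-location" header then tells you which URL the body belonged to, which is roughly the information WebArchiveReader used to extract from the "url" elements of the old XML format.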

Dragon warrior
  • That's annoying. Thanks a lot for the pointer. I'll try to make it work and let you know on my progress. – Narseo Dec 05 '13 at 00:02