My intention is to edit HTML documents, including modifying existing elements, deleting elements and inserting new ones.

I've read HTMLEditorKit's and related classes' documentation, as well as the relevant topic in Sun's Java Trail, yet there is very little information about actual HTML document manipulation. Most of the discussion and examples deal with reading and parsing HTML, not really editing it. Some Googling still did not yield an adequate solution, and trying to tackle the task with some coding trial and error mostly resulted in exceptions.

I've gone over related questions and answers here on SO, but most answers suggested some alternative, while I'm looking for a solution in the JDK. Perhaps HTMLEditorKit is of little use to non-Swing applications, and there is an alternative outside javax.swing?

Here are a few tasks I'd like to learn how to perform:

  • Replace text in certain text fields.
  • Basic editing (find/replace or regexes) of <script> elements.
  • Color the border of certain elements.
  • Remove certain tags entirely (for example flash elements).

Assuming that HTMLEditorKit is the best HTML editing component in the JDK, what tutorial or reference do you recommend?

Oren Shalev

3 Answers

If the HTML page you are trying to manipulate isn't very complicated, you can build it yourself like this:

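HTMLEditorKit kit = new HTMLEditorKit();
// Let the kit create the document so that its parser and style sheet are set up.
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();

// Install the kit first: setEditorKit() replaces the pane's current document.
jEditorPane.setEditorKit(kit);
jEditorPane.setDocument(doc);

// insertHTML() throws BadLocationException and IOException; handle or declare them.
kit.insertHTML(doc, doc.getLength(), "<label> This label will be inserted inside the body directly </label>", 0, 0, null);
kit.insertHTML(doc, doc.getLength(), "<br/>", 0, 0, null);
kit.insertHTML(doc, doc.getLength(), putYourVariableHere, 0, 0, null);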

That way you have full control over the HTML, and it loads faster than reading it from an external HTML file.

RyanSF

The HTMLEditorKit is not an HTML editor but an editor for document models, which can be converted from and to HTML. The internal model of the editor kit is not "HTML" but is based on DefaultStyledDocument. What may be confusing is that there is an HTMLDocument class, but that is just a thin wrapper around DefaultStyledDocument so that the model can be created from HTML and saved as HTML.

What you need is an HTML parser. Try jTidy. It reads the HTML and builds an internal model (keeping things like <script>, which HTMLEditorKit will ignore). You can then use a DOM API to modify the model.

That said, for many use cases it's enough to filter the HTML with regular expressions or a simple string search and replace.
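The regular-expression approach mentioned above can be sketched with nothing but java.util.regex. This is a minimal illustration, not a robust HTML processor: the class name RegexHtmlFilter and its two helper methods are hypothetical, and the patterns assume reasonably well-formed markup with no nested <object> elements and no "</script>" text inside script strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from any library.
final class RegexHtmlFilter {

    // Paired <object>...</object> or <embed>...</embed>, or a lone <embed ...> tag.
    // (?is) = case-insensitive, and '.' also matches line breaks.
    private static final Pattern FLASH = Pattern.compile(
            "(?is)<(object|embed)\\b[^>]*>.*?</\\1\\s*>|<embed\\b[^>]*>");

    // Opening tag, body, and closing tag of a <script> element as three groups.
    private static final Pattern SCRIPT = Pattern.compile(
            "(?is)(<script\\b[^>]*>)(.*?)(</script\\s*>)");

    /** Removes object/embed elements, assuming they are not nested. */
    static String stripFlash(String html) {
        return FLASH.matcher(html).replaceAll("");
    }

    /** Replaces a literal string, but only inside <script> bodies. */
    static String replaceInScripts(String html, String find, String replacement) {
        Matcher m = SCRIPT.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(2).replace(find, replacement);
            // quoteReplacement keeps '$' and '\' in the script text literal.
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + body + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>foo</p><object data=\"x.swf\"></object>"
                + "<script>var foo = 1;</script>";
        System.out.println(replaceInScripts(stripFlash(html), "foo", "bar"));
    }
}
```

Anything beyond simple, predictable markup is better handled by a real parser, since patterns like these break easily on nested or malformed tags.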

Aaron Digulla
  • Too bad, I hoped there was a solution in the JDK. I'm trying to avoid external tools, so I'll consider a find/replace solution first. Thanks! – Oren Shalev Sep 22 '09 at 04:44
  • `HTMLDocument` actually contains a tree similar to DOM. It is designed to preserve everything it does not understand but it would not remove ` – Alexey Ivanov Aug 14 '11 at 18:33

I don't know whether there is any tutorial on using HTMLDocument and HTMLEditorKit for editing HTML documents in Java. The JDK implementation is somewhat limited, yet internally it creates a tree of elements similar to a DOM. You can access the tree from HTMLDocument using the getRootElements() method:

Element html = doc.getRootElements()[0];

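Here doc is an instance of HTMLDocument. Editing HTML with HTMLDocument is not easy, but it is possible; see the following methods of HTMLDocument:

  • insertAfterStart(Element, String)
  • insertBeforeEnd(Element, String)
  • insertBeforeStart(Element, String)
  • insertAfterEnd(Element, String)
  • setInnerHTML(Element, String)
  • setOuterHTML(Element, String)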

All of these methods accept Element as a reference point where the editing takes place. You can walk the tree structure of elements using its methods, and I showed you how to get the reference to the root of the tree.
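As a rough sketch of the tree walk and element-based editing described above, the following parses an HTML string into an HTMLDocument without any Swing component, lists the element names, and replaces the children of one element. The class HtmlDocumentEditSketch, its method names, and the "box" id are made up for illustration; the calls used (createDefaultDocument, read, getRootElements, getElement, setInnerHTML) are standard javax.swing.text API. It assumes the target is a block element such as a div.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical sketch class; the names are illustrative only.
final class HtmlDocumentEditSketch {

    /** Parses an HTML string into an HTMLDocument; no JEditorPane is involved. */
    static HTMLDocument parse(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        // createDefaultDocument() also installs the parser that setInnerHTML needs.
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    /** Depth-first walk that records the name of every element in the tree. */
    static void collectNames(Element element, List<String> names) {
        names.add(element.getName());
        for (int i = 0; i < element.getElementCount(); i++) {
            collectNames(element.getElement(i), names);
        }
    }

    static List<String> elementNames(String html)
            throws IOException, BadLocationException {
        List<String> names = new ArrayList<>();
        collectNames(parse(html).getRootElements()[0], names);
        return names;
    }

    /** Replaces the children of the element with the given id; returns the document text. */
    static String replaceInner(String html, String id, String newInnerHtml)
            throws IOException, BadLocationException {
        HTMLDocument doc = parse(html);
        Element target = doc.getElement(id); // looks the element up by its id attribute
        if (target == null) {
            throw new IllegalArgumentException("No element with id " + id);
        }
        doc.setInnerHTML(target, newInnerHtml);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><div id=\"box\"><p>Old text</p></div></body></html>";
        System.out.println(elementNames(html));
        System.out.println(replaceInner(html, "box", "<p>New text</p>"));
    }
}
```

To get HTML back out rather than plain text, HTMLEditorKit.write(Writer, Document, int, int) can serialize the edited document, though the output is reformatted and will not match the input byte for byte.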

Using these methods you can write a visual HTML editor. If you only want to display your HTML model, call the setEditable(false) method on the JEditorPane object.

For a very simple example of how you can manipulate the contents of HTML loaded into a JEditorPane with an HTMLDocument model, see my sample application in the answer to another HTML-related question, in particular the code of the propertyChange event handler.

That said, for more control over the HTML, I would recommend using a library that builds an HTML DOM and lets you modify it.

Alexey Ivanov