xhtml->docx->xhtml retain div ids?

Question

I'm attempting to export some CKEditor-created XHTML fields from a database, convert them to a docx, edit the document in Word, then convert it back to XHTML and import those fields back into the database. I'm currently using docx4j-ImportXHTML, but I'm open to suggestions.

The xhtml structure is like:

<html><body>
<div id="database-field-1" class="field-section">
    <label>database-field-1</label>
    <div class="field-content">xyz</div>
</div>
<div id="database-field-2" class="field-section">
    <label>database-field-2</label>
    <div class="field-content">xyz</div>
</div>

etc...

So when converting between formats, I would like to retain the ids from the divs, so that when I import again I can parse the XHTML and extract the fields by id to update the database.
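To make the import step concrete, here is a minimal sketch of the "extract the fields by id" part, written in Python with only the standard library. It assumes the round-tripped XHTML is well-formed XML and still has the `field-section` / `field-content` structure shown above; `extract_fields`, `local_name`, and `SAMPLE` are hypothetical names used only for illustration, and the docx conversion itself is not covered.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical sample mirroring the structure in the question.
SAMPLE = """<html><body>
<div id="database-field-1" class="field-section">
    <label>database-field-1</label>
    <div class="field-content">xyz</div>
</div>
<div id="database-field-2" class="field-section">
    <label>database-field-2</label>
    <div class="field-content">xyz</div>
</div>
</body></html>"""


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def extract_fields(xhtml: str) -> Dict[str, str]:
    """Map each field-section div id to the text of its field-content child div."""
    root = ET.fromstring(xhtml)  # assumes well-formed XML (no bare &nbsp; etc.)
    fields: Dict[str, str] = {}
    for el in root.iter():
        if local_name(el.tag) != "div":
            continue
        if "field-section" not in el.get("class", "").split():
            continue
        field_id = el.get("id")
        if field_id is None:
            continue  # the id was lost in the round trip; nothing to key on
        for child in el:
            is_div = local_name(child.tag) == "div"
            if is_div and "field-content" in child.get("class", "").split():
                # itertext() flattens inline markup to plain text
                fields[field_id] = "".join(child.itertext()).strip()
                break
    return fields


if __name__ == "__main__":
    for field_id, text in extract_fields(SAMPLE).items():
        print(field_id, "->", text)
```

Note that this flattens each field to plain text. If the database stores the inner markup, the children of the `field-content` div would need to be serialized instead (for example with `ET.tostring`).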

Thanks!

score 1 · Accepted Answer · answered Sep 08 '14 at 02:26

You can convert the divs to content controls.

See XHTML-docx roundtrip: content tracking (just written).

JasonPlutext


Thanks Jason! Was able to convert html->docx->html! – Morgan Dowell Sep 08 '14 at 20:55
