
As the title states, I am trying to get the text from a docx file I store locally in my src folder. Currently I am using Docxtemplater to get text in the following way:

import Docxtemplater from "docxtemplater";
import PizZip from "pizzip";
import PizZipUtils from "pizzip/utils/index.js";
import docx from "../../assets/documents/T&C.docx";

function loadFile(url, callback) {
  PizZipUtils.getBinaryContent(url, callback);
}
...
  const [docText, setDocText] = useState("");

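  // Note: placed directly in the component body, this runs on every render;
  // wrapping it in useEffect would limit it to the first mount.
  loadFile(docx, function (error, content) {
    if (error) {
      throw error;
    }
    const zip = new PizZip(content);
    // Options are passed to the constructor; loadZip() only accepts the zip.
    const doc = new Docxtemplater(zip, {
      paragraphLoop: true,
      linebreaks: true,
    });
    // getFullText() returns the document text as one plain string.
    setDocText(doc.getFullText());
  });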

My problem is that I get a single chunk of text that is neither indented nor styled: the line breaks are missing and lists are not in the correct format. Is there another approach that extracts the text from a docx file together with its structure (new lines, lists, and so on, excluding images and math formulas)?

I am open to using other libraries if it makes my job easier.
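
To make the goal concrete, here is a minimal sketch of the kind of output I am after, not a definitive solution. It assumes the raw XML of the main document part can be read from the zip with something like zip.file("word/document.xml").asText() (my assumption about PizZip). The helper names decodeXmlEntities and docxXmlToText are hypothetical and not part of Docxtemplater or PizZip. The sketch emits one line per paragraph and marks list items with "- "; it ignores tables, text boxes, and the real numbering format.

```javascript
// Hypothetical helper: decode the five predefined XML entities.
function decodeXmlEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
}

// Hypothetical helper: turn WordprocessingML into text, one line per paragraph.
function docxXmlToText(xml) {
  // Matches <w:p>...</w:p> and self-closing <w:p/>, but not <w:pPr> etc.
  const paragraphRe = /<w:p(?:\s[^>]*)?(?:\/>|>([\s\S]*?)<\/w:p>)/g;
  // Text runs, tab characters, and explicit line breaks inside a paragraph.
  const tokenRe =
    /<w:t(?:\s[^>]*)?>([\s\S]*?)<\/w:t>|<w:tab\/>|<w:br(?:\s[^>]*)?\/>/g;
  const lines = [];

  for (const [, body = ""] of xml.matchAll(paragraphRe)) {
    let text = "";
    for (const token of body.matchAll(tokenRe)) {
      if (token[0].startsWith("<w:tab")) text += "\t";
      else if (token[0].startsWith("<w:br")) text += "\n";
      else text += decodeXmlEntities(token[1]);
    }
    // A paragraph carrying <w:numPr> is a list item; <w:ilvl> is its nesting level.
    const isListItem = body.includes("<w:numPr>");
    const ilvl = body.match(/<w:ilvl w:val="(\d+)"\/>/);
    const indent = "  ".repeat(ilvl ? Number(ilvl[1]) : 0);
    lines.push(isListItem ? indent + "- " + text : text);
  }
  return lines.join("\n");
}
```

In the callback above this would be used roughly as setDocText(docxXmlToText(xml)), with the result rendered in an element styled with white-space: pre-wrap so the line breaks show. If HTML output is acceptable, a dedicated converter such as mammoth, which converts docx to HTML, may be less fragile than regex-based parsing.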

nCis