
I am a beginner in bash and have been trying to solve one annoying problem - I cannot add text to an MS Word (.doc) file.

I managed to extract text but not to insert any. I tried the sed command, but it ruins the file - I suppose because it adds the text to the file's 'source code' (not sure if that is the correct term). I also tried adding text to an .html file (using the same commands) and it still doesn't work. It only works with a simple .txt file.

Commands I have tried:

$: sed -i 'a/existingTest/newText' MyFile.doc

OR

$: sed "/existingText/a newText" MyFile.doc
# I use "existingText" to identify the location where I want to append my newText.

With both commands the text is added inside the 'source code' (whether the command is applied to a .doc or an .html file), which makes the file unreadable.

Does anyone know a way to add text to a .doc or an .html file? Preferably a solution that a beginner will understand, but I will welcome anything :D

Thanks!

marc_s
DonK
  • I don't think you can deal with MS Word (a bunch of xml zipped together) in this way – Bing Wang Jan 19 '21 at 21:33
  • html should work with sed, though I am not sure whether the problem is your syntax. Can you post an example of the HTML you are trying to change? – Bing Wang Jan 19 '21 at 21:34
  • Are you sure you are working with the `*.doc` format, which has been outdated for almost 20 years? If it is `*.docx`, unzip the docx file, find the file which contains the target text, edit it by replacing the text, then zip the files again. – tshiono Jan 20 '21 at 00:09
  • I am using .doc file. Maybe I can start using .docx, it might make it easier. About the .html file, it is a normal html file. I have just inserted some text inside for testing and used the commands I mentioned above in my post. Then the new text gets inserted in the source code, instead of as a normal line in the file. – DonK Jan 21 '21 at 10:05
  • Thank you for the feedback. As for the .html file, can you provide an example you tested so I can reproduce the problem? It will be better to update your question with the text of .html file rather than writing it in the comment. BR. – tshiono Jan 21 '21 at 11:25

1 Answer


If your MS Word file has the *.docx extension, would you please try:

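# Extract the archive contents (a .docx file is a zip of XML files)
unzip MyFile.docx
# Insert newText right after existingText in the document body
sed -i 's/existingText/& newText/' word/document.xml
# Freshen the archive so it picks up the modified file
zip -f MyFile.docx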

As Bing Wang comments, a docx document is a zip archive. Unzip it, edit the XML file that holds the text, then update the archive so that it reflects the edit.
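To see why the choice of sed command matters for the edit step, here is a minimal sketch run on a made-up fragment rather than a real document. The `xml` string is a hypothetical stand-in for word/document.xml, which usually keeps the whole body on one long line. The append command (`a`) puts newText on a new line after the matching line, outside the tags, while a substitution (`s`) can place it inside the text element.

```shell
#!/usr/bin/env bash
# Hypothetical one-line fragment standing in for word/document.xml.
xml='<w:p><w:r><w:t>before existingText after</w:t></w:r></w:p>'

# Append command: adds newText as a new line after the matching line,
# so it ends up outside the XML tags.
appended=$(printf '%s\n' "$xml" | sed '/existingText/a\
newText')

# Substitute command: '&' stands for the matched text, so newText is
# inserted right after existingText, inside the <w:t> element.
substituted=$(printf '%s\n' "$xml" | sed 's/existingText/& newText/')

printf 'append:\n%s\n\nsubstitute:\n%s\n' "$appended" "$substituted"
```

The same reasoning applies to an .html file: appended lines land in the markup wherever the matching line ends, which is not necessarily inside the element you want. Note also that Word may split visible text across several elements, so existingText is not guaranteed to appear as one contiguous string in document.xml.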

tshiono
  • Thank you. I will convert my files to docx first (from doc) and then apply your method. – DonK Jan 21 '21 at 10:10
  • I understand now that you are actually dealing with doc files. It is a good idea to convert them to docx first. While a docx file is composed of `standard` xml files, the doc format is proprietary to Microsoft and is not easy to parse without a Microsoft API. – tshiono Jan 21 '21 at 11:31