3

Hello community!

I have what I think is a fairly hard question. My client uses Microsoft Word documents (I'll leave out the file names; many of them are silly, e.g. "ść ..doc"). Is it possible to open those documents with Python 3.6 on Ubuntu, for example from an Eclipse environment?

I used Windows 7 for many years but wanted a change, so I installed Ubuntu 16.04 LTS and set up my environment (Eclipse Oxygen 4.7.0, PyDev, etc.). But I forgot that my main document is saved as a *.doc file.

Is there any way to open those files? What do you suggest? I was thinking of some kind of "indirect" *.xml file, but which library should I use to open *.doc files via LibreOffice? (I do not want to use a "hack" to install Microsoft Word on Ubuntu.) And once I have the data from the file, which library should I use to save it back to a *.doc file on Ubuntu? (My client will open it with Microsoft Office.)

The scheme is simple:

  • open *.doc files with Python 3.6 on Ubuntu,
  • manipulate those files,
  • save them as *.doc files on Ubuntu.

Maybe a COM object could be used to open the files on different operating systems? Could someone share some documentation on using COM objects from Python 3.6 on Ubuntu? (Sorry if I am wrong; I have only heard that COM objects can be used and have never used them before.)

Thanks for all replies. Greetings, community! Eldiane

Eldiane

2 Answers

0

Use python-docx. It manipulates Word documents (.docx) without COM, and because it works directly on the underlying XML it is cross-platform.
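To illustrate the "XML internally" point: a .docx file is a ZIP archive whose main body is stored in word/document.xml. The following is a minimal standard-library sketch, not python-docx's API; the function name `docx_paragraphs` is made up for illustration, and it assumes a plain document (it ignores headers, footers and similar parts). It does not read the legacy binary .doc format.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p>, <w:r>, <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _tag(name):
    """Build the fully qualified ElementTree tag for a w: element."""
    return "{%s}%s" % (W_NS, name)


def docx_paragraphs(path):
    """Return the text of each paragraph in a .docx, keeping spaces and tabs.

    `path` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(_tag("p")):
        parts = []
        # Only look at direct children of runs, so tab-stop definitions
        # in the paragraph properties are not mistaken for tab characters.
        for run in para.iter(_tag("r")):
            for node in run:
                if node.tag == _tag("t"):
                    parts.append(node.text or "")
                elif node.tag == _tag("tab"):
                    parts.append("\t")
                elif node.tag in (_tag("br"), _tag("cr")):
                    parts.append("\n")
        paragraphs.append("".join(parts))
    return paragraphs
```

Calling `docx_paragraphs("some_file.docx")` (a hypothetical file name) returns a list of strings, one per paragraph, which can then be processed with regular expressions.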

varnit
  • Yes, that seems fine, but I have *.doc files rather than *.docx files. As far as I know, *.docx files are zipped XML files, so opening them would be quite simple. Is it possible to open *.doc files with the python docx lib (or any other)? How would it open those *.doc files without Microsoft Word? – Eldiane Aug 22 '17 at 16:48
  • Do you want just the text of the doc file, or the doc file with full formatting? – varnit Aug 22 '17 at 16:55
  • Unfortunately, I think I need all of the content, including every whitespace character, because the user is a noob (God forgive me for my arrogance) who does not follow any kind of format when writing documents: there are many spaces, tabs, new lines and \r characters (by the way, how \r characters are encoded in *.doc files is a separate question). But I used the RegEx lib to extract the data and got it as a table in a Python script, so doing the same on Ubuntu won't be a problem. – Eldiane Aug 22 '17 at 17:03
  • Can you try the following command: soffice --convert-to txt filename.doc? Maybe that works. – varnit Aug 22 '17 at 17:08
  • Yes, I can do that on Ubuntu, and it is a nice, neat way to do what I want :) But in the end I wanted to write an app that runs on Windows (which is what my client has). I now think that will be hard; I am guessing it will be easier to boot Windows, write the script there and use the method I already know. Am I right? – Eldiane Aug 22 '17 at 17:24
  • You can use the Python os module to run different code on different operating systems. On Windows you can use the IronPython bindings to manipulate doc files easily. Let me know if you need any other help. – varnit Aug 22 '17 at 17:37
  • Are you sure that a .NET implementation will be usable on Ubuntu? I mean, can I install the .NET Framework there (I think it should be possible)? First of all, some documentation for IronPython would be helpful; if you could send some tips on installing and using it and on its standard library, that would be great. I wanted to move this to chat, but I need more reputation :P – Eldiane Aug 22 '17 at 17:48
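The conversion command suggested in the comments can also be driven from Python, which makes the same script usable on Ubuntu and on Windows. The sketch below is only one possible arrangement: the helper names (`soffice_name`, `build_convert_command`, `convert`) are hypothetical, it assumes the LibreOffice launcher is on the PATH, and it assumes the target format is given as a plain extension such as "txt", "docx" or "doc".

```python
import subprocess
import sys
from pathlib import Path


def soffice_name(platform=sys.platform):
    """Pick the launcher name; assumes it can be found on the PATH."""
    return "soffice.exe" if platform.startswith("win") else "soffice"


def build_convert_command(source, target_format, outdir=None, platform=sys.platform):
    """Build the argument list for a headless LibreOffice conversion."""
    source = Path(source)
    outdir = Path(outdir) if outdir is not None else source.parent
    return [
        soffice_name(platform),
        "--headless",                   # run without opening a window
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, target_format, outdir=None):
    """Run the conversion and return the path LibreOffice should have written."""
    source = Path(source)
    outdir = Path(outdir) if outdir is not None else source.parent
    subprocess.run(build_convert_command(source, target_format, outdir), check=True)
    return outdir / (source.stem + "." + target_format)
```

For the workflow in the question, one could call `convert("filename.doc", "docx")`, edit the result, and then call `convert("filename.docx", "doc")` to hand a *.doc file back to the client. How faithfully the formatting survives the round trip would need to be checked on real documents.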
0

I use subprocess to call LibreOffice, which then opens the file (.doc or .xlsx).

For example,

import subprocess
# complete_file_path is the full path to the .doc or .xlsx file
subprocess.call(("libreoffice", complete_file_path))

Note the two pairs of parentheses: the inner pair builds the tuple of arguments that is passed to subprocess.call.

The file then opens in LibreOffice.
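The launcher is not always named the same on every installation, so it can help to look it up before calling it. This is a small sketch under that assumption; the function names and the candidate list ("libreoffice", "soffice") are illustrative choices, not something the approach above requires.

```python
import shutil
import subprocess


def find_office_binary(candidates=("libreoffice", "soffice"), which=shutil.which):
    """Return the full path of the first launcher found on the PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


def open_in_libreoffice(file_path):
    """Open file_path in LibreOffice without waiting for the window to close."""
    binary = find_office_binary()
    if binary is None:
        raise FileNotFoundError("No LibreOffice launcher found on the PATH")
    # Popen returns immediately, unlike subprocess.call, which blocks.
    return subprocess.Popen([binary, file_path])
```

Using `subprocess.Popen` here is a design choice: the script keeps running while the document is open, whereas `subprocess.call` waits until LibreOffice exits.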

Panagiotis Simakis