export_media from Google Drive API doesn't write newlines

Question

When I try to export the text of a Google Docs document using the export_media method, the result is a block of text (the content is correct, at least) but without newlines.

For example, if my file contains

Test

And another test

The output will look like

Test And another test

Here is my code:

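from googleapiclient import discovery

# decorator is the app's OAuth2 decorator, defined elsewhere
http = decorator.http()
service = discovery.build("drive", "v2", http=http)
# execute() returns the exported document as a byte string
docs = service.files().export_media(fileId=docs_key, mimeType="text/plain").execute()
docs = docs.decode('utf-8')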
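One possible explanation, which nothing in the question confirms, is that the exported text does contain line breaks and the browser collapses them when the string is placed in an HTML page. Under that assumption, a minimal Python 3 sketch would escape the text and turn each line break into a <br> tag before display. The function name plain_text_to_html and the sample string are hypothetical and not part of the Drive API.

```python
import html


def plain_text_to_html(text):
    """Escape exported plain text and turn line breaks into <br> tags."""
    # Normalise Windows and old-Mac line endings so every break is "\n".
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    # Escape each line so characters such as < and & display literally.
    return "<br>\n".join(html.escape(line) for line in normalised.split("\n"))


if __name__ == "__main__":
    sample = "Test\r\n\r\nAnd another test"  # hypothetical exported content
    print(plain_text_to_html(sample))
```

If the page goes through a template engine that escapes HTML by default, the result would also have to be marked as safe markup there, otherwise the <br> tags would themselves be shown as text.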

EDIT:

I also tried to export the file as HTML content instead of text. The problem is that I don't know how to use this HTML as actual HTML instead of just a string. Here is my code:

http = decorator.http()
service = discovery.build("drive", "v2", http=http)
docs = service.files().export_media(fileId=docs_key, mimeType="text/html").execute()

docs = docs.decode('utf-8')
docs = docs.encode('ascii', 'xmlcharrefreplace')
h = HTMLParser()
docs = h.unescape(docs)

As you can see, docs now contains the Google document in HTML format. But if I try to display docs like this, my web page shows the HTML code itself (still without newlines):

<html><head><meta content="text/html; charset=UTF-8" http-equiv="content-type"><style type="text/css"> ul.lst-kix_yreqa03lukup-0{list-style-type:none}.lst-kix_yreqa03lukup-3 > li:before{content:"❏ "}.lst-kix_yreqa03lukup-4 > li:before{content:"❏ "}ul.lst-kix_yreqa03lukup-2{list-style-type:none}ul.lst-kix_yreqa03lukup-1{list-style-type:none}.lst-kix_yreqa03lukup-5 > li:before{content:"❏ "}.lst-kix_yreqa03lukup-1 > li:before{content:"❏ "}.lst-kix_yreqa03lukup-7 > li:before{content:"❏ "}.lst-kix_yreqa03lukup-0 > li:before{content:"✓ "}.lst-kix_yreqa03lukup-6 > li:before{content:"❏ "}.lst-kix_yreqa03lukup-8 > li:before{content:"❏ "}ul.lst-kix_yreqa03lukup-8{list-style-type:none}ul.lst-kix_yreqa03lukup-7{list-style-type:none}ul.lst-kix_yreqa03lukup-4{list-style-type:none}.lst-kix_yreqa03lukup-2 > li:before{content:"❏ "}ul.lst-kix_yreqa03lukup-3{list-style-type:none}ul.lst-kix_yreqa03lukup-6{list-style-type:none}ul.lst-kix_yreqa03lukup-5{list-style-type:none}</style></head><body style="background-color:#ffffff;padding:72pt 72pt 72pt 72pt;max-width:451.3pt"><p style="padding:0;margin:0;color:#000000;font-size:11pt;font-family:"Arial";orphans:2;widows:2"><span>First Online edit : 15h50</span></p></body></html>

Of course, I want to display the generated HTML as the web page, and not the string that is currently shown.

There is nothing in the API that you can use to influence formatting, so you're kinda hosed. The only workaround I can think of would be to export as html and then parse the html into text yourself, or find a library that does it. — pinoyyid, May 09 '16 at 14:38
Well, I already tried to do so, but the parsed HTML was written as text on my website. I didn't find anything to make it render as actual HTML instead of a string. Any ideas about that? — Kariamoss, May 09 '16 at 14:48
sorry I don't understand your comment. perhaps you could paste the html output into your question. — pinoyyid, May 09 '16 at 14:52
Ok, I edited my question, hope it will be more understandable — Kariamoss, May 09 '16 at 15:36
so I can see that you're exporting a valid HTML document. You should ask a new question on SO like "is there a python library to parse an HTML document" — pinoyyid, May 10 '16 at 04:35
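The workaround suggested in the comments (export as HTML, then convert the HTML to text yourself) can be sketched with Python 3's standard html.parser module. This is only an illustration under assumptions: the class name TextExtractor, the helper html_to_text, and the choice of tags that start a new line are all hypothetical, and empty paragraphs are dropped rather than kept as blank lines.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, starting a new line at each block-level tag."""

    # Assumed set of tags that should break the line; adjust as needed.
    BLOCK_TAGS = {"p", "br", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
    # Content of these tags is never visible text.
    SKIPPED_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = [""]
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS and self.lines[-1]:
            self.lines.append("")  # start a new line before the block

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS and self.lines[-1]:
            self.lines.append("")  # start a new line after the block

    def handle_data(self, data):
        if not self._skip_depth:
            self.lines[-1] += data


def html_to_text(document):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(document)
    parser.close()
    # Drop empty lines, so empty paragraphs do not survive as blank lines.
    return "\n".join(line.strip() for line in parser.lines if line.strip())


if __name__ == "__main__":
    sample = ("<html><head><style>p{color:red}</style></head><body>"
              "<p><span>Test</span></p><p><span>And another test</span></p>"
              "</body></html>")
    print(html_to_text(sample))
```

The sample string is a made-up stand-in for the exported document; with the real export, the decoded HTML string from export_media would be passed to html_to_text instead.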
