How to extract images from a document using pypandoc into a different folder in media repository of a project in Django?

Question

I am currently trying to extract the images from a document that the user is uploading into the media repository of my Django app. The code that currently works for me is:

html = pypandoc.convert(
    tmp_loc,
    'html5',
    extra_args=['--extract-media=']
)

This correctly extracts the images into the media directory as image01.jpg.
In the HTML, the img src is:

<img src="/media/image01.jpg" />

The problem is that when the user uploads another docx that also contains an image, the new image replaces the previous one, because it is also saved as image01.jpg.

To solve this, I thought I could create a new folder in the media repository for each document and name it after the document. So now the code looks like this:

html = pypandoc.convert(
    tmp_loc,
    'html5',
    extra_args=['--extract-media=/media/<some_doc_name>']
)

But the moment I run this I get the following error:

Pandoc died with exitcode "1" during conversion: b'pandoc:     /media/docs: createDirectory: permission denied (Permission denied)\n'

Could someone explain what is going wrong and how to fix it? Any alternative methods of solving this problem would also be appreciated!

I am using the pypandoc module in Python.

What if the doc name is also the same? Instead, why don't you just rename the image file if it already exists? — anonDuck, May 13 '16 at 22:21
The thing is, pypandoc automatically generates the HTML and sets the img src to point to image01.jpg. How do we modify the HTML generated by pandoc to point to the renamed image? — Arunabh Ghosh, May 14 '16 at 01:36
You should read https://github.com/bebraw/pypandoc/blob/master/README.md; you can specify the output filename. — anonDuck, May 14 '16 at 02:18
I need to change the image name to something else, not the name of its output file. The extract-media option extracts the images from the docx and saves them in the media repository. How do I change the names of the images it extracts? — Arunabh Ghosh, May 14 '16 at 05:36

score 0 · Answer 1 · answered Jun 05 '16 at 10:19

The error you are getting clearly says that you do not have permission to create the directory /media/docs.

There may be multiple reasons why this happens:

1. You do not have permission to create "/media/docs": change the permissions.
2. You have the permissions, but you are running your application as a different user that does not have them: create a group and change the permissions for that group.
3. You want to extract to the "media" directory of your application, not the system root "/media": your path is wrongly specified. It needs a longer prefix, e.g. "/home/user/program/media/docs", or it should be relative, e.g. "media/docs" (without the leading "/").
4. You are trying to extract to a subdirectory that does not exist, and the program cannot create the missing parent directories: make sure the directory is created beforehand.
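The path-related causes above can be avoided by building the target directory from the application's own media root and creating it before pandoc runs. This is a minimal sketch, not a definitive implementation: the helper names prepare_extract_dir and build_extra_args are hypothetical, and media_root is assumed to be the filesystem path of the project's media directory (in Django this would typically be settings.MEDIA_ROOT).

```python
import os


def prepare_extract_dir(media_root, doc_name):
    """Return an absolute directory under media_root, creating it if needed.

    media_root and doc_name are assumptions for this sketch: the first is the
    app's media directory on disk, the second a name chosen per document.
    """
    target = os.path.join(os.path.abspath(media_root), doc_name)
    # Create the directory and any missing parents; no error if it exists.
    os.makedirs(target, exist_ok=True)
    return target


def build_extra_args(media_root, doc_name):
    """Build the extra_args list for the pypandoc call from the question."""
    return ['--extract-media=' + prepare_extract_dir(media_root, doc_name)]
```

The result would then be passed as extra_args to the pypandoc.convert call from the question. The img src values that pandoc writes will probably reflect the directory given here, so they may still need to be mapped back to the media URL before the HTML is served.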

One last thing: if you are uploading documents, do not assume that they have unique names. Use something unique (like the primary key of the created record), or check uniqueness by verifying that the directory does not already exist and, if it does, create a new one with an additional number or some random text at the end.
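One way to sketch the uniqueness advice: name the directory after the record's primary key when one is available, fall back to a random name otherwise, and append a counter if the directory is already taken. The function name unique_extract_dir, the "docs" subfolder, and the record_id parameter are hypothetical choices for illustration, and media_root is assumed to be the media directory's path on disk.

```python
import os
import uuid


def unique_extract_dir(media_root, record_id=None):
    """Create and return a directory that did not exist before this call."""
    # Prefer the record's primary key; otherwise use a random hex name.
    name = str(record_id) if record_id is not None else uuid.uuid4().hex
    base = os.path.join(os.path.abspath(media_root), "docs", name)
    candidate = base
    counter = 1
    while True:
        try:
            # Without exist_ok, makedirs raises if the leaf already exists,
            # so a successful call means the directory is new.
            os.makedirs(candidate)
            return candidate
        except FileExistsError:
            candidate = "{}_{}".format(base, counter)
            counter += 1
```

The returned path can be used as the value of the --extract-media option, so each upload gets its own folder and a second image01.jpg no longer overwrites the first.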
