I'm converting several file formats (.csv, .xlsx, .docx, .one) to PDF using the Cloudmersive API (https://api.cloudmersive.com/docs/convert.asp). Their documentation does not say how the API response is encoded. I've tried several ways of writing `api_response` (a `str` that looks like bytes) to disk. The write appears to succeed, but when I open the resulting .pdf, Adobe reports that the file is corrupted.

I've tried detecting the encoding, but chardet found none.


    configuration = cloudmersive_convert_api_client.Configuration()
    configuration.api_key['Apikey'] = 'PUT YOUR KEY HERE' #individual user-id linked to the account
    
    # create an instance of the API class
    api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))

    # Convert Document to PDF
    if (os.stat(input_file).st_size != 0): #api does not work on empty files
        
        try:
            api_response = api_instance.convert_document_ppt_to_pdf(input_file) #ONLY DIFFERENCE
            
            os.remove(input_file) 
            output_file=os.path.splitext(input_file)[0]+".pdf"
            
            with open(output_file, 'wb') as binary_file:
                binary_file.write(bytearray(str(api_response),encoding='utf-8'))
                
            print(input_file, 'was processed by ConvertDocumentPptToPdf.')
            
        except ApiException as e:
            print(input_file, 'was not processed.')

I've also tried the following, but it does not work either:

            with open(output_file, 'wb') as binary_file:
                binary_file.write(bytearray(api_response))

Here is some sample output from the API response (api_response):

b'b\'%PDF-1.5\\n%\\xc3\\xa4\\xc3\\xbc\\xc3\\xb6\\xc3\\x9f\\n2 0 obj\\n<</Length 3 0 R/Filter/FlateDecode>>\\nstream\\nx\\x9c\\x85TM\\x8b\\xdc0\\x0c\\xbd\\xe7W\\xf8\\xbc\\x10\\xaf$

Also, when I try to detect the encoding, I get the following:

    detection = chardet.detect(test.encode())
    print(detection)

    {'encoding': None, 'confidence': 0.0, 'language': None}
agarduno
  • What do you get if you just do `binary_file.write(api_response)`? – snakecharmerb Nov 30 '22 at 20:01
  • @snakecharmerb This is the error message that I receive from binary_file.write(api_response): `TypeError: a bytes-like object is required, not 'str'` – agarduno Dec 01 '22 at 15:35
  • @snakecharmerb . Also, if I change the code to `with open(output_file, 'w') as binary_file: binary_file.write(api_response)` The file is still corrupted. – agarduno Dec 01 '22 at 15:39
  • What does printing `type(api_response)` and `repr(api_response[:10])` produce? – snakecharmerb Dec 01 '22 at 15:53
  • `str` is the type, and the second output is `"b'%PDF-1.5"` – agarduno Dec 01 '22 at 15:58
  • Try `import ast; data = ast.literal_eval(api_response)` then `binary_file.write(data)`. – snakecharmerb Dec 01 '22 at 16:08
  • That worked! I've literally been stuck for several days on this. Thanks so much. I will accept this as the solution. – agarduno Dec 01 '22 at 16:26
  • To be honest once we reduce the problem to its essence - "how can we convert a stringified bytes instance back to bytes" - this is a duplicate - but thanks for the offer :-) – snakecharmerb Dec 01 '22 at 17:16
  • Note that the client returning stringified bytes seems like a bug to me, might be worth raising an issue with the provider. – snakecharmerb Dec 01 '22 at 17:21

1 Answer

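The following code worked, as suggested in the comments. The client returns the string representation of a bytes object, and `ast.literal_eval` converts it back into real bytes before writing: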

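    import ast
    import os

    import cloudmersive_convert_api_client
    from cloudmersive_convert_api_client.rest import ApiException

    # `configuration` and `input_file` are set up as in the question
    # create an instance of the API class
    api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))

    # Convert Document to PDF
    if os.stat(input_file).st_size != 0:  # the API does not work on empty files

        try:
            api_response = api_instance.convert_document_ppt_to_pdf(input_file)  # ONLY DIFFERENCE

            output_file = os.path.splitext(input_file)[0] + ".pdf"

            # api_response is a str holding a bytes literal ("b'%PDF-1.5...'");
            # literal_eval parses it back into a bytes object
            data = ast.literal_eval(api_response)

            with open(output_file, 'wb') as binary_file:
                binary_file.write(data)

            # delete the source only after the PDF has been written
            os.remove(input_file)

            print(input_file, 'was processed by ConvertDocumentPptToPdf.')

        except ApiException as e:
            print(input_file, 'was not processed:', e)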
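
The conversion step can be tried without calling the API. The sketch below is only an illustration: it simulates the response with `repr()` on a few sample bytes (an assumption about what the client does, based on the output shown in the question), and the helper name `stringified_bytes_to_bytes` is made up for this example. It compares what the question's `bytearray(str(...))` approach writes with what `ast.literal_eval` recovers.

```python
import ast


def stringified_bytes_to_bytes(text):
    """Parse a string such as "b'%PDF-1.5'" back into a bytes object.

    Hypothetical helper for illustration; not part of the Cloudmersive client.
    """
    data = ast.literal_eval(text)
    if not isinstance(data, bytes):
        raise TypeError(f"expected a bytes literal, got {type(data).__name__}")
    return data


# Simulated response: the text form of a bytes object (assumption about the client).
original = b"%PDF-1.5\n%\xc3\xa4\xc3\xbc\xc3\xb6\xc3\x9f\n"
api_response = repr(original)

# The approach from the question encodes the text itself, so the file
# starts with the characters b' instead of the %PDF header.
corrupted = bytearray(api_response, encoding="utf-8")

# literal_eval evaluates the literal and returns the underlying bytes.
recovered = stringified_bytes_to_bytes(api_response)

print(bytes(corrupted[:4]))
print(recovered[:4])
print(recovered == original)
```

A PDF reader looks for the `%PDF` header at the start of the file, which would explain why the first approach produced a file that Adobe rejected. `ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for data that comes from a remote service.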
agarduno