How to write str (byte) from Cloudmersive API response to PDF file without file corruption

Question

I'm currently working on converting several different file formats (.csv, .xlsx, .docx, .one) to .pdf using the Cloudmersive API (https://api.cloudmersive.com/docs/convert.asp). Their documentation does not say how the API response is encoded. I've tried several different ways of writing `api_response` (output type: str (byte)) to disk. The .pdf file appears to be written successfully, but when I open it, Adobe says that the file is corrupted.

I've tried detecting the type of encoding, but chardet found no encoding.


    import os

    import cloudmersive_convert_api_client
    from cloudmersive_convert_api_client.rest import ApiException

    configuration = cloudmersive_convert_api_client.Configuration()
    configuration.api_key['Apikey'] = 'PUT YOUR KEY HERE'  # API key linked to your account

    # create an instance of the API class
    api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))

    # Convert Document to PDF
    if os.stat(input_file).st_size != 0:  # the API does not work on empty files

        try:
            api_response = api_instance.convert_document_ppt_to_pdf(input_file)  # the only line that differs between the per-format variants

            os.remove(input_file)
            output_file = os.path.splitext(input_file)[0] + ".pdf"

            with open(output_file, 'wb') as binary_file:
                binary_file.write(bytearray(str(api_response), encoding='utf-8'))

            print(input_file, 'was processed by ConvertDocumentPptToPdf.')

        except ApiException as e:
            print(input_file, 'was not processed:', e)
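The comments on this question establish that `api_response` is a `str` whose text is a Python bytes literal (it starts with `b'`). Assuming that, here is a minimal sketch of what the `bytearray(str(api_response), encoding='utf-8')` line actually puts on disk: the escape sequences survive as literal backslashes, and the file begins with `b'` instead of the `%PDF-` header, which is why Adobe reports corruption. The sample bytes are made up for illustration.

```python
# Made-up sample: the first few bytes of a real PDF.
pdf_bytes = b"%PDF-1.5\n%\xc3\xa4"

# Assumption: the client hands back the *text* of that bytes literal,
# i.e. the same string that str() produces.
api_response = str(pdf_bytes)

# What the question's code writes to disk.
written = bytes(bytearray(str(api_response), encoding="utf-8"))

print(written[:8])                   # b"b'%PDF-1"
print(written.startswith(b"%PDF-"))  # False
```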

I've also tried the following, but it does not work either:

    with open(output_file, 'wb') as binary_file:
        binary_file.write(bytearray(api_response))
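Passing a `str` straight to `bytearray` cannot work: without an `encoding` argument, `bytearray` rejects a string. A small sketch, using a made-up stringified response in place of the real one:

```python
api_response = "b'%PDF-1.5\\n'"  # made-up stringified response (assumption)

error_message = None
try:
    bytearray(api_response)  # a str without an encoding is rejected
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # string argument without an encoding
```

Adding `encoding='utf-8'` only gets past the error; it still encodes the text of the literal rather than recovering the original PDF bytes.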

Here is some sample output from the API response (api_response):

    b'b\'%PDF-1.5\\n%\\xc3\\xa4\\xc3\\xbc\\xc3\\xb6\\xc3\\x9f\\n2 0 obj\\n<</Length 3 0 R/Filter/FlateDecode>>\\nstream\\nx\\x9c\\x85TM\\x8b\\xdc0\\x0c\\xbd\\xe7W\\xf8\\xbc\\x10\\xaf$

Also, when I tried to detect the encoding with:

    detection = chardet.detect(test.encode())
    print(detection)

it reports the following:

    {'encoding': None, 'confidence': 0.0, 'language': None}
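chardet guesses the character encoding of text, so it has little to say about a PDF, which is a binary format. A more direct diagnostic is whether the output file starts with the `%PDF-` header that PDF files begin with. A minimal sketch; `looks_like_pdf` is a hypothetical helper name and the two demo files are made up:

```python
import os
import tempfile

PDF_MAGIC = b"%PDF-"  # PDF files begin with this header


def looks_like_pdf(path):
    """Return True if the file at `path` begins with the PDF header bytes."""
    with open(path, "rb") as fh:
        return fh.read(len(PDF_MAGIC)) == PDF_MAGIC


# Demo: a correctly written file vs. one written from the stringified response.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.pdf")
    bad = os.path.join(tmp, "bad.pdf")
    with open(good, "wb") as fh:
        fh.write(b"%PDF-1.5\n")
    with open(bad, "wb") as fh:
        fh.write(b"b'%PDF-1.5\\n'")
    good_ok = looks_like_pdf(good)  # True
    bad_ok = looks_like_pdf(bad)    # False
```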

What do you get if you just do `binary_file.write(api_response)`? — snakecharmerb, Nov 30 '22 at 20:01
@snakecharmerb This is the error message I receive from `binary_file.write(api_response)`: `TypeError: a bytes-like object is required, not 'str'` — agarduno, Dec 01 '22 at 15:35
@snakecharmerb Also, if I change the code to `with open(output_file, 'w') as binary_file: binary_file.write(api_response)`, the file is still corrupted. — agarduno, Dec 01 '22 at 15:39
What does printing `type(api_response)` and `repr(api_response[:10])` produce? — snakecharmerb, Dec 01 '22 at 15:53
The type is `str`, and the second output is `"b'%PDF-1.5"` — agarduno, Dec 01 '22 at 15:58
Try `import ast; data = ast.literal_eval(api_response)` then `binary_file.write(data)`. — snakecharmerb, Dec 01 '22 at 16:08
That worked! I've literally been stuck for several days on this. Thanks so much. I will accept this as the solution. — agarduno, Dec 01 '22 at 16:26
To be honest once we reduce the problem to its essence - "how can we convert a stringified bytes instance back to bytes" - this is a duplicate - but thanks for the offer :-) — snakecharmerb, Dec 01 '22 at 17:16
Note that the client returning stringified bytes seems like a bug to me, might be worth raising an issue with the provider. — snakecharmerb, Dec 01 '22 at 17:21
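A sketch of the diagnosis and fix from these comments, with made-up sample bytes standing in for a real response: `type()` and `repr()` reveal a `str` holding a bytes literal, and `ast.literal_eval` parses that literal back into the original bytes. `literal_eval` only accepts Python literals, so unlike `eval` it will not execute arbitrary expressions that might appear in the response.

```python
import ast

original = b"%PDF-1.5\n%\xc3\xa4\xc3\xbc"  # made-up raw PDF bytes
api_response = str(original)              # assumed shape of the client's return value

# The diagnostic from the comments.
print(type(api_response).__name__, repr(api_response[:10]))  # str "b'%PDF-1.5"

# Parse the bytes literal back into real bytes.
data = ast.literal_eval(api_response)
print(type(data).__name__, data == original)  # bytes True
```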

score 0 · Answer 1 · answered Dec 01 '22 at 16:51

As suggested in the comments, the following code worked:

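    import ast
    import os

    import cloudmersive_convert_api_client
    from cloudmersive_convert_api_client.rest import ApiException

    configuration = cloudmersive_convert_api_client.Configuration()
    configuration.api_key['Apikey'] = 'PUT YOUR KEY HERE'  # API key linked to your account

    # create an instance of the API class
    api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))

    # Convert Document to PDF
    if os.stat(input_file).st_size != 0:  # the API does not work on empty files

        try:
            api_response = api_instance.convert_document_ppt_to_pdf(input_file)  # the only line that differs between the per-format variants

            output_file = os.path.splitext(input_file)[0] + ".pdf"

            # The client returns the text of a bytes literal ("b'%PDF-...'"),
            # so parse it back into real bytes before writing.
            data = ast.literal_eval(api_response)

            with open(output_file, 'wb') as binary_file:
                binary_file.write(data)

            # Delete the source only after the PDF has been written successfully.
            os.remove(input_file)

            print(input_file, 'was processed by ConvertDocumentPptToPdf.')

        except ApiException as e:
            print(input_file, 'was not processed:', e)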
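If the client's return type might change (for example, if the stringified-bytes behaviour is fixed upstream), a slightly more defensive version could accept either real bytes or a stringified bytes literal and check the `%PDF-` header before anything is written. A sketch only; `response_to_pdf_bytes` and `write_pdf` are hypothetical helper names, not part of the Cloudmersive client:

```python
import ast

PDF_MAGIC = b"%PDF-"  # PDF files begin with this header


def response_to_pdf_bytes(api_response):
    """Return raw PDF bytes from either bytes or a stringified bytes literal."""
    if isinstance(api_response, (bytes, bytearray)):
        data = bytes(api_response)
    elif isinstance(api_response, str) and api_response[:2] in ("b'", 'b"'):
        data = ast.literal_eval(api_response)  # parse "b'...'" back into bytes
    else:
        raise TypeError(f"unexpected response type: {type(api_response).__name__}")

    if not isinstance(data, bytes) or not data.startswith(PDF_MAGIC):
        raise ValueError("response does not look like a PDF")
    return data


def write_pdf(api_response, output_file):
    """Validate first, then write; nothing is written if validation fails."""
    data = response_to_pdf_bytes(api_response)
    with open(output_file, "wb") as binary_file:
        binary_file.write(data)
    return len(data)


# Usage with a made-up sample instead of a real API call.
sample = b"%PDF-1.5\n%\xc3\xa4"
from_str = response_to_pdf_bytes(str(sample))
from_bytes = response_to_pdf_bytes(sample)
```

In the answer's code, the `ast.literal_eval` line and the `with open(...)` block would then collapse into a single `write_pdf(api_response, output_file)` call.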
