How to generate an xml or json from the text extracted from a slide?

Question

I'm using this code to extract the text from a slide show using python-pptx. How do I generate an XML or JSON file containing the text for each slide?

from pptx import Presentation

local_pptxFileList = ["/content/drive/MyDrive/Slides/Backlog Management.pptx"]

for i in local_pptxFileList:
    ppt = Presentation(i)
    for slide in ppt.slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                print(shape.text)

Please go through the [intro tour](https://stackoverflow.com/tour), the [help center](https://stackoverflow.com/help) and [how to ask a good question](https://stackoverflow.com/help/how-to-ask) to see how this site works and to help you improve your current and future questions, which can help you get better answers. "Show me how to solve this coding problem?" is off-topic for Stack Overflow. You have to make an honest attempt at the solution, and then ask a *specific* question about your implementation. Stack Overflow is not intended to replace existing tutorials and documentation. — Prune, Feb 16 '21 at 01:18
You must *specify* the input (not merely a file path that has no meaning outside of your machine), the output desired (not merely a format language), and include your coding attempt with a specific problem. — Prune, Feb 16 '21 at 01:19

score 0 · Answer 1 · answered Feb 16 '21 at 01:45

Store the extracted text in a data structure such as a list (or a list of lists, with one inner list for each presentation's text).

Use the json module to serialize your data structure to JSON and save it to a file. I haven't dealt with encoding (e.g. UTF-8) to ensure that the text is stored correctly, but information about that is easy to find.

import json

from pptx import Presentation  # provided by the python-pptx package

local_pptxFileList = ["/content/drive/MyDrive/Slides/Backlog Management.pptx"]

all_texts = []  # one inner list per presentation
for i in local_pptxFileList:
    ppt = Presentation(i)
    this_pres_texts = []
    for slide in ppt.slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                this_pres_texts.append(shape.text)
    all_texts.append(this_pres_texts)

# Write the nested list to disk as JSON
with open('data.txt', 'w') as outfile:
    json.dump(all_texts, outfile)
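
The code above collects all text of a presentation into one flat list. Since the question asks for the text of each slide, and mentions XML as well, here is a minimal sketch of one way to keep the per-slide grouping and produce both formats with the standard library. The helper names (`extract_slide_texts`, `slides_to_json`, `slides_to_xml`), the record keys (`slide`, `texts`), the XML element names and the output file names in the comments are all made up for illustration, not part of python-pptx. The sample list is stand-in data so the snippet runs without a .pptx file.

```python
import json
import xml.etree.ElementTree as ET


def extract_slide_texts(pptx_path):
    """Return one list of text strings per slide (needs python-pptx)."""
    # Imported here so the two converters below also run without python-pptx.
    from pptx import Presentation

    slides = []
    for slide in Presentation(pptx_path).slides:
        slides.append([s.text for s in slide.shapes if s.has_text_frame])
    return slides


def slides_to_json(slides):
    """Serialize as a JSON array of {"slide": n, "texts": [...]} records."""
    records = [{"slide": n, "texts": texts}
               for n, texts in enumerate(slides, start=1)]
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps(records, ensure_ascii=False, indent=2)


def slides_to_xml(slides):
    """Serialize as <presentation><slide number="n"><text>...</text></slide>..."""
    root = ET.Element("presentation")
    for n, texts in enumerate(slides, start=1):
        slide_el = ET.SubElement(root, "slide", number=str(n))
        for text in texts:
            # ElementTree escapes &, < and > when serializing.
            ET.SubElement(slide_el, "text").text = text
    return ET.tostring(root, encoding="unicode")


# Stand-in data, shaped like the return value of extract_slide_texts().
sample = [["Backlog Management", "Overview"], ["Prioritise & refine"]]
json_text = slides_to_json(sample)
xml_text = slides_to_xml(sample)

# To save the results, open the files with an explicit encoding, e.g.:
#   with open("slides.json", "w", encoding="utf-8") as f:
#       f.write(json_text)
#   with open("slides.xml", "w", encoding="utf-8") as f:
#       f.write(xml_text)
```

With a real file you would pass the result of `extract_slide_texts(path)` instead of `sample`. If the extracted text contains control characters (some python-pptx versions appear to represent soft line breaks as a vertical tab), replace them before building the XML, because XML 1.0 does not allow them.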
