translation = {'                                             Cloud AI ': 'ਕਲਾਊਡ AI',
 'Entity Extraction': 'ਇਕਾਈ ਐਕਸਟਰੈਕਸ਼ਨ',
 ' Architecture': 'ਆਰਕੀਟੈਕਚਰ',
 ' Conclusion': 'ਸਿੱਟਾ',
 ' Motivation / Entity Extraction': 'ਪ੍ਰੇਰਣਾ / ਹਸਤੀ ਕੱਢਣ',
 ' Recurrent Deep Neural Networks': 'ਆਵਰਤੀ ਡੂੰਘੇ ਨਿਊਰਲ ਨੈੱਟਵਰਕ',
 ' Results': 'ਨਤੀਜੇ',
 ' Word Embeddings': 'ਸ਼ਬਦ ਏਮਬੈਡਿੰਗਸ',
 'Agenda': 'ਏਜੰਡਾ',
 'Also known as Named-entity recognition (NER), entity chunking and entity identification': 'ਨਾਮ-ਹਸਤੀ ਮਾਨਤਾ (NER), ਇਕਾਈ ਚੰਕਿੰਗ ਅਤੇ ਇਕਾਈ ਪਛਾਣ ਵਜੋਂ ਵੀ ਜਾਣਿਆ ਜਾਂਦਾ ਹੈ',
 'Biomedical Entity Extraction': 'ਬਾਇਓਮੈਡੀਕਲ ਇਕਾਈ ਐਕਸਟਰੈਕਸ਼ਨ',
 'Biomedical named entity recognition': 'ਬਾਇਓਮੈਡੀਕਲ ਨਾਮੀ ਇਕਾਈ ਦੀ ਮਾਨਤਾ',
 'Critical step for complex biomedical NLP tasks:': 'ਗੁੰਝਲਦਾਰ ਬਾਇਓਮੈਡੀਕਲ NLP ਕਾਰਜਾਂ ਲਈ ਮਹੱਤਵਪੂਰਨ ਕਦਮ:',
 'Custom Entity Extraction': 'ਕਸਟਮ ਇਕਾਈ ਐਕਸਟਰੈਕਸ਼ਨ',
 'Custom models': 'ਕਸਟਮ ਮਾਡਲ'}

This is a slide from the presentation. Notice that the first word, "Custom", is not replaced even though "Custom Entity Extraction" is present in the translation dictionary.

I would like to know why this happens for some words.

The code for replacing words

from pptx import Presentation

prs = Presentation('/content/drive/MyDrive/presentation2.pptx')


# Collect every shape from every slide

shapes = []
for slide in prs.slides:
    for shape in slide.shapes:
        shapes.append(shape)


def replace_text(replacements: dict, shapes: list):
    """Takes dict of {match: replacement, ... } and replaces all matches.
    Currently not implemented for charts or graphics.
    """
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_text_frame:
                if (shape.text.find(match)) != -1:
                    text_frame = shape.text_frame
                    for paragraph in text_frame.paragraphs:
                        for run in paragraph.runs:
                            cur_text = run.text
                            new_text = cur_text.replace(str(match), str(replacement))
                            run.text = new_text

            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        if match in cell.text:
                            new_text = cell.text.replace(match, replacement)
                            cell.text = new_text

replace_text(translation, shapes) 

prs.save('output5.pptx')

Output from the function

Custom Entity Extraction - Custom ਇਕਾਈ ਐਕਸਟਰੈਕਸ਼ਨ

Expected output

Custom Entity Extraction - ਕਸਟਮ ਇਕਾਈ ਐਕਸਟਰੈਕਸ਼ਨ

I think I have found the reason this happens. The dictionary contains "Entity Extraction", so wherever the function finds that phrase it replaces it first. Once it has been replaced, the longer key "Custom Entity Extraction" no longer matches, and the word "Custom" on its own has no translation, so it is left as it is. Now I am not sure how to make the function avoid doing that.
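This diagnosis can be checked without PowerPoint by running the same loop on a plain string. The sketch below assumes the dictionary is iterated in insertion order (as in Python 3.7+); `replace_in_order` and `replace_longest_first` are hypothetical helper names, not part of python-pptx. The second helper tries the longest keys first, so "Custom Entity Extraction" is attempted before "Entity Extraction".

```python
def replace_in_order(text: str, replacements: dict) -> str:
    """Apply replacements in dictionary order (what the original loop does)."""
    for match, replacement in replacements.items():
        text = text.replace(match, replacement)
    return text


def replace_longest_first(text: str, replacements: dict) -> str:
    """Apply replacements with the longest keys first, so a short key
    cannot consume part of a longer phrase before that phrase is tried."""
    for match in sorted(replacements, key=len, reverse=True):
        text = text.replace(match, replacements[match])
    return text


# Two entries from the question's dictionary, in the same relative order.
translation = {
    'Entity Extraction': 'ਇਕਾਈ ਐਕਸਟਰੈਕਸ਼ਨ',
    'Custom Entity Extraction': 'ਕਸਟਮ ਇਕਾਈ ਐਕਸਟਰੈਕਸ਼ਨ',
}

in_order = replace_in_order('Custom Entity Extraction', translation)
longest_first = replace_longest_first('Custom Entity Extraction', translation)
print(in_order)       # "Custom" is left untranslated
print(longest_first)  # the whole phrase is translated
```

In `replace_text`, the equivalent change would probably be to loop over `sorted(replacements, key=len, reverse=True)` instead of `replacements.items()`. This only addresses the ordering problem; it does not help when a phrase is split across several runs.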

sha25
  • Print each `run.text` first and see what you get – Peter Wood Jun 13 '22 at 13:48
  • It's possible the problem isn't with the powerpoint part of the code. See how to create a [mcve] and edit the question. – Peter Wood Jun 13 '22 at 13:48
  • @PeterWood i have posted the result of run.text – sha25 Jun 13 '22 at 13:56
  • @PeterWood incorrect. `replace` replaces all by default. There is an optional argument for just the first `n` matches. – Steinn Hauser Magnússon Jun 13 '22 at 13:58
  • @SteinnHauserMagnusson where am I wrong exactly – sha25 Jun 13 '22 at 14:01
  • @sha25 I'm not sure. It seems to match the pattern but not replace each word. Since you're overwriting the entire string with replace, could it be possible to set `new_text = replacement` directly? – Steinn Hauser Magnússon Jun 13 '22 at 14:02
  • A quick test I performed: ` >>> s = {'Custom Entity Extraction': 'ਕਸਟਮ ਇਕਾਈ ਐਕਸਟਰੈਕਸ਼ਨ'} >>> st = "Custom Entity Extraction methods are great! " >>> st.replace('Custom Entity Extraction', s['Custom Entity Extraction']) 'ਕਸਟਮ ਇਕਾਈ ਐਕਸਟਰੈਕਸ਼ਨ methods are great! ' ` – Steinn Hauser Magnússon Jun 13 '22 at 14:09
  • @SteinnHauserMagnusson I edited the function as you said `new_text = cur_text.replace(match, replacements[match])` still the same output – sha25 Jun 13 '22 at 14:25
  • Why should it not leave the `Custom`? There's nothing saying it should be replaced. – Peter Wood Jun 13 '22 at 15:17
  • @PeterWood `Custom` is under one string so the whole string should be replaced – sha25 Jun 13 '22 at 15:32
  • @sha25 you're not making sense. Edit the question. You should be able to create a 4 line example showing the input, the replacement dictionary, the replacement call, the result and the expected output. Nothing else matters. – Peter Wood Jun 13 '22 at 16:08
  • @PeterWood I hope I have edited the question as you expected – sha25 Jun 14 '22 at 05:58
  • @sha25 so, the pdf is irrelevant to the problem you're facing. Edit the question to remove irrelevant details. – Peter Wood Jun 14 '22 at 12:36
  • @PeterWood - What if my slide contains the word that I am looking for (to replace) but `run.text` doesn't have it as one single word? Can you share your views on this, please? How do we avoid this? https://stackoverflow.com/questions/73219378/python-pptx-unexpected-split-of-a-line-into-keywords – The Great Aug 03 '22 at 09:56

1 Answer

How about directly replacing the text when it finds a match? I.e.:

    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_text_frame:
                if (shape.text.find(match)) != -1:
                    text_frame = shape.text_frame
                    for paragraph in text_frame.paragraphs:
                        for run in paragraph.runs:
                            run.text = replacement

            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        if match in cell.text:
                            cell.text = replacement

This assumes you want to replace the entire text of each run, not just the matching section of it.

  • Not a good idea. It greedily replaces the sentences. – sha25 Jun 13 '22 at 14:42
  • @Steinn Hauser Magnusson - Nice. I followed your solution but have a slightly different problem with run.text. Is this something that you would be able to help me with? https://stackoverflow.com/questions/73219378/python-pptx-unexpected-split-of-a-line-into-keywords – The Great Aug 03 '22 at 09:55
  • @TheGreat I'll have a look and see if I can contribute – Steinn Hauser Magnússon Aug 03 '22 at 10:36
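
The run-splitting problem raised in the comments (a phrase spread over several runs, so no single `run.text` contains it) can be illustrated with a small sketch. It is not taken from the answer above: `FakeRun` is a hypothetical stand-in for a run object, and `replace_across_runs` is a hypothetical helper that only assumes each run exposes a readable and writable `text` attribute, as the question's code does.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeRun:
    """Stand-in for a presentation run; only the `text` attribute is used."""
    text: str


def replace_across_runs(runs: List[FakeRun], match: str, replacement: str) -> bool:
    """Replace `match` even when it is split over several runs.

    The joined, replaced text is written into the first run and the other
    runs are emptied, so formatting that differed between runs is lost.
    Returns True if a replacement was made.
    """
    if not runs:
        return False
    full_text = ''.join(run.text for run in runs)
    if match not in full_text:
        return False
    runs[0].text = full_text.replace(match, replacement)
    for run in runs[1:]:
        run.text = ''
    return True


# "Custom models" is split across two runs, so neither run contains it alone.
replacement = 'ਕਸਟਮ ਮਾਡਲ'
runs = [FakeRun('Cus'), FakeRun('tom models')]
changed = replace_across_runs(runs, 'Custom models', replacement)
print(changed, [run.text for run in runs])
```

With real slides, the list passed in would presumably be `list(paragraph.runs)` for each paragraph of a text frame. The trade-off is that the paragraph keeps only the first run's formatting.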