Look at the end of this post for an addition about the same problem with textboxes!

With this method I want to open a document, replace some text and then leave it alone. It works, and that's something to be proud of. :D

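// Requires: using System.Linq; using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Wordprocessing;
public static void replaceInOpenXMLDocument(string pfad, string zuErsetzen, string neuerString)
{
    using (WordprocessingDocument doc = WordprocessingDocument.Open(pfad, true))
    {
        // Select leaf elements whose own text contains the search string.
        // ToList() snapshots the matches so the tree is not modified while it is being enumerated.
        var res = (from bm in doc.MainDocumentPart.Document.Body.Descendants()
                   where bm.InnerText != string.Empty && bm.InnerText.Contains(zuErsetzen) && !bm.HasChildren
                   select bm).ToList();

        foreach (var item in res)
        {
            // Swap the matched element for a new Text node holding the replaced string.
            item.InsertAfterSelf(new Text(item.InnerText.Replace(zuErsetzen, neuerString)));
            item.Remove();
        }
        // No explicit doc.Close() needed: leaving the using block saves and closes the document.
    }
}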

But it only works when the search string contains no special characters. For example:

OS is replaced with "Windows over 9000".

[OS] is left as it is.

CASE 1:

In the document:

You use os for whatever purpose you've got.

replaceInOpenXMLDocument(@"C:\NSA\suspects.docx", "os", "Win 2000");

Will result in this:

You use Win 2000 for whatever purpose you've got.

CASE 2:

With special chars ...

You use [os] for whatever purpose you've got.

replaceInOpenXMLDocument(@"C:\NSA\suspects.docx", "[os]", "Win 2000");

... it just ignores me:

You use [os] for whatever purpose you've got.

I tried several special characters ()[]{} etc., but they're never replaced.

Is there something I forgot to do? Or is this method simply unable to replace text that contains special characters? If so, I just need a simple workaround.

Is there anybody out there who can help with my desperation? :)

SOLUTION / ADDITION 1:

Thanks to Flowerking for that! This is the code I'm using right now:

// Requires the Open XML SDK and Open XML PowerTools (SimplifyMarkupSettings, MarkupSimplifier), plus using System.Linq;
public static void replaceInOpenXMLDocument(string pfad, string zuErsetzen, string neuerString)
{
    using (WordprocessingDocument doc = WordprocessingDocument.Open(pfad, true))
    {
        SimplifyMarkupSettings settings = new SimplifyMarkupSettings
        {
            NormalizeXml = true, // Merges runs in a paragraph with similar formatting
        };
        MarkupSimplifier.SimplifyMarkup(doc, settings);

        // Select leaf elements whose own text contains the search string.
        // ToList() snapshots the matches so the tree is not modified while it is being enumerated.
        var res = (from bm in doc.MainDocumentPart.Document.Body.Descendants()
                   where bm.InnerText != string.Empty && bm.InnerText.Contains(zuErsetzen) && !bm.HasChildren
                   select bm).ToList();

        foreach (var item in res)
        {
            // Swap the matched element for a new Text node holding the replaced string.
            item.InsertAfterSelf(new Text(item.InnerText.Replace(zuErsetzen, neuerString)));
            item.Remove();
        }
        // No explicit doc.Close() needed: leaving the using block saves and closes the document.
    }
}

(This code will work for normal documents with normal text in it!)

SOLUTION / ADDITION 2: To replace text in textboxes, I had to use a little workaround. Textboxes are declared as pictures, so the code above won't touch them.

I found an additional class (link) that searches even through textboxes. The ZIP download includes an example program that is easy to understand.

Trollwut

1 Answer


This happens because of the markup Word usually creates when text contains special characters. The Open XML might look like this:

<w:p>
  <w:r w:rsidRPr="00316587">
    <w:rPr>
      <w:rFonts w:ascii="Consolas" w:hAnsi="Consolas" w:eastAsia="Times New Roman" w:cs="Consolas" />
      <w:color w:val="823125" />
      <w:sz w:val="20" />
      <w:szCs w:val="20" />
      <w:lang w:eastAsia="en-GB" />
    </w:rPr>
    <w:t>[</w:t>
  </w:r>
  <w:proofErr w:type="gramStart" />
  <w:r w:rsidRPr="00316587">
    <w:rPr>
      <w:rFonts w:ascii="Consolas" w:hAnsi="Consolas" w:eastAsia="Times New Roman" w:cs="Consolas" />
      <w:color w:val="823125" />
      <w:sz w:val="20" />
      <w:szCs w:val="20" />
      <w:lang w:eastAsia="en-GB" />
    </w:rPr>
    <w:t>text-to-replace</w:t>
  </w:r>
  <w:proofErr w:type="gramEnd" />
  <w:r w:rsidRPr="00316587">
    <w:rPr>
      <w:rFonts w:ascii="Consolas" w:hAnsi="Consolas" w:eastAsia="Times New Roman" w:cs="Consolas" />
      <w:color w:val="823125" />
      <w:sz w:val="20" />
      <w:szCs w:val="20" />
      <w:lang w:eastAsia="en-GB" />
    </w:rPr>
    <w:t>]</w:t>
  </w:r>
</w:p>

The above shows the Open XML created for the text [text-to-replace]. (Please note this might not always be the case; it may depend on the client you are using.)

Judging by your code, doc.MainDocumentPart.Document.Body.Descendants() returns every OpenXmlElement descendant of the document body, and you then try to replace the text by iterating over these elements one by one. In the markup above, the actual text sits in one element and the special characters sit in two separate elements, so no single element contains the full search string. Hence the code fails to achieve what is required.
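To make the effect concrete, here is a minimal sketch that uses only System.Xml.Linq on a simplified copy of the markup above (run properties omitted). The class and method names are made up for illustration and are not part of the Open XML SDK. It shows that no single w:t element contains the bracketed string, although the concatenated paragraph text does.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SplitRunDemo
{
    // WordprocessingML main namespace, bound to the "w" prefix in the sample.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Simplified version of the paragraph shown above: three runs, two proofErr markers.
    const string ParagraphXml =
        "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:r><w:t>[</w:t></w:r>" +
        "<w:proofErr w:type=\"gramStart\"/>" +
        "<w:r><w:t>text-to-replace</w:t></w:r>" +
        "<w:proofErr w:type=\"gramEnd\"/>" +
        "<w:r><w:t>]</w:t></w:r>" +
        "</w:p>";

    public static XElement LoadSample() => XElement.Parse(ParagraphXml);

    // True if any single w:t element contains the search string
    // (this mirrors what the leaf-by-leaf query in the question checks).
    public static bool FoundInSingleTextNode(XElement paragraph, string search) =>
        paragraph.Descendants(W + "t").Any(t => t.Value.Contains(search));

    // True if the text of all w:t elements, joined in document order, contains it.
    public static bool FoundInParagraphText(XElement paragraph, string search) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value)).Contains(search);

    public static void Main()
    {
        XElement p = LoadSample();
        Console.WriteLine(FoundInSingleTextNode(p, "text-to-replace"));   // True
        Console.WriteLine(FoundInSingleTextNode(p, "[text-to-replace]")); // False
        Console.WriteLine(FoundInParagraphText(p, "[text-to-replace]"));  // True
    }
}
```

The second check is the one that corresponds to the failing case in the question: the brackets and the word live in different runs, so a per-element Contains never matches.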

There might be different ways to work around this.

Solution:

A nice (and my preferred) solution would be to normalize the XML using MarkupSimplifier from Open XML PowerTools. It normalizes the Open XML markup by concatenating the text in a paragraph, which makes the document simpler to work with programmatically.

Example code:

using (WordprocessingDocument doc = WordprocessingDocument.Open("Test.docx", true))
{
    SimplifyMarkupSettings settings = new SimplifyMarkupSettings
    {
        NormalizeXml = true, // Merges runs in a paragraph with similar formatting
    };
    MarkupSimplifier.SimplifyMarkup(doc, settings);
}

Please refer to my answer here for more info on using MarkupSimplifier.
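If pulling in PowerTools is not an option, one of the other possible workarounds is to do the replacement per paragraph instead of per element. The following is only a rough sketch under stated assumptions, not a drop-in solution: the class and method names are hypothetical, it assumes the placeholder never crosses a paragraph boundary, and it assumes it is acceptable for the replaced text to take on the formatting of the first run in the paragraph.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ParagraphLevelReplace
{
    // Replaces `search` inside one paragraph even when it is split across several runs.
    // Returns true if the paragraph was changed.
    public static bool ReplaceInParagraph(Paragraph paragraph, string search, string replacement)
    {
        var texts = paragraph.Descendants<Text>().ToList();
        if (texts.Count == 0) return false;

        // Join the text of all runs so that "[", "os" and "]" become "[os]".
        string combined = string.Concat(texts.Select(t => t.Text));
        if (!combined.Contains(search)) return false;

        // Assumption: per-run formatting differences may be lost. The whole result
        // goes into the first text node and the remaining text nodes are emptied.
        texts[0].Text = combined.Replace(search, replacement);
        texts[0].Space = SpaceProcessingModeValues.Preserve; // keep leading/trailing spaces
        foreach (var t in texts.Skip(1)) t.Text = string.Empty;
        return true;
    }

    // Applies the replacement to every paragraph in the main document body.
    public static void ReplaceInDocument(string path, string search, string replacement)
    {
        using (var doc = WordprocessingDocument.Open(path, true))
        {
            var body = doc.MainDocumentPart.Document.Body;
            foreach (var p in body.Descendants<Paragraph>().ToList())
                ReplaceInParagraph(p, search, replacement);
        }
    }
}
```

A call such as ParagraphLevelReplace.ReplaceInDocument("Test.docx", "[os]", "Win 2000") would then handle the split-run case. The trade-off compared with MarkupSimplifier is that mixed formatting inside a matching paragraph is flattened onto the first run, so this fits best for plain placeholder templates.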

Hope this helps :)

Flowerking
  • Ah yes, I can follow you. I followed the install instructions in the README file, but I can't complete them. I got the missing `System.Management.Automation` working, but I'm now stuck on another error: `type or namespace "OutputTypeAttribute" can't be found` (translation from German). Do you have a solution for that? Additionally I'm now googling the shizzle out of that problem. :) – Trollwut Sep 03 '13 at 11:05
  • Is it possible that I've got an old version of that DLL (despite a fresh installation of PowerShell)? If so: how do I update it? – Trollwut Sep 03 '13 at 11:07
  • I tried to copy this DLL with several commands in the PowerShell (e.g. `Copy ([PSObject].Assembly.Location) C:\ `), but I cannot solve my problem. People on the interwebs say, that they may get a version with about 3 MB more size, but mine's always ~2.6 MB. (Just writing this to inform you about my tries.) – Trollwut Sep 03 '13 at 11:27
  • Ok, I have now extracted that DLL from another PC and it seems to fit. I then added some .cs files to my project and used your code. It compiles without errors. My problem now is that `[os]` isn't replaced, again. It seems like the MarkupSimplifier is working, but not the way we expected. Did I miss something? – Trollwut Sep 05 '13 at 10:47
  • Is it possible to post the new markup in your question? Also, it is easy to debug: set a breakpoint on `foreach (var item in res)` and check whether the `innertext` of the `item` contains `[os]`. – Flowerking Sep 05 '13 at 13:42
  • I'm currently not at my workstation; I will post the new code (which doesn't differ much from the old one) in due time. Hm... I set a breakpoint, but how do I check `innertext`? – Trollwut Sep 09 '13 at 11:53
  • Sorry for my absence, I caught a nice cold over the last few days. I've made Additions 2+3 in the starting post. The code seems to work as it should. Addition 3 contains my routine to replace. After checking `innertext` in the foreach loop I noticed that only `[tag]` runs through. I don't get why it seems to ignore the other commands. – Trollwut Sep 12 '13 at 12:12
  • Checked it again: the function is executed every time, as expected. But it seems that `var res = from ...` only selects `[tag]`. How come it can't find the other placeholders? – Trollwut Sep 12 '13 at 12:31
  • Aaaand checked again: I renamed that `.docx` to a ZIP-file. The `document.docx` shows that nothing has been normalized. Did I miss something? – Trollwut Sep 12 '13 at 12:36
  • Another addition: I placed the placeholders into textboxes. Might that be my problem? Textboxes are saved like pictures. – Trollwut Sep 12 '13 at 13:20
  • I *loving* did it!! Your code was never the problem, it has always been correct. When I saw that textboxes were my problem, I did a little research and found an additional class ("SearchAndReplace"), which I included and used. Now it's working perfectly. :) Thank you very much for your solution; I will mark it as accepted. I will write an addition to the initial post to point out that problem with textboxes. Thanks! – Trollwut Sep 12 '13 at 14:20