
The code below reads data from a Word document and exports it to a text file.

Before writing to the text file, I want to remove or ignore the special characters present in the document. By special characters I mean arrows, bullets, copyright symbols, and so on. In the text file they show up as random characters, so I want to remove or ignore those characters or symbols before the text is written to the file.

object file;

file = filepathtb.Text;

object Target = Path.GetDirectoryName(System.Windows.Forms.Application.ExecutablePath) + "\\Temp_str.txt";
Microsoft.Office.Interop.Word.Application newApp = new Microsoft.Office.Interop.Word.Application();

object Unknown = Type.Missing;
newApp.Documents.Open(ref file, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown);
object format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatText;

// if(newApp.ActiveDocument.Content.Characters = a

newApp.ActiveDocument.SaveAs(ref Target, ref format, ref Unknown, ref Unknown, ref Unknown,
    ref Unknown, ref Unknown, ref Unknown,
    ref Unknown, ref Unknown, ref Unknown,
    ref Unknown, ref Unknown, ref Unknown,
    ref Unknown, ref Unknown);
Charan Gourishetty
  • Can you give an example of the file content? – Hamed Mar 29 '13 at 09:17
  • When working with Word Interop, consider this: `Never use 2 dots with com objects` (http://stackoverflow.com/questions/158706/how-to-properly-clean-up-excel-interop-objects/4366693). I believe it applies to Word too. – jordanhill123 Mar 29 '13 at 09:19
  • @hamed A normal Word document, but one that contains some special characters in between. – Charan Gourishetty Mar 29 '13 at 09:21
  • @jordanhill123 I didn't follow that; can you please elaborate? I want to ignore special characters in a Word document. – Charan Gourishetty Mar 29 '13 at 09:23
  • 1
    My comment isn't specifically in response to your question.It's just a general heads up that when using Word Interop, the Word process can get stuck open in the background until your program closes if you don't properly handle the Word objects properly. This can be problematic if you are working with multiple Word documents and have a long running program. The link has more info on best practices and I've used it successfully with Excel Interop. – jordanhill123 Mar 29 '13 at 09:27

1 Answer


Try something like this:

string myText = "sample text...";
string formattedText = String.Empty;

foreach(char c in myText)
{
    if(Char.IsLetterOrDigit(c) || Char.IsWhiteSpace(c) || Char.IsPunctuation(c))
        formattedText += c;
}
rhughes
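A minimal sketch of the same filter built on `StringBuilder`, since `+=` on a string inside a loop copies the whole string on every append. The names `TextCleaner` and `RemoveSpecialCharacters` are hypothetical, and the `asciiPunctuationOnly` flag is an assumption added for illustration: `Char.IsPunctuation` returns true for the bullet character U+2022 (Unicode category Po), so bullets would pass the original test unless punctuation is restricted to the ASCII range.

```csharp
using System;
using System.Text;

public static class TextCleaner
{
    // Keeps letters, digits, whitespace and punctuation; drops everything else
    // (for example the copyright sign U+00A9 or the arrow U+2192).
    // asciiPunctuationOnly is an assumed option: when true, punctuation
    // outside the ASCII range, such as the bullet U+2022, is dropped as well.
    public static string RemoveSpecialCharacters(string input, bool asciiPunctuationOnly = false)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // Pre-size the builder: the result is never longer than the input.
        var result = new StringBuilder(input.Length);

        foreach (char c in input)
        {
            bool isPunctuation = char.IsPunctuation(c) && (!asciiPunctuationOnly || c <= 127);

            if (char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || isPunctuation)
                result.Append(c);
        }

        return result.ToString();
    }
}
```

Usage would look like `string formattedText = TextCleaner.RemoveSpecialCharacters(myText);`. Restricting punctuation to ASCII also drops typographic quotes and dashes, so whether that flag is appropriate depends on the document.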
  • I have a document of nearly 380 pages. I can't read everything character by character. And where in your code does it remove or ignore the special characters? – Charan Gourishetty Mar 29 '13 at 09:27
  • @CharanGourishetty The text that needs formatting is in `myText`. The `formattedText` variable will contain the text without the special characters. As regards the 380 pages, you could try splitting the string and parallelizing it. – rhughes Mar 29 '13 at 09:30
  • 1
    Er, `StringBuilder` anyone? – Matthew Watson Mar 29 '13 at 09:39
  • @rhughes When the text (containing special symbols) is converted to a string, it turns into something like this: ))) or #>#. So I couldn't filter those out; I tried this approach earlier. Thanks for the reply. – Charan Gourishetty Mar 29 '13 at 09:40
  • Anyway, any approach you consider will compare character by character under the hood. – sll Mar 29 '13 at 09:47
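On the 380-page concern: the check is per character either way, but the text does not have to sit in one big string. Below is a rough sketch that streams the exported text file through the same filter in fixed-size chunks. `StreamingCleaner`, `CopyFiltered` and `CleanFile` are hypothetical names, and UTF-8 is an assumed encoding; the file Word writes with `wdFormatText` may use a different encoding, in which case the reader's encoding would need to match it.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamingCleaner
{
    // Reads the source in 4096-character chunks, keeps only letters, digits,
    // whitespace and punctuation, and writes the kept characters to the target.
    public static void CopyFiltered(TextReader reader, TextWriter writer)
    {
        var buffer = new char[4096];
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Compact the kept characters to the front of the same buffer.
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || char.IsPunctuation(c))
                    buffer[kept++] = c;
            }

            writer.Write(buffer, 0, kept);
        }
    }

    // Assumption: both files are UTF-8 text.
    public static void CleanFile(string sourcePath, string targetPath)
    {
        using (var reader = new StreamReader(sourcePath, Encoding.UTF8))
        using (var writer = new StreamWriter(targetPath, false, Encoding.UTF8))
        {
            CopyFiltered(reader, writer);
        }
    }
}
```

With the question's setup this could be called as `StreamingCleaner.CleanFile(tempPath, cleanedPath)` after `SaveAs` has finished, where both path variables are placeholders. It will not help if the symbols have already been turned into ordinary characters such as `)))` during export, because those pass the filter as normal punctuation.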