
I am trying to read Word content line by line, but I am facing an issue when reading paragraphs: if a paragraph's content spans multiple lines, I get it back as a single line. Can anyone please help me with this?


Expected Output:

Line 1 - > TERM BHFKGBHFGFKJHGKJSHFKG ABC1 IOUTOYTIUYRUYTIREYTU B08

Line 2 - > NBHFBHDFGJDSBHKHDGFJGJGDJK 3993 JBHKJSFGSDKFJDGFJKDSBF3993

Line 3 - > JHBJKFHKJGDGFSFGB08 HGHGGFGFDGJFFFDSGFABC1 JJBVHGHDFTERM

Line 4 - > TERMBHFKGBHFGFKJHGKJSHFKG ABC1IOUTOYTIUYRUYTIREYTU B08NBHFBHDFGJDSBHKHDGFJGJGDJK

Line 5 - > 39931234567890987654321

Actual Output:

Single Line -> TERM BHFKGBHFGFKJHGKJSHFKG ABC1 IOUTOYTIUYRUYTIREYTU B08 NBHFBHDFGJDSBHKHDGFJGJGDJK 3993 JBHKJSFGSDKFJDGFJKDSBF3993 JHBJKFHKJGDGFSFGB08 HGHGGFGFDGJFFFDSGFABC1 JJBVHGHDFTERM TERMBHFKGBHFGFKJHGKJSHFKG ABC1IOUTOYTIUYRUYTIREYTU B08NBHFBHDFGJDSBHKHDGFJGJGDJK 39931234567890987654321

Below is my code. Using Open XML:

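using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

using (WordprocessingDocument doc = WordprocessingDocument.Open(fs, false))
{
    Body body = doc.MainDocumentPart.Document.Body;

    // Iterate the top-level paragraphs (w:p elements) of the body.
    foreach (Paragraph paragraph in body.Elements<Paragraph>())
    {
        // InnerText concatenates the text of all runs in the paragraph.
        string par = paragraph.InnerText;
    }
}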

Using Office Interop:

using Word = Microsoft.Office.Interop.Word;

Word.Application app = new Word.Application();
// Open the document read-only; C# 4+ lets optional COM arguments be omitted.
Word.Document doc = app.Documents.Open(FilePath, ReadOnly: true);

foreach (Word.Paragraph paragraph in doc.Paragraphs)
{
    // Range.Text returns the whole paragraph (ending in "\r"), not individual lines.
    string line = paragraph.Range.Text;
}
Cindy Meister

2 Answers

It is not possible to determine individual lines in the closed file. Lines are generated dynamically when a document is opened in Word, and where a line "breaks" depends on many factors; it is not necessarily the same from one system profile to another. So it's necessary to use Interop, not Open XML, to pick up where lines break on the screen.

What's more, the Word object model does not provide "Line" objects for this very reason: there is no "line", only a visual representation of how the page will print, given the current printer driver and version of Windows.

The only part of the Word object model that recognizes "lines" is Selection, as this works solely with what's displayed on the screen.

The following code demonstrates how this can be done.

First, since Selection is being worked with and this is visible on-screen, ScreenUpdating is disabled in order to reduce screen flicker and speed up processing. (Note that working with selections is generally much slower than other object model processing.)

Using ComputeStatistics, the number of lines in the paragraph is determined. An array (you could also use a list or any other collection) is instantiated to hold the lines. The paragraph range is then "collapsed" to its starting point and selected.

Next, the code loops over the lines of the paragraph, based on that line count. On each pass the selection is extended by one line (MoveEnd method; again, moving by lines is only available to a selection), the selected text is written to the array (or whatever), and the selection is collapsed to its end so the next pass starts on the following line.

Finally, screen updating is turned back on.

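using System.Diagnostics;
using Word = Microsoft.Office.Interop.Word;

wdApp.ScreenUpdating = false;
try
{
    foreach (Word.Paragraph para in doc.Paragraphs)
    {
        Word.Range rng = para.Range;
        // Number of lines this paragraph currently occupies in Word's layout.
        int lNumLines = rng.ComputeStatistics(Word.WdStatistic.wdStatisticLines);
        string[] aLines = new string[lNumLines];

        // Place an empty selection at the start of the paragraph.
        rng.Collapse(Word.WdCollapseDirection.wdCollapseStart);
        rng.Select();

        for (int i = 0; i < lNumLines; i++)
        {
            // Extend the selection by one line and capture the selected text.
            wdApp.Selection.MoveEnd(Unit: Word.WdUnits.wdLine, Count: 1);
            aLines[i] = wdApp.Selection.Text;
            // Collapse to the end so the next pass starts on the following line.
            wdApp.Selection.Collapse(Word.WdCollapseDirection.wdCollapseEnd);
        }

        foreach (string line in aLines)
        {
            Debug.Print(line);
        }
    }
}
finally
{
    // Restore screen updating even if an exception is thrown.
    wdApp.ScreenUpdating = true;
}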
Cindy Meister
  • «The only part of the Word object model that recognizes "lines" is Selection, as this works solely with what's displayed on the screen.» That simply is not true. As indicated in my own answer, Word's predefined '\Line' bookmark and the Rectangle.Lines property both recognize lines on screen. – macropod May 01 '20 at 22:50
  • @macropod Mmmm. Perhaps it would be better rephrased, but the bookmark definitely also relies on a `Selection` object. Not sure how you'd get the correct target rectangle in this instance. – Cindy Meister May 08 '20 at 15:20
  • Word's predefined '\Line' bookmark works with Range objects, not just Selections. For example: Sub Test(): Dim Rng As Range: Set Rng = ActiveDocument.GoTo(What:=wdGoToLine, Name:="4"): Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\line"): MsgBox Rng.Text: End Sub For VBA code using the Rectangle.Lines property, see my posts in: https://www.tek-tips.com/viewthread.cfm?qid=1794887 – macropod May 08 '20 at 22:20

In Word, a paragraph is a single line of text: the line breaks you see on screen are not stored in the document. Change the size of the print area (e.g. change the margins and/or page size) or the font/point size, and the text reflows accordingly. Moreover, since Word uses the active printer driver to optimise the page layout, what appears on a given line on one computer may not appear on the same line on another computer.

Depending on your requirements, though, you could employ Word's predefined '\Line' bookmark to navigate between lines or the Rectangle.Lines property.
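As a rough illustration of the '\Line' bookmark approach, here is a minimal C# sketch translated from the VBA macropod posted in the comments on the other answer. It assumes an already-open Word.Document `doc` obtained via Office Interop; the class and method names (`WordLines`, `GetLineText`) are hypothetical, and it passes the line number through `Count` rather than `Name`. Whether the '\Line' bookmark resolves correctly on a Range without an active Selection was disputed in those comments, so treat this as an untested sketch. It also needs a running copy of Word, and the line numbers reflect Word's current layout on this machine.

```csharp
using Word = Microsoft.Office.Interop.Word;

static class WordLines
{
    // Hypothetical helper: returns the text of the given 1-based line,
    // as Word currently lays out the document on this machine.
    public static string GetLineText(Word.Document doc, int lineNumber)
    {
        // Jump to the start of the requested line (absolute line number).
        Word.Range rng = doc.GoTo(What: Word.WdGoToItem.wdGoToLine,
                                  Which: Word.WdGoToDirection.wdGoToAbsolute,
                                  Count: lineNumber);

        // Expand the range to the whole line via Word's predefined "\Line" bookmark.
        rng = rng.GoTo(What: Word.WdGoToItem.wdGoToBookmark, Name: "\\Line");
        return rng.Text;
    }
}
```

For example, `WordLines.GetLineText(doc, 4)` corresponds to macropod's VBA test, which displays the text of line 4. Because the result depends on the current layout, the same call can return different text after margins, fonts or the printer driver change.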

macropod