I want to extract text from a .docx file page by page using Microsoft.Office.Interop.Word in C#.

Currently I get all of the file's text with the method shown below, but I want it page by page. How can I do this?

    public void ImportWordFile()
    {
        object path = @"C:\Users\Vipin\Desktop\test.docx";
        object miss = System.Reflection.Missing.Value;
        object readOnly = true;

        Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
        Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss);

        // Concatenate the text of every paragraph (Word collections are 1-based).
        string totaltext = "";
        for (int i = 1; i <= docs.Paragraphs.Count; i++)
        {
            totaltext += " \r\n " + docs.Paragraphs[i].Range.Text;
        }

        // The page count has to be computed by Word; it is not stored with the text.
        var numberOfPages = docs.ComputeStatistics(Microsoft.Office.Interop.Word.WdStatistic.wdStatisticPages, false);
        Debug.WriteLine("WordFileText = " + totaltext);
        Debug.WriteLine("NumberOfPages = " + numberOfPages);

        docs.Close();
        word.Quit();
    }
Neeraj Mehta

1 Answer

The trouble is that Word does not really work with pages. Pagination is a layout result rather than part of the document's content, which is why the number of pages has to be "computed" in the first place (basically, Word is asking the printer).

Still, something along the lines of the following might work:

    for (int i = 1; i <= numberOfPages; i++)
    {
        // Note: GoTo returns a range positioned at the start of page i.
        var pageRange = docs.Range()
            .GoTo(Microsoft.Office.Interop.Word.WdGoToItem.wdGoToPage,
                Microsoft.Office.Interop.Word.WdGoToDirection.wdGoToAbsolute, i);
        // do your magic
    }

Hope this helps.

LocEngineer
  • Same way as before. You can loop through the paragraphs of pageRange to get all the paragraphs, text, or whatever else is contained on that page. So instead of looping through docs.Paragraphs, you loop through a) the pages (this for loop) and b) pageRange.Paragraphs (a new loop in "do your magic"). – LocEngineer Mar 25 '15 at 13:29
  • This didn't work for me. Using pageRange.Paragraphs gave me only the first paragraph, and using pageRange.Words gave me only the first word. This question solved my problem: http://stackoverflow.com/questions/28987095/get-pages-of-word-document – zeta Nov 04 '16 at 18:32
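
The behavior zeta describes is consistent with GoTo returning only the position where page i starts, not a range covering the whole page. One possible workaround, sketched here under assumptions, is to take the start of page i and the start of page i + 1 and build a range between the two. The names WordPageReader and ReadPages are made up for illustration. The sketch assumes C# 4 or later (so the ref and optional COM arguments can be omitted) and a local Word installation, and it is not necessarily what the linked question does.

```csharp
using System;
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

public static class WordPageReader
{
    // Hypothetical helper (not from the thread): returns one string per page.
    public static List<string> ReadPages(string path)
    {
        var pages = new List<string>();
        var word = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = word.Documents.Open(path, ReadOnly: true);

            // Word computes the page count from the current layout.
            int pageCount = doc.ComputeStatistics(Word.WdStatistic.wdStatisticPages, false);

            for (int i = 1; i <= pageCount; i++)
            {
                // Position where page i begins.
                Word.Range pageStart = doc.GoTo(
                    Word.WdGoToItem.wdGoToPage,
                    Word.WdGoToDirection.wdGoToAbsolute, i);

                // Page i ends where page i + 1 begins; the last page ends
                // at the end of the document body.
                int pageEnd = doc.Content.End;
                if (i < pageCount)
                {
                    pageEnd = doc.GoTo(
                        Word.WdGoToItem.wdGoToPage,
                        Word.WdGoToDirection.wdGoToAbsolute, i + 1).Start;
                }

                // Range spanning the whole page, not just its first character.
                Word.Range pageRange = doc.Range(pageStart.Start, pageEnd);
                pages.Add(pageRange.Text ?? string.Empty);
            }
        }
        finally
        {
            // Close without saving and shut Word down even if something failed.
            if (doc != null)
            {
                doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            }
            word.Quit();
        }
        return pages;
    }
}
```

Calling WordPageReader.ReadPages with the path of the .docx file should then give one string per page, and pageRange.Paragraphs could be looped over in the same place instead of reading pageRange.Text. Headers and footers are not part of the body range, and the page boundaries may shift with the printer or fonts available on the machine.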