I need to extract the text that has bullet styling from a Word document in C#. I am using the Aspose.Words library, but a solution with a different library is also welcome. I can already upload documents and extract the text with Heading 1 styling, but when I try the same with bullet styling I get nothing.

I am using the code below to get the text with Heading1 styling and that works.

var heading1 = doc
    .GetChildNodes(NodeType.Paragraph, true)
    .Cast<Aspose.Words.Paragraph>()
    .ToArray()
    .Where(p => p.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading1);
    
foreach (var head1 in heading1)
{
    listBox11.Items.Add(head1.GetText());
}

I am trying to use the code below to get the text with bullet styling and this does NOT work.

var bullets = doc
    .GetChildNodes(NodeType.Paragraph, true)
    .Cast<Aspose.Words.Paragraph>()
    .ToArray()
    .Where(p => p.ParagraphFormat.StyleIdentifier == StyleIdentifier.ListBullet);
    
foreach (var bullet in bullets)
{
    listBox19.Items.Add(bullet.GetText());
}

I also tried the ListBullet2, ListBullet3, ListBullet4 and ListBullet5 style identifiers, but that does not fix the problem either.

Richard
  • Based on a quick look at https://apireference.aspose.com/words/net/aspose.words/styleidentifier it seems there is more than one style for a bullet list. Try changing your second Where() clause to use StyleIdentifier.ListBullet2 etc. Perhaps the issue is the fact that the bullet styling used is not ListBullet. – Alex Dec 30 '20 at 14:39
  • Oh, I'm sorry, I forgot to mention that I did try ListBullet2, 3, 4 and 5, but that also does not fix the problem. – Richard Dec 30 '20 at 15:11

2 Answers

Most likely your code does not work because the bullets are not applied via a style. In an MS Word document, formatting can be applied at several levels: document defaults, theme, style, and direct formatting. A check on ParagraphFormat.StyleIdentifier only looks at the style level, so bullets applied as direct formatting are missed. In your case, I think the best way is to use the ListFormat.IsListItem property.
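As a rough illustration of that suggestion, here is a minimal sketch that filters paragraphs by list formatting instead of by style. The class name BulletExtractor and the input path are hypothetical, and the extra NumberStyle.Bullet check is my own assumption for skipping numbered lists; it is not something the question asked for explicitly.

```csharp
using System;
using System.Linq;
using Aspose.Words;

class BulletExtractor
{
    static void Main(string[] args)
    {
        // Hypothetical input path; pass your own document as the first argument.
        Document doc = new Document(args.Length > 0 ? args[0] : "input.docx");

        var bulletParagraphs = doc
            .GetChildNodes(NodeType.Paragraph, true)
            .Cast<Paragraph>()
            // True when the paragraph has bulleted or numbered formatting applied.
            .Where(p => p.ListFormat.IsListItem)
            // Assumption: keep only bullet levels and skip numbered list items.
            .Where(p => p.ListFormat.ListLevel.NumberStyle == NumberStyle.Bullet);

        foreach (Paragraph p in bulletParagraphs)
        {
            // GetText() ends with a paragraph-break character, so trim it.
            Console.WriteLine(p.GetText().Trim());
        }
    }
}
```

If numbered items should be included as well, the second Where() can simply be dropped. In a WinForms project the Console.WriteLine call would be replaced by something like listBox19.Items.Add(...).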

Alexey Noskov

I am now using this to successfully extract the list items from a Word file and put them into a list box.

// Take the first file name from the list box.
string fileName = listBox1.Items.Cast<string>().FirstOrDefault();

// Open the document.
Document doc = new Document(fileName);

// Update the list labels for all list items in the document.
doc.UpdateListLabels();

NodeCollection paras = doc.GetChildNodes(NodeType.Paragraph, true);

// Visit only the paragraphs that are list items.
foreach (Aspose.Words.Paragraph paragraph in paras.OfType<Aspose.Words.Paragraph>().Where(p => p.ListFormat.IsListItem))
{
    // Convert the paragraph to plain text and trim any paragraph formatting characters.
    string paragraphText = paragraph.ToString(SaveFormat.Text).Trim();

    // Remove the first two characters, i.e. the dot in front of the bullet text.
    string bullet = paragraphText.Remove(0, 2);

    listBox19.Items.Add(bullet);
}
Richard