I have an HTML document that, after being parsed, contains only formatted text. Is it possible to get its text the same way I would by selecting it with the mouse, copying it, and pasting it into a new text document?
I know this is possible with Microsoft.Office.Interop, where the .ActiveSelection property selects the content of the open Word document.
I need a way to load the HTML somehow (maybe in a browser object), copy all of its content, and assign it to a string. This is what I currently do:
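```csharp
using System.IO;
using System.Text;

var doc = new HtmlAgilityPack.HtmlDocument();
// The path must be a string, not a bare identifier.
var documentText = File.ReadAllText("myhtmlfile.html", Encoding.GetEncoding(1251));
documentText = this.PerformSomeChangesOverDocument(documentText);
doc.LoadHtml(documentText);

// Convert the parsed DOM to text with my own helper.
using (var stringWriter = new StringWriter())
{
    AgilityPackEntities.AgilityPack.ConvertTo(doc.DocumentNode, stringWriter);
    stringWriter.Flush();
    var convertedText = stringWriter.ToString();

    // Remove the <title> text from the result, if there is a title.
    // Note: Replace also removes any other occurrence of the same text.
    var titleNodes = doc.DocumentNode.SelectNodes("//title");
    if (titleNodes != null)
    {
        var titleToBeRemoved = titleNodes[0].InnerText;
        document.DocumentContent = convertedText.Replace(titleToBeRemoved, string.Empty);
    }
    else
    {
        document.DocumentContent = convertedText;
    }
}
```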
After that I return the document object. The problem is that the resulting string is not always formatted the way I want it to be.
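For comparison, here is a minimal, dependency-free sketch that only approximates what a browser's select-all + copy produces. It swaps in a naive regex-based technique (not HtmlAgilityPack and not a real HTML renderer). It collapses source whitespace the way a browser does, turns `<br>` and the ends of common block elements into line breaks, strips the remaining tags, and decodes entities. It assumes reasonably simple markup and ignores CSS. The class `NaiveHtmlText` and method `ToPlainText` are hypothetical names.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: a naive, regex-based HTML-to-text approximation.
// It is NOT a real HTML parser or renderer and ignores CSS entirely.
public static class NaiveHtmlText
{
    public static string ToPlainText(string html)
    {
        const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        // Drop <head> (which also removes the <title> text), <script> and <style> blocks.
        string text = Regex.Replace(html, @"<(head|script|style)\b[^>]*>.*?</\1\s*>", string.Empty, Opts);

        // Browsers render any run of whitespace in the source as a single space.
        text = Regex.Replace(text, @"\s+", " ");

        // Line breaks: <br> and the end of common block-level elements.
        text = Regex.Replace(text, @"<br\b[^>]*>", "\n", Opts);
        text = Regex.Replace(text, @"</(p|div|li|tr|h[1-6])\s*>", "\n", Opts);

        // Remove every remaining tag, then decode entities such as &amp; and &nbsp;.
        text = Regex.Replace(text, @"<[^>]+>", string.Empty);
        text = WebUtility.HtmlDecode(text).Replace('\u00A0', ' ');

        // Trim spaces around line breaks and collapse consecutive line breaks.
        text = Regex.Replace(text, @"[ \t]*\n[ \t]*", "\n");
        text = Regex.Replace(text, @"\n{2,}", "\n");
        return text.Trim();
    }
}
```

Usage: call `NaiveHtmlText.ToPlainText(...)` on the HTML string read from the file. Because the whole `<head>` is dropped, the title does not need to be removed afterwards with `string.Replace`. For nested lists, tables, or CSS-driven layout, a real parser or rendering engine is the more reliable route.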