Background
I have a console app that retrieves email from an Office 365 Outlook account using the Outlook REST API v2.0.
Problem
I am accessing the email body through the API, but the body arrives as an HTML string. I run it through my go-to regex, which removes the HTML tags. However, Outlook adds a <style> block of CSS class rules to its HTML, and the regex leaves the contents of that block behind, so the expression no longer does the job.
Code
// Verbatim string literal: spans multiple lines, with inner quotes doubled.
string body = @"<html>
<head>
<meta http-equiv=""Content-Type"" content=""text/html; charset=utf-8"">
<meta content=""text/html; charset=us-ascii"">
<meta name=""Generator"" content=""Microsoft Word 15 (filtered medium)"">
<style>
<!--
@font-face
{font-family:""Cambria Math""}
@font-face
{font-family:Calibri}
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:""Calibri"",sans-serif}
a:link, span.MsoHyperlink
{color:#0563C1;
text-decoration:underline}
a:visited, span.MsoHyperlinkFollowed
{color:#954F72;
text-decoration:underline}
span.EmailStyle17
{font-family:""Calibri"",sans-serif;
color:windowtext}
.MsoChpDefault
{font-family:""Calibri"",sans-serif}
@page WordSection1
{margin:1.0in 1.0in 1.0in 1.0in}
div.WordSection1
{}
-->
</style>
</head>
<body lang=""EN-US"" link=""#0563C1"" vlink=""#954F72"">
<div class=""WordSection1"">
<p class=""MsoNormal""> </p>
</div>
<hr>
<p><b>Confidentiality Notice:</b> This e-mail is intended only for the addressee named above. It contains information that is privileged, confidential or otherwise protected from use and disclosure. If you are not the intended recipient, you are hereby notified
that any review, disclosure, copying, or dissemination of this transmission, or taking of any action in reliance on its contents, or other use is strictly prohibited. If you have received this transmission in error, please reply to the sender listed above
immediately and permanently delete this message from your inbox. Thank you for your cooperation.</p>
</body>
</html>
";
string viewString1 = Regex.Replace(body, "<.*?>", string.Empty);
string viewString12 = viewString1.Replace("&nbsp;", string.Empty);
Results from my regular expression
<!--
@font-face
{font-family:"Cambria Math"}
@font-face
{font-family:Calibri}
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif}
a:link, span.MsoHyperlink
{color:#0563C1;
text-decoration:underline}
a:visited, span.MsoHyperlinkFollowed
{color:#954F72;
text-decoration:underline}
span.EmailStyle17
{font-family:"Calibri",sans-serif;
color:windowtext}
.MsoChpDefault
{font-family:"Calibri",sans-serif}
@page WordSection1
{margin:1.0in 1.0in 1.0in 1.0in}
div.WordSection1
{}
-->
Confidentiality Notice: This e-mail is intended only for the addressee named above. It contains information that is privileged, confidential or otherwise protected from use and disclosure. If you are not the intended recipient, you are hereby notified
that any review, disclosure, copying, or dissemination of this transmission, or taking of any action in reliance on its contents, or other use is strictly prohibited. If you have received this transmission in error, please reply to the sender listed above
immediately and permanently delete this message from your inbox. Thank you for your cooperation.
Objective
I need to be able to strip the HTML tags from the string and also remove the CSS rules that Outlook places in the body.
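To make the objective concrete, here is a minimal sketch of one regex-based way this might be done. It is not a definitive implementation. The class and method names (HtmlTextSketch, StripHtml) are hypothetical and not part of any Outlook library. The idea is to remove <style> and <script> elements together with their contents first, then any leftover comments, then the remaining tags, and finally to decode entities and collapse whitespace. It assumes the input is reasonably well-formed HTML like the sample above.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextSketch
{
    // Hypothetical helper: reduces an HTML email body to plain text.
    public static string StripHtml(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Singleline lets '.' match newlines, so multi-line blocks are matched as a whole.
        RegexOptions options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

        // 1. Remove <style> and <script> elements including everything inside them.
        string text = Regex.Replace(html, @"<(style|script)\b[^>]*>.*?</\1\s*>", string.Empty, options);

        // 2. Remove any HTML comments that remain outside those elements.
        text = Regex.Replace(text, @"<!--.*?-->", string.Empty, options);

        // 3. Replace the remaining tags with a space so neighbouring words do not run together.
        text = Regex.Replace(text, @"<[^>]+>", " ", options);

        // 4. Decode entities such as &nbsp; and &amp; into their characters.
        text = WebUtility.HtmlDecode(text);

        // 5. Collapse runs of whitespace (including non-breaking spaces) and trim the ends.
        return Regex.Replace(text, @"[\s\u00A0]+", " ").Trim();
    }
}
```

Calling HtmlTextSketch.StripHtml(body) on the sample above should leave only the confidentiality notice text. One trade-off is that replacing tags with a space will split a word that has inline markup in the middle of it. For messy real-world mail, an HTML parser such as HtmlAgilityPack is likely to be more robust than regular expressions.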