I am trying to develop an R script that can extract specific lines from downloaded HTML files. Here is an example file:
<html>
<head>
<title>ARMS Email System</title>
<meta name="record_type" content="FEDERAL (NOTES MAIL)">
<meta name="creator" content="redacted">
<meta name="creation_date" content="2000-11-22">
<meta name="to" content="redacted">
<meta name="cc" content=" ">
<meta name="bcc" content=" ">
<meta name="subject" content=" fwd: re: fwd: Accomplishments section of Progress Report ">
</head>
<body>
[redacted]
</body>
</html>
Ideally, I would like it to extract Record Type, Creator, Creation Date, Subject, and To, which all appear to be stored in meta tags. How can I scrape the "creation_date" value for each record? Here is what I have tried so far, which does not return the date:
library(rvest)

html <- read_html(x = "/Users/.../A1.html")
text <- html %>%
  html_element('creation_date') %>%
  html_text2()
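
One possible direction, offered as a sketch rather than a definitive answer: the attempt above passes 'creation_date' as a CSS selector, so it looks for a tag with that name, whereas in the sample file the value sits in the content attribute of a meta tag whose name attribute is "creation_date". The example below assumes rvest is installed, uses an inline copy of the sample file in place of the real path, and introduces a hypothetical helper called get_meta that is not part of rvest.

```r
library(rvest)

# Inline stand-in for one downloaded file (copied from the sample above),
# so the sketch runs without access to the real path.
sample_html <- '<html>
<head>
<title>ARMS Email System</title>
<meta name="record_type" content="FEDERAL (NOTES MAIL)">
<meta name="creator" content="redacted">
<meta name="creation_date" content="2000-11-22">
<meta name="to" content="redacted">
<meta name="subject" content=" fwd: re: fwd: Accomplishments section of Progress Report ">
</head>
<body>[redacted]</body>
</html>'

# Hypothetical helper (an assumption, not an rvest function):
# for each field, select <meta name="field"> and read its content attribute.
get_meta <- function(doc, fields) {
  vapply(fields, function(f) {
    node <- html_element(doc, sprintf('meta[name="%s"]', f))
    # html_attr() gives NA when the tag is absent; trimws() keeps NA as NA
    trimws(html_attr(node, "content"))
  }, character(1))
}

doc <- read_html(sample_html)
fields <- c("record_type", "creator", "creation_date", "to", "subject")
meta_values <- get_meta(doc, fields)

# The creation date on its own
meta_values[["creation_date"]]
```

To process a whole folder, the same helper could presumably be applied to each path returned by list.files() with lapply(), calling read_html() on each file and row-binding the results; that part is left out here because the real directory layout is not shown.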