I'm working on a PowerShell script that converts a docx to HTML and also changes the encoding of the HTML, because by default Word saves it as windows-1252.
I need this because I later send the saved HTML as the body of an email, which is also sent by PowerShell. As I am Spanish, I need accents and tildes to show up (right now they appear as ?).
I tried the SaveAs method with all the parameters, but I couldn't get it to work.
This is my script:
$MSWord = New-Object -ComObject Word.Application
$MSWord.Documents.Open("C:\Users\USER\Videos\CAMBIO_TURNO.docx")
$MSWord.Visible = $false
# Save HTML
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatHTML");
$path = "C:\Users\USER\Videos\CAMBIO_TURNO.html"
$MSWord.ActiveDocument.SaveAs([ref]$path, [ref]$saveFormat)
# Close File
$MSWord.ActiveDocument.Close()
$MSWord.Quit()
Then, to send it to myself, I use this other PowerShell code:
$OutputEncoding = [System.Text.Encoding]::UTF8
$body = [IO.File]::ReadAllText("C:\Users\USER\Videos\CAMBIO_TURNO.html")
Send-MailMessage -To "EMAIL@EMAIL" -From "EMAIL@EMAIL" -Subject "CAMBIO" -Body $body -Encoding $OutputEncoding -BodyAsHtml -Attachments "C:\Users\USER\Videos\CAMBIO_TURNO.xlsx" -Dno onSuccess, onFailure -SmtpServer smtp.gmail.com -Credential EMAIL@EMAIL
SECOND UPDATE
(I read the question this one is marked as a duplicate of, "Word Document.SaveAs ignores encoding, when calling through OLE, from Ruby or VBS", but it didn't solve my problem: that Word configuration doesn't work for me.)
Here is what I tried after saving my document with the web options as utf-8:
# Define $OutputEncoding for the console. It doesn't seem to work: I typed ñ and ó and they appear as ?? because the hexadecimal values are not converted to the right charset
$OutputEncoding = New-Object -TypeName System.Text.ASCIIEncoding
# Open word to add input into the signature file
$MSWord = New-Object -ComObject word.application
$MSWord.Documents.Open('C:\Users\USER\Videos\CAMBIO_TURNO.docx')
# Save HTML
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], 'wdFormatFilteredHTML');
$path = 'C:\Users\USER\Videos\CAMBIO_TURNO.html'
$default = [Type]::Missing
$MSWord.ActiveDocument.SaveAs2([ref]$path, [ref]$saveFormat, [ref]$default, [ref]$default, [ref]$default, [ref]$default, [ref]$default, [ref]$default, [ref]$default, [ref]$default, [ref]$default, [ref]28591)
# Close File
$MSWord.ActiveDocument.Close()
$MSWord.Quit()
$HTMLw = Get-Content -Path 'C:\Users\USER\Videos\CAMBIO_TURNO.html' -Encoding ASCII -Force
$HTMLw -replace 'charset=windows-1252','charset=ISO-8859-1' | Set-Content -Path 'C:\Users\USER\Videos\CAMBIO_TURNO.html' -Encoding ASCII -Force