I've got a database full of content that has (I think) been cut and pasted from Word into TinyMCE. I now need to use PHPWord (latest version) to turn these records back into Word documents.
The data is full of HTML hex character references like &#x201C; and &#x2013;, which I need to turn back into dashes, bullets and quotes. The content renders perfectly well when served to a browser as UTF-8, but nothing I've tried gets it into a Word doc correctly.
With no manipulation at all, I get a file I can't open.
This gives me â[square]¢:
$section = $this->phpWord->addSection();
$str = html_entity_decode($str);
HTMLParser::addHtml($section, $str, false);
While this gives me just a square for each one:
$section = $this->phpWord->addSection();
$str = html_entity_decode($str);
$str = mb_convert_encoding($str, "Windows-1252","UTF-8");
HTMLParser::addHtml($section, $str, false);
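To narrow it down, here is a minimal sketch of the two decoding steps on their own, outside PHPWord. The sample string is an assumption (a single bullet stored as &#x2022;), not a real record, and I pass the flags and charset to html_entity_decode explicitly rather than relying on defaults. It needs the mbstring extension.

```php
<?php
// Hypothetical sample: one bullet stored as a hex character reference.
$stored = '&#x2022;';

// Step 1: decode the reference, explicitly asking for UTF-8 output.
$utf8 = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo bin2hex($utf8), "\n";   // e280a2 (the three UTF-8 bytes of U+2022)

// Step 2: the extra conversion from my second attempt.
$cp1252 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
echo bin2hex($cp1252), "\n"; // 95 (a single Windows-1252 byte)

// The converted string is no longer valid UTF-8.
var_dump(mb_check_encoding($cp1252, 'UTF-8')); // bool(false)
```

So after step 1 the string is valid UTF-8, and after step 2 it isn't, which may be why the second attempt only gives squares. The â[square]¢ from the first attempt looks like those three UTF-8 bytes each being read as a separate single-byte character, but I can't see where that would be happening.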
I can get a string replace to turn the dashes and quotes into simple characters, but the client really wants smart quotes and en-dashes.
It feels like it should be really easy to fix, but I've always had a bit of a mental block when it comes to encoding issues.
In case it's relevant, here are my download headers:
header("Content-Disposition: attachment;filename=".$filename.".docx");
header("Cache-Control: max-age=0");
header('Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document');
header('Content-Transfer-Encoding: binary');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Expires: 0');