I've got a database full of content that has (I think) been cut and pasted from Word into TinyMCE. I now need to use PHPWord (latest version) to turn these records back into Word documents.

The data is full of HTML hex character codes like &#x201C; and &#x2013;, which I need to turn back into dashes, bullets and quotes. The content renders perfectly well when served to a browser as UTF-8, but nothing I've tried gets it into a Word doc correctly.

With no manipulation at all, I get a file I can't open.

This gives me â[square]¢

$section = $this->phpWord->addSection();
$str = html_entity_decode($str);
HTMLParser::addHtml($section, $str, false);
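
For what it's worth, the â[square]¢ pattern looks like the three UTF-8 bytes of a bullet (U+2022) being read one byte at a time as a single-byte charset. The sketch below reproduces that outside PHPWord. It assumes the stored entities are hex numeric forms such as &#x2022; and that the mbstring extension is loaded; it only shows what the bytes look like, not where in the pipeline the misreading happens.

```php
<?php
// Decode a bullet entity, asking explicitly for UTF-8 output.
$bullet = html_entity_decode('&#x2022;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo bin2hex($bullet), PHP_EOL;   // e280a2 (one character, three bytes)

// Pretend those three bytes are ISO-8859-1 and re-encode them as UTF-8.
// Each byte becomes its own character: "â", an unprintable control, "¢".
$garbled = mb_convert_encoding($bullet, 'UTF-8', 'ISO-8859-1');
echo bin2hex($garbled), PHP_EOL;  // c3a2c280c2a2
```

If the string going into addHtml has the first byte sequence, the decode step is fine and the double encoding is happening later.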

While this gives me just a square for each one...

$section = $this->phpWord->addSection();
$str = html_entity_decode($str);
$str = mb_convert_encoding($str, "Windows-1252","UTF-8");
HTMLParser::addHtml($section, $str, false);
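
A similar check for this second attempt, again only a sketch under the same assumptions (hex entities, mbstring available): converting to Windows-1252 collapses the bullet to the single byte 0x95, which is not valid UTF-8 on its own. Since .docx XML is normally UTF-8, that could explain a single square per character.

```php
<?php
// Start from the correctly decoded UTF-8 bullet.
$bullet = html_entity_decode('&#x2022;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Same conversion as in the attempt above.
$cp1252 = mb_convert_encoding($bullet, 'Windows-1252', 'UTF-8');
echo bin2hex($cp1252), PHP_EOL;                 // 95

// The single byte is no longer a valid UTF-8 sequence; the original is.
var_dump(mb_check_encoding($cp1252, 'UTF-8'));  // bool(false)
var_dump(mb_check_encoding($bullet, 'UTF-8'));  // bool(true)
```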

I can get a string replace to turn the dashes and quotes into simple characters, but the client really wants smart quotes and en-dashes.

It feels like it should be really easy to fix, but I've always had a bit of a mental block when it comes to encoding issues.

In case it's relevant, here are my download headers:

header("Content-Disposition: attachment;filename=".$filename.".docx");
header("Cache-Control: max-age=0");
header('Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document');
header('Content-Transfer-Encoding: binary');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Expires: 0');
Sodium
