
PROBLEM :

I need the files on my server to be encrypted. This works perfectly well for .txt, .doc, .xls and .ppt files, but not for .docx, .xlsx and .pptx.

When I try to edit a docx (or xlsx, pptx), the file comes out corrupted by my encrypt/decrypt round trip, presumably because that is not a proper way to handle a docx. When Microsoft Word opens it, it reports the file as corrupted and opens it as 'Document1.docx' instead of 'MyFileName.docx'. When saving, I have to enter the name again, and with pptx I even have to enter the path to the WebDAV folder that contains the document.

QUESTION :

Is there any way to get it to save in the right place without having to type the path?

CODE :

Here is the code I use to encrypt the files :

// Split the file name on dots to look at its extension(s).
$ext = explode('.', basename($path));
if (in_array("doc", $ext) || in_array("docx", $ext)) {
    // Read the raw bytes in binary mode.
    $handle = fopen("$davPath/$path", "rb");
    $data_file = fread($handle, filesize("$davPath/$path"));
    fclose($handle);
} else {
    // file_get_contents() is binary-safe as well.
    $data_file = file_get_contents("$davPath/$path");
}

$encrypt_data_file = $encryption->encrypt($data_file);

// Write the encrypted bytes to a temporary file, then swap it in place of the original.
if (file_put_contents("$davPath/encrypt_" . basename($path), $encrypt_data_file) !== false) {
    unlink("$davPath/" . basename($path));
    rename("$davPath/encrypt_" . basename($path), "$davPath/" . basename($path));
    return true;
} else {
    return false;
}

And here is the code I use to decrypt them :

$ext = explode('.', basename($uri));
$data_file = false; // stays false if the file does not exist
if (is_file("$davPath/$uri")) {
    if (in_array("doc", $ext) || in_array("docx", $ext)) {
        // Read the raw bytes in binary mode.
        $handle = fopen("$davPath/$uri", "rb");
        $data_file = fread($handle, filesize("$davPath/$uri"));
        fclose($handle);
    } else {
        $data_file = file_get_contents("$davPath/$uri");
    }
}
// Strict comparison, so that an empty file is not mistaken for a read failure.
if ($data_file !== false) {
    $decrypt_data_file = $encryption->decrypt($data_file);

    header('Content-Description: File Transfer');
    header('Content-Type: application/octet-stream');
    // Quote the file name so that names containing spaces survive.
    header('Content-Disposition: attachment; filename="' . basename($uri) . '"');
    header('Content-Location: ' . $_SERVER['SCRIPT_URI']);
    header('Expires: 0');
    header('Cache-Control: must-revalidate');
    header('Pragma: public');
    ob_clean();
    flush();
    echo $decrypt_data_file;
    return false;
}

PS: I did find a workaround, which is to keep the file decrypted on the server while it is being edited, but I would really prefer not to do that.

cilmela
  • Is the file really corrupted (e.g. the content is not showing), or is only the file path wrong when saving? In the second case, I think it should just be a matter of telling Word where to save the file when launching it (with a CLI argument). – edi9999 Jul 04 '14 at 09:57
  • Just the filepath but I'm using ItHit Ajax Library and their method EditDocument so I don't know how I can add an option to tell word where to save – cilmela Jul 04 '14 at 10:03
  • I don't think it has anything to do with your PHP code. Can you please post more about the Ajax library you're using? – edi9999 Jul 04 '14 at 10:05
  • It's a library to edit documents on a webdav server, see their site for more info http://www.webdavsystem.com/ajax. – cilmela Jul 04 '14 at 10:14
  • I do think there is a problem with my PHP code because when I try to open the docx with WinRAR, it says that the archive is corrupted and it can't open it. – cilmela Jul 04 '14 at 10:15
  • Well, then your docx is corrupted... I don't understand. You said that only the file path was wrong. – edi9999 Jul 04 '14 at 10:18
  • Sorry I wasn't clear because I misunderstood. You talked about content not showing but the library actually manages to open it so the content is showing. But yes the file is corrupted. Sorry for the misunderstanding – cilmela Jul 04 '14 at 10:22
  • The last time I had this kind of issue, I used a hex editor to see where the difference lies: http://stackoverflow.com/questions/18243668/what-is-wrong-with-this-binary-file-transfer-corrupting-docx-files/18314922#18314922 . Can you please tell us what's different? – edi9999 Jul 04 '14 at 10:40
  • What do I need to compare ? The docx before any encryption/decryption and one after encryption/decryption ? – cilmela Jul 04 '14 at 12:08
  • Yes, exactly. That way we will see whether only some of the bytes differ, and whether they are at the beginning or at the end. – edi9999 Jul 04 '14 at 12:20

2 Answers


Thanks to edi9999's suggestion, I used a hex editor to look for differences between the original docx and the one that had gone through encryption and decryption.

The only difference is at the end of the file: the original (not corrupted) one ends with three '00' bytes that are missing from the corrupted one.

The solution was to append "\0" three times to the end of my decrypted data. And now it works perfectly fine!

For docx and pptx it's three "\0" bytes, and for xlsx it's four.
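
A possible explanation, offered as an assumption rather than a diagnosis: a ZIP-based file (docx, xlsx, pptx) ends with an "End of Central Directory" record whose last field is a 2-byte comment length, usually 00 00, so such files normally end in NUL bytes. If the decrypt step strips trailing NULs (for example with rtrim() to remove zero padding added by the cipher), those real bytes are lost too. The number of trailing NULs can also vary from file to file, so a fixed count of 3 or 4 may not hold for every document. The sketch below uses hypothetical helpers pkcs7_pad() and pkcs7_unpad(), which are not part of the original code, to show a padding scheme that survives the round trip; it does no encryption itself.

```php
<?php
// Hypothetical helpers (not from the original post): PKCS#7 padding keeps
// trailing NUL bytes intact, unlike zero padding followed by rtrim().

function pkcs7_pad(string $data, int $blockSize = 16): string
{
    // Always add 1..$blockSize bytes, each one holding the pad length.
    $padLen = $blockSize - (strlen($data) % $blockSize);
    return $data . str_repeat(chr($padLen), $padLen);
}

function pkcs7_unpad(string $data, int $blockSize = 16): string
{
    $len = strlen($data);
    if ($len === 0 || $len % $blockSize !== 0) {
        throw new InvalidArgumentException('Invalid padded length');
    }
    // The last byte says how many padding bytes to remove.
    $padLen = ord($data[$len - 1]);
    if ($padLen < 1 || $padLen > $blockSize
        || substr($data, -$padLen) !== str_repeat(chr($padLen), $padLen)) {
        throw new InvalidArgumentException('Invalid padding');
    }
    return substr($data, 0, $len - $padLen);
}

// Synthetic stand-in for a file that ends in NUL bytes.
$original = "PK\x05\x06 example tail\x00\x00\x00";

// Zero padding + rtrim() cannot tell padding from real trailing NULs.
$zeroPadded = str_pad($original, 32, "\0");
$lossy      = rtrim($zeroPadded, "\0");

// The PKCS#7 round trip restores the exact bytes.
$restored = pkcs7_unpad(pkcs7_pad($original));

var_dump($lossy === $original);    // bool(false)
var_dump($restored === $original); // bool(true)
```

With this kind of padding applied before encryption and removed after decryption, there should be no need to append "\0" by hand. Storing the original length alongside the ciphertext would be another way to get the same result.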

cilmela
  • I would guess there is an issue with your encrypter/decrypter library that is causing this (e.g. it probably ignores null bytes at the end). Are you using this: https://github.com/o/crypt-php ? – edi9999 Jul 07 '14 at 10:06
  • No I'm using my own encryption/decryption methods with mcrypt and base64_encode/decode. But I don't mind adding "\0" at the end of my data. Anyway thank you very much for your help ! – cilmela Jul 07 '14 at 12:27

Your issue has been solved, but I'd like to add an answer to it.

When you have a corrupted docx, here are some steps to find out what's wrong:

First, try to unzip the docx (it is a ZIP archive). If that works, the problem is in the content of the docx. If it fails, the ZIP container itself is corrupted.

Problems with the content of the docx

If the ZIP is not corrupted, Word will probably tell you where the problem lies when you open the docx.

For example, it may report: Parse error on line 213 of document.xml

Here's the "normal" structure of a docx once unzipped:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Contains the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the word
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //this document tells word where the images are situated
+  [Content_Types].xml
\--_rels
   \  .rels

As shown in the docx tag wiki.

Corrupted zip

If the ZIP is corrupted, in most cases there are some characters at the beginning or at the end of the file that shouldn't be there (or that should be there and are missing).
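
As a rough illustration of checking both ends of the file, here is a minimal sketch. The function zip_edge_report() is a hypothetical helper, not part of any library. It relies only on the ZIP signatures ("PK\x03\x04" at the start, "PK\x05\x06" for the End of Central Directory record) and on the 2-byte comment length stored at offset 20 of that record. It is not a full validator and can be fooled if the signature bytes happen to appear inside the archive comment.

```php
<?php
// Hypothetical helper (not from the original answer): rough sanity check of
// the first and last bytes of a ZIP container such as a docx.

function zip_edge_report(string $bytes): array
{
    // A non-empty ZIP normally starts with a local file header signature.
    $header = substr($bytes, 0, 4) === "PK\x03\x04";

    // Find the last End of Central Directory (EOCD) signature.
    $eocdPos = strrpos($bytes, "PK\x05\x06");
    if ($eocdPos === false) {
        return ['header' => $header, 'eocd' => false, 'tail_delta' => null];
    }

    // The fixed part of the EOCD record is 22 bytes long.
    $available = strlen($bytes) - $eocdPos;
    if ($available < 22) {
        // The record itself is cut short, so bytes are missing at the end.
        return ['header' => $header, 'eocd' => true, 'tail_delta' => $available - 22];
    }

    // The comment length is a little-endian uint16 at offset 20 of the EOCD.
    $commentLen  = unpack('v', substr($bytes, $eocdPos + 20, 2))[1];
    $expectedEnd = $eocdPos + 22 + $commentLen;

    // 0 = exact, negative = bytes missing, positive = extra trailing bytes.
    return ['header' => $header, 'eocd' => true, 'tail_delta' => strlen($bytes) - $expectedEnd];
}

// Synthetic data standing in for a real file: header, payload, empty EOCD.
$valid     = "PK\x03\x04payload" . "PK\x05\x06" . str_repeat("\0", 18);
$truncated = substr($valid, 0, -3); // simulate three stripped NUL bytes

var_dump(zip_edge_report($valid)['tail_delta']);     // int(0)
var_dump(zip_edge_report($truncated)['tail_delta']); // int(-3)
```

On a real file, the input would come from file_get_contents() on the suspect docx; a non-zero tail_delta suggests that bytes were added or removed at the end.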

The best approach is to get a valid docx of the same document and compare the hexadecimal representation of both files to see what differs.

I usually use the hexdiff tool for this (apt-get install hexdiff).

This will usually show you where the extra characters are located.
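
If no hex diff tool is at hand, the same comparison can be sketched in a few lines of PHP. The helpers first_difference() and hex_tail() below are hypothetical names introduced for illustration; the strings stand in for the contents of a valid and a corrupted file, which in practice would be read with file_get_contents().

```php
<?php
// Hypothetical helpers (not from the original answer) for comparing a good
// and a corrupted copy of the same file byte by byte.

// Returns the offset of the first differing byte, or null if both are equal.
// If one string is a prefix of the other, returns the length of the shorter.
function first_difference(string $a, string $b): ?int
{
    $common = min(strlen($a), strlen($b));
    for ($i = 0; $i < $common; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    return strlen($a) === strlen($b) ? null : $common;
}

// Returns the last $count bytes as space-separated hex pairs.
function hex_tail(string $bytes, int $count = 8): string
{
    return implode(' ', str_split(bin2hex(substr($bytes, -$count)), 2));
}

// Synthetic stand-ins for the two files.
$good = "abc\x00\x00\x00";
$bad  = "abc";

var_dump(first_difference($good, $bad)); // int(3)
var_dump(hex_tail($good, 4));            // string(11) "63 00 00 00"
```

A first difference equal to the length of the shorter file means the two files agree up to that point and one of them simply has extra or missing bytes at the end.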

Quite often, the problem is that you have the wrong headers.

edi9999