
I created a .doc file using HTML tags. It contains HTML form elements such as text boxes, checkboxes, radio buttons, drop-downs and hidden fields, and they display properly when the document is opened.

  1. I am able to parse the updated .doc file with my PHP code and use the form field data when saving it into the database.
  2. When I use the 'Save As' option on the .doc file, the newly created .doc file still shows the HTML form elements properly, but I am unable to parse any data from it.

I want to parse the 'Save As' .doc file with PHP as well. How can I resolve this issue?

Here is my doc file parsing code:

function parseWord($userDoc)
{
    if (!is_file($userDoc) || filesize($userDoc) === 0) {
        return "";
    }

    # Pass 1: split the raw bytes on carriage returns (0x0D) and keep
    # only non-empty chunks that contain no NUL (0x00) bytes.
    $line = file_get_contents($userDoc);
    $lines = explode(chr(0x0D), $line);
    $outtext = "";
    foreach ($lines as $thisline) {
        if (strlen($thisline) > 0 && strpos($thisline, chr(0x00)) === false) {
            $outtext .= $thisline . " ";
        }
    }

    # Pass 2 (fallback): read a text length from fixed header offsets
    # and take that many bytes from offset 0xA00 onwards.
    if (trim($outtext) == "") {
        $outtext = "";
        if (($fh = fopen($userDoc, 'rb')) !== false) {
            $headers = fread($fh, 0xA00);

            # Only index into the header if all 0xA00 bytes were read
            if (strlen($headers) === 0xA00) {
                # low byte, minus 1: document has 0 to 255 characters
                $n1 = ( ord($headers[0x21C]) - 1 );

                # second byte, minus 8, times 256: 256 to 63743 characters
                $n2 = ( ( ord($headers[0x21D]) - 8 ) * 256 );

                # third byte times 256^2: 63744 to 16775423 characters
                $n3 = ( ( ord($headers[0x21E]) * 256 ) * 256 );

                # high byte times 256^3: 16775424 to 4294965504 characters
                $n4 = ( ( ( ord($headers[0x21F]) * 256 ) * 256 ) * 256 );

                # Total length of text in the document
                $textLength = ($n1 + $n2 + $n3 + $n4);

                # fread() requires a positive length
                if ($textLength > 0) {
                    $extracted_plaintext = fread($fh, $textLength);
                    # plain text: echo $extracted_plaintext;
                    # paragraphs in a web page: echo nl2br($extracted_plaintext);
                    $outtext .= $extracted_plaintext;
                }
            }
            # Close only a handle that was actually opened
            fclose($fh);
        }
    }

    # Keep only letters, digits, whitespace and a few punctuation characters
    $outtext = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $outtext);

    return $outtext;
}

$userDoc = "cv.doc";

$text = parseWord($userDoc);
echo $text;
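The fallback branch of parseWord() relies on hard-coded byte offsets: 0x21C to 0x21F for a length and 0xA00 for where the text is assumed to start. The four $n1..$n4 terms amount to reading one little-endian unsigned 32-bit integer at 0x21C and subtracting 0x801 (the "- 1" on the low byte plus 8 * 256 = 0x800 on the next byte). A minimal sketch showing that equivalence with unpack(); the helper names textLengthByParts() and textLengthByUnpack() are hypothetical. Because the offsets are fixed, this fallback can only be expected to work on files whose binary layout happens to match them.

```php
<?php
// Hypothetical helper: the original four-term arithmetic, unchanged.
function textLengthByParts(string $headers): int
{
    $n1 = ord($headers[0x21C]) - 1;
    $n2 = (ord($headers[0x21D]) - 8) * 256;
    $n3 = ord($headers[0x21E]) * 256 * 256;
    $n4 = ord($headers[0x21F]) * 256 * 256 * 256;
    return $n1 + $n2 + $n3 + $n4;
}

// Hypothetical helper: the same value via unpack().
function textLengthByUnpack(string $headers): int
{
    // 'V' = unsigned 32-bit integer, little-endian byte order
    // (values above 2^31 need 64-bit PHP to stay positive).
    $value = unpack('V', substr($headers, 0x21C, 4))[1];
    // 0x801 = 1 subtracted from the low byte + 8 * 256 from the next byte
    return $value - 0x801;
}

// Fake 0xA00-byte header with 0x34, 0x12, 0x00, 0x00 at offset 0x21C.
$sample = str_repeat("\0", 0xA00);
$sample[0x21C] = "\x34";
$sample[0x21D] = "\x12";

echo textLengthByParts($sample), ' ', textLengthByUnpack($sample), PHP_EOL; // 2611 2611
```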

Thanks in advance...

2 Answers


I created .doc file using html tags

No, you created an HTML file and gave it a filename ending in .doc.
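Since such a file is really HTML, it can be read with an HTML parser rather than byte filtering. A minimal sketch using PHP's DOM extension, assuming the file is still HTML (not yet re-saved by Word in another format) and that the filled-in values live in the value, checked and selected attributes; extractFormFields() is a hypothetical helper name, and only options with an explicit value attribute are handled.

```php
<?php
// Hypothetical helper: collect name => value pairs from HTML form fields.
function extractFormFields(string $html): array
{
    $fields = [];
    if (trim($html) === '') {
        return $fields; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // HTML written for Word is often not strictly valid; keep the parser
    // warnings internal instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Text boxes, hidden fields, checkboxes and radio buttons.
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name === '') {
            continue;
        }
        $type = strtolower($input->getAttribute('type'));
        $isToggle = ($type === 'checkbox' || $type === 'radio');
        // Checkboxes and radio buttons only count when marked "checked".
        if (!$isToggle || $input->hasAttribute('checked')) {
            $fields[$name] = $input->getAttribute('value');
        }
    }

    // Drop-downs: record the option that carries the "selected" attribute.
    foreach ($doc->getElementsByTagName('select') as $select) {
        $name = $select->getAttribute('name');
        if ($name === '') {
            continue;
        }
        foreach ($select->getElementsByTagName('option') as $option) {
            if ($option->hasAttribute('selected')) {
                $fields[$name] = $option->getAttribute('value');
            }
        }
    }

    return $fields;
}

// Inline sample; for a real file you would pass file_get_contents('cv.doc').
$sample = '<html><body><form>'
    . '<input type="text" name="fullname" value="Jane Doe">'
    . '<input type="checkbox" name="agree" value="yes" checked>'
    . '<input type="checkbox" name="newsletter" value="1">'
    . '<input type="hidden" name="formid" value="42">'
    . '<select name="city"><option value="a">A</option>'
    . '<option value="b" selected>B</option></select>'
    . '</form></body></html>';

print_r(extractFormFields($sample));
```

The resulting array can then be bound to a database insert, which avoids the character-stripping regex entirely.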

When you save a file from MSWord, it uses a proprietary format (actually multiple nested formats) which is not HTML. When you load the file you originally created, MSWord recognizes that it's HTML and translates it on the fly. There are ways to address this, but you've still got a long journey to make before you are in a position to make best use of them.
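One way to avoid guessing is to check which format a given .doc file actually contains before choosing a parser. A minimal sketch, assuming the only cases of interest are HTML, the Word 97-2003 binary format (an OLE2 compound file, signature D0 CF 11 E0 A1 B1 1A E1) and .docx (a ZIP archive, signature "PK\x03\x04"); detectDocFormat() and its return labels are hypothetical. A file re-saved by Word in its binary format would be expected to report 'ole2'.

```php
<?php
// Hypothetical helper: classify a document from its leading bytes.
function detectDocFormat(string $bytes): string
{
    // Word 97-2003 binary documents are OLE2 compound files.
    if (strncmp($bytes, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'ole2';
    }
    // Office Open XML (.docx) documents are ZIP archives.
    if (strncmp($bytes, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    // Otherwise look for HTML markup within the first kilobyte.
    $head = substr($bytes, 0, 1024);
    if (stripos($head, '<html') !== false || stripos($head, '<!doctype html') !== false) {
        return 'html';
    }
    return 'unknown';
}

// For a real file, reading the first kilobyte is enough, e.g.
// $kind = detectDocFormat(file_get_contents('cv.doc', false, null, 0, 1024));
echo detectDocFormat('<html><body><form></form></body></html>'), PHP_EOL; // html
```

Only the 'html' case can go to an HTML parser; the other two need a reader that understands the respective container format.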

Your best course of action now would be to consider the question of why you need to process a file in both MSWord and PHP and what other formats you might be using.

symcbean
  • My requirement is to create an offline form as a .doc file, and I need to save the data from its HTML form elements into a database. That is why I created the .doc file with HTML tags. – Venkata Kiran Jul 01 '14 at 10:06
  • Doesn't answer the question - why does it have to be in .doc format. – symcbean Jul 02 '14 at 09:04

As already stated, you can't simply open Office files the way you are trying to.

Here is a simple-to-use library, provided by Microsoft, which lets you do what you'd like to do:

http://phpword.codeplex.com/

Daniel W.