I am trying to parse an FDF file using PHP and regex, but I just can't get my head around regex. I am stuck on parsing the file into an array.

%FDF-1.2
%âãÏÓ
1 0 obj 
<<
/FDF 
<<
/Fields [
<<
/V (email@email.com)
/T (field_email)
>> 
<<
/V (John)
/T (field_name)
>> 
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj 
trailer

<<
/Root 1 0 R
>>
%%EOF

Current function (source: http://php.net/manual/en/ref.fdf.php):

function parse2($file) {
 if (!preg_match_all("/<<\s*\/V([^>]*)>>/x", $file,$out,PREG_SET_ORDER))
         return;
 for ($i=0;$i<count($out);$i++) {
         $pattern = "<<.*/V\s*(.*)\s*/T\s*(.*)\s*>>";
         $thing = $out[$i][1];
         if (eregi($pattern,$out[$i][0],$regs)) {
                 $key = $regs[2];
                 $val = $regs[1];
                 $key = preg_replace("/^\s*\(/","",$key);
                 $key = preg_replace("/\)$/","",$key);
                 $key = preg_replace("/\\\/","",$key);
                 $val = preg_replace("/^\s*\(/","",$val);
                 $val = preg_replace("/\)$/","",$val);
                 $matches[$key] = $val;
         }
 }
 return $matches;
}

Result:

Array
(
    [field_email)
    ] => email@email.com)

    [field_name)
    ] => John)

    [field_reference)
    ] => )

)

Why does it include the ) and the newline? I know this problem is trivial for someone who understands regular expressions, so any help would be appreciated.

1 Answer

Description

Your initial expression simply finds the entire block of text that represents each key/value pair. Then, in your cleanup section, you look for a closing paren that is immediately followed by the end of the string, \)$, but there are almost certainly additional characters (such as trailing whitespace or part of a line ending) between the closing paren and the end of the string, so the paren is never removed.
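
To illustrate the point, here is a minimal sketch. It assumes the captured value ends with a Windows line ending (\r\n); the sample string is made up for the demonstration and I have not confirmed that this is exactly what your file contains.

```php
<?php
// Assumption: the captured value still carries a trailing "\r\n".
$val = "(email@email.com)\r\n";

// "$" matches at the very end of the string or just before a final "\n",
// but not before "\r", so "\)$" finds nothing to remove here.
$unchanged = preg_replace('/\)$/', '', $val);
var_dump($unchanged === $val);

// Allowing optional whitespace before the end of the string fixes the cleanup.
$fixed = preg_replace('/\)\s*$/', '', $val);   // drop ")" plus trailing whitespace
$fixed = preg_replace('/^\s*\(/', '', $fixed); // drop the leading "("
var_dump($fixed);
```

If the stray character in your file turns out to be something else (a space, for instance), the same \s* adjustment should still cover it.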

Instead, I'd handle all of this in one operation. This expression will:

  • find the field value
    • trim off the surrounding parens
    • and place it into capture group 1
  • find the field name
    • trim off the field_ prefix
    • trim off the surrounding parens
    • and place it into capture group 2
  • require the options case-insensitive and multi-line

^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)
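
As a rough sketch of how this could be used from PHP to build the array you are after: the function name parse_fdf_fields and the inline sample string are hypothetical, and only the pattern itself comes from this answer.

```php
<?php
// Hypothetical helper: returns an array of field name => field value.
function parse_fdf_fields(string $fdf): array
{
    // i = case-insensitive, m = multi-line (so "^" matches at each line start).
    $pattern = '/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)/im';
    $fields = [];
    if (preg_match_all($pattern, $fdf, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $fields[$m[2]] = $m[1]; // group 2 = name, group 1 = value
        }
    }
    return $fields;
}

// Shortened sample input, based on the Fields section of the question.
$fdf = "<<\n/V (email@email.com)\n/T (field_email)\n>> \n"
     . "<<\n/V (John)\n/T (field_name)\n>> \n"
     . "<<\n/V ()\n/T (field_reference)\n>>]\n";

print_r(parse_fdf_fields($fdf));
```

Note that [^)]* stops at the first closing paren, so a value containing an escaped paren would be cut short; this sketch does not try to handle that case.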

Example

Sample Text

%FDF-1.2
%âãÏÓ
1 0 obj 
<<
/FDF 
<<
/Fields [
<<
/V (email@email.com)
/T (field_email)
>> 
<<
/V (John)
/T (field_name)
>> 
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj 
trailer

<<
/Root 1 0 R
>>
%%EOF

Matches

[0][0] = /V (email@email.com)
/T (field_email)
[0][1] = email@email.com
[0][2] = email

[1][0] = /V (John)
/T (field_name)
[1][1] = John
[1][2] = name

[2][0] = /V ()
/T (field_reference)
[2][1] = 
[2][2] = reference



Or

If you want to retain the field_ substring, simply remove it from the expression, like so:

^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(([^)]*)\)
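
A short sketch of this variant, this time without PREG_SET_ORDER so that the names and values come back as two parallel lists. The sample string is a made-up excerpt, and using array_combine is just one possible way to pair them up.

```php
<?php
$pattern = '/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(([^)]*)\)/im';

// Made-up excerpt in the same layout as the sample text.
$fdf = "/V (John)\n/T (field_name)\n>> \n<<\n/V ()\n/T (field_reference)\n";

// Default PREG_PATTERN_ORDER: $m[1] holds every value, $m[2] every name.
$fields = [];
if (preg_match_all($pattern, $fdf, $m)) {
    $fields = array_combine($m[2], $m[1]);
}

print_r($fields);
```

Here the keys keep their field_ prefix, for example field_name.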

  • After adding the ims flags, the regex works perfectly in PHP: preg_match_all("/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)/ims", $file, $out, PREG_SET_ORDER); In the meantime I also found http://www.debuggex.com/, which is great for debugging. – user2413433 Aug 10 '13 at 15:05
  • Regexps are not great for parsing FDF; e.g. Chrome submits FDF `[<>]` – shuckc Nov 08 '20 at 21:34