
What is the best way to parse the output of pdftk's dump_data_fields in PHP?

The properties I need to extract are FieldName and FieldNameAlt, and optionally FieldMaxLength and FieldStateOption. Sample output:

FieldType: Text
FieldName: TestName1
FieldNameAlt: TestName1
FieldFlags: 29360128
FieldJustification: Left
FieldMaxLength: 5
---
FieldType: Button
FieldName: TestName3
FieldFlags: 0
FieldJustification: Left
FieldStateOption: Off
FieldStateOption: Yes
---
...
BartoszKP
aqua

2 Answers

Would something like this suffice?

$handle = fopen("/tmp/bla.txt", "r");
if ($handle) {
    $output = array();
    while (($line = fgets($handle)) !== false) {
        if (trim($line) === "---") {
            // Block completed; process it
            if (sizeof($output) > 0) {
                print_r($output);
            }
            $output = array();
            continue;
        }
        // Process contents of data block.
        // Split on the first colon only, so values that contain ":" stay intact
        $parts = explode(":", $line, 2);
        if (sizeof($parts) === 2) {
            $key = trim($parts[0]);
            $value = trim($parts[1]);
            if (isset($output[$key])) {
                $i = 1;
                while(isset($output[$key.$i])) $i++;
                $output[$key.$i] = $value;
            }
            else {
                $output[$key] = $value;
            }
        }
        else {
            // handle malformed input
        }
    }

    // process final block
    if (sizeof($output) > 0) {
        print_r($output);
    }
    fclose($handle);
}
else {
    // error while opening the file
}

This gives you the following output:

Array
(
    [FieldType] => Text
    [FieldName] => TestName1
    [FieldNameAlt] => TestName1
    [FieldFlags] => 29360128
    [FieldJustification] => Left
    [FieldMaxLength] => 5
)
Array
(
    [FieldType] => Button
    [FieldName] => TestName3
    [FieldFlags] => 0
    [FieldJustification] => Left
    [FieldStateOption] => Off
    [FieldStateOption1] => Yes
)

Fishing out those values is then as easy as:

echo $output["FieldName"];
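
If the dump is already in a string rather than a file, the same idea can be wrapped in a function. This is only a sketch: the name parsePdftkFields is hypothetical, and it assumes FieldStateOption is the only key that repeats within a block, so its values are collected into a list instead of getting numbered keys.

```php
<?php
// Hypothetical helper (not part of the answer above): parse a pdftk
// dump_data_fields string into one associative array per field.
function parsePdftkFields(string $dump): array
{
    $fields = array();
    $current = array();
    // Split on any newline style; a trailing "---" flushes the last block
    $lines = preg_split('/\R/', $dump);
    $lines[] = "---";
    foreach ($lines as $line) {
        if (trim($line) === "---") {
            if (count($current) > 0) {
                $fields[] = $current;
            }
            $current = array();
            continue;
        }
        // Limit of 2: split on the first colon only
        $parts = explode(":", $line, 2);
        if (count($parts) !== 2) {
            continue; // skip blank or malformed lines
        }
        $key = trim($parts[0]);
        $value = trim($parts[1]);
        if ($key === "FieldStateOption") {
            // Assumed to be the only repeated key: collect every value in a list
            $current[$key][] = $value;
        } else {
            $current[$key] = $value;
        }
    }
    return $fields;
}

$dump = "FieldType: Text\nFieldName: TestName1\nFieldNameAlt: TestName1\nFieldMaxLength: 5\n---\n"
      . "FieldType: Button\nFieldName: TestName3\nFieldStateOption: Off\nFieldStateOption: Yes\n---\n";
$fields = parsePdftkFields($dump);
echo $fields[1]["FieldName"], "\n";
echo implode(",", $fields[1]["FieldStateOption"]), "\n";
```

Returning the blocks instead of printing them keeps every field available after the loop, and a list for FieldStateOption avoids having to guess at keys like FieldStateOption1.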
morido

I made some amendments to the code above and fixed a few issues, such as the last field not being added to the array. The updated code, which collects all fields into a single array, is below.

    // Get the form data fields as a string
    $fieldsDataStr = $pdf->getDataFields();

    // Split the string into an array of lines
    $lines = explode("\n", $fieldsDataStr);
    // Append a trailing '---' so the loop below also flushes the last field block
    $lines[] = "---";

    $output = array();
    $pdfDataArray = array();
    $counterField = 0;
    foreach ($lines as $line) {
        if (trim($line) === "---") {
            // Block completed; store it
            if (sizeof($output) > 0) {
                $pdfDataArray[] = $output;
                $counterField++; // number of fields found so far
            }
            $output = array();
            continue;
        }
        // Split on the first colon only (limit 2), so values containing ':' stay intact
        $parts = explode(":", $line, 2);
        if (sizeof($parts) === 2) {
            $key = trim($parts[0]);
            $value = trim($parts[1]);
            // Note: a repeated key such as FieldStateOption keeps only its last value
            $output[$key] = $value;
        }
    }

    print_r($pdfDataArray);

This returns an array with one entry per field, including the last one.
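
To narrow each entry down to the properties asked for in the question, something like the following might do. It is a sketch only: pickFieldProperties is a hypothetical name, the sample array just mimics the shape built by the loop above, and missing properties are assumed to be acceptable as null.

```php
<?php
// Sample data in the shape that the loop above stores in $pdfDataArray
$pdfDataArray = array(
    array("FieldType" => "Text", "FieldName" => "TestName1",
          "FieldNameAlt" => "TestName1", "FieldMaxLength" => "5"),
    array("FieldType" => "Button", "FieldName" => "TestName3",
          "FieldStateOption" => "Yes"),
);

// Hypothetical helper: keep only the wanted properties, null when absent
function pickFieldProperties(array $field): array
{
    return array(
        "FieldName"        => $field["FieldName"] ?? null,
        "FieldNameAlt"     => $field["FieldNameAlt"] ?? null,
        // pdftk prints the length as text, so cast it to an integer
        "FieldMaxLength"   => isset($field["FieldMaxLength"]) ? (int) $field["FieldMaxLength"] : null,
        "FieldStateOption" => $field["FieldStateOption"] ?? null,
    );
}

$wanted = array_map('pickFieldProperties', $pdfDataArray);
print_r($wanted);
```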

BartoszKP