Reading contents from text/csv document with inconsistencies in data

Question

I am trying to import data from a source that is not a true CSV or TXT file, although my code is able to read it as delimited text.

The problem is that some "data records" do not follow the same pattern as the rest. Approximately 70% of the document parses correctly; however, I think I am missing something in the data that is throwing off the remaining results.

I would appreciate it if you could take a look at the code and the file and help me figure out why some of the data is not parsed like the rest of the document. I suspect the cause is an odd number of delimiter characters (~ and/or >) in one of the fields, or a start/stop marker that is slightly different for some of the records.

<?php
header("Content-Type:text/html");

$file = "data.txt";
if (($handle = fopen($file, "r")) !== FALSE) 
    {
        fgetcsv($handle, 1000, ">~Yn");
        $imports = array();

            while (($data = fgetcsv($handle, 1000, ">")) !== FALSE) 
            {
                if(strpos($data[4],'<') !== false)
                    {
                        echo "<br /><strong>Section:</strong> " . $data[5];
                        echo "<br /><strong>Row:</strong> " . $data[6];
                        echo "<br /><strong>Qty:</strong> " . $data[7];
                        echo "<br /><strong>Price:</strong> " . $data[8];
                        echo "<br /><strong>Notes:</strong> " . $data[10];
                    }
                else
                    {
                        echo "error: ";
                        print_r($data);
                    }
                echo "<br /><br /><br /><br />";
            }

            fclose($handle);
    }
?>

The sample data can be found here: Sample Data

Is it possible that I need a second delimiter when importing the file? If so, how can that be done? — NotJay, Jul 01 '14 at 20:30

score 0 · Accepted Answer · answered Jul 07 '14 at 13:10

I found a solution that works better than the method I originally attempted. I first determined that loading the file as a CSV was not giving me good results. I then realized that there is a common delimiter (>~) between records that I had been missing. So I split the contents into lines on that delimiter and then split each line into pieces on >. I also ignore the first and last match because their data does not line up with the rest of the records.

$file = "data.txt";
$content = file_get_contents($file);
//explode() is used because split() was deprecated in PHP 5.3 and removed in PHP 7
$lines = explode(">~", $content);
foreach($lines as $line)
    {
        $data = explode(">", $line);

        if(isset($data[5]) && strpos($data[5],'.') !== false) //if the section is a price
            {
                //the first match is ignored
            }
        elseif(empty($data[7])) //if Qty is empty
            {
                //the last match is ignored
            }
        else
            {
                echo "<br><br><br>";
                echo $data[5] . " (Section) <br>";
                echo $data[6] . " (Row) <br>";
                echo $data[7] . " (Qty) <br>";
                echo $data[8] . " (Price) <br>";
                //use the data
            }
    }

This resulted in a much more accurate and thorough data collection!
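
For anyone who wants to try the two-delimiter idea without the original file, here is a minimal sketch of the same approach wrapped in a function. The function name parseRecords, its parameters, and the sample string are all made up for illustration; the real data has more fields per record, so treat this as a starting point rather than a drop-in parser. It assumes PHP 7 or later because of the type declarations.

```php
<?php
// Hypothetical helper (not from the original post): split the raw text into
// records on the record separator, then split each record into fields.
function parseRecords(string $content, string $recordSep = ">~", string $fieldSep = ">"): array
{
    $records = [];
    foreach (explode($recordSep, $content) as $chunk) {
        $chunk = trim($chunk);
        if ($chunk === '') {
            continue; // skip empty chunks, e.g. the one after a trailing separator
        }
        $records[] = explode($fieldSep, $chunk);
    }
    return $records;
}

// Invented sample with four fields per record (section, row, qty, price).
$sample = "101>A>2>45.00>~102>B>4>60.00>~";

foreach (parseRecords($sample) as $fields) {
    echo implode(" | ", $fields), PHP_EOL;
}
```

Because explode() treats both separators as literal strings, the multi-character ">~" works here, whereas fgetcsv() only accepts a single-character delimiter. Any filtering of mismatched first or last records would still need to be added on top of this.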
