0

I'm using php-resque to parse and validate data in large files then import that data into a mysql database.

I already know that LOAD DATA INFILE can be used to read rows from a text file into a table, but doesn't perform any validation whatsoever.

My database structure:

ItemsFile Table:

id  filename  fileepath  valid_items invalid_items processed_items  processed

Item Table:

id  uid  item file_id created_at

My Resque Job Class looks like this:

php-resque forks a child process and instantiates ItemsFileProcessor class then

  1. setUp() gets called
  2. perform() gets called
/**
* Read and validate items form a file, and store them in a database.
*/

class ItemsFileProcessor {

    //ItemsFile Model instance
    private $items_file = null;

    //Item Model instance
    private $item = null;

    //retrieved from ItemsFile table.
    private $file = null;

    public function __construct() {
        $this->items_file = new ItemsFile();
        $this->item       = new Item();
    }

    public function setUp() {

        if (isset($this->args['file_id'])) {

            //get file from ItemsFile Table by id.
            $this->file = $this->items_file->getFile($this->args['file_id']);

            if (empty($this->file)) {

                //End job processing if file does not exist.
                exit(-1);

            }

        }
    }

    function perform() {

        //NodeJs, socket.io, redis, broadcasting system
        EventBroadcaster::broadcast('app-jobs-channel', 'file_processing_started');

        $processed_items = 0;
        $valid_items     = 0;
        $invalid_items   = 0;

        //item validation class instance
        $item_validator = new ItemValidator();

        try {

            $tmp_file = new SplFileObject($this->file->filepath);

            //Read items from file, and validate each item.
            while ($tmp_file->valid()) {
                $line = trim($tmp_file->fgets());
                if ($line !== '') {
                    if ($item_validator->isValid($line, new ItemValidationRule())) {

                        //store item in Item table.
                        $this->item->create([
                                'uid'     => 'foo',
                                'item'    => $line,
                                'file_id' => $this->file->id,
                            ]);

                        $valid_items++;

                    } else {

                        $invalid_items++;

                    }

                    $processed_items++;

                }
            }

            //update ItemsFile Table record
            $this->items_file->update(
                $this->file->id,
                [
                    'processed_items'  => $processed_items,
                    'valid_items'      => $valid_items,
                    'invalid_items'    => $invalid_items,
                    'processed'        => 'Processed',
                ]
            );

            EventBroadcaster::broadcast('app-jobs-channel', 'file_processing_completed');

        } catch (LogicException $exception) {

            //broadcast failure.
            EventBroadcaster::broadcast('app-jobs-channel', 'file_processing_failed');
            Logger::getInstance()->log('ProcessContactFile Exception: '.$exception->getMessage(), Logger::LOGTYPE_ERROR);

            exit(-1);

        }
    }

}

My Problems:

  • Processing a file takes too long
  • Mysql has to process all the insert request one by one. LOAD DATA INFILE is much faster.

My Question:

Is there a way to optimize this or maybe introduce LOAD DATA INFILE somehow.

xanadev
  • 751
  • 9
  • 26

1 Answers1

1

You can have many problems of performance while managing files with PHP. Then, I suggest you to do it with SHELL to parse the file and return one string (that represents your general request with all your inserts). From now you just have to execute this request.

Can help if isn't clear.

  • And if you still want to do it in PHP, you can try to build your request first. After this you execute it. – Mamadou SECK May 07 '18 at 12:34
  • i need to validate each line of the file with some complex validation rules which i already implemented in php in my **ItemValidator** module, so i can't just port that into a SHELL script. PS: i need an actual implementation if it's possible. – xanadev May 07 '18 at 12:42