2

I am trying to take advantage of CakePHP's saveMany feature (with associated data feature), however am creating duplicate records. I think it is because the find() query is not finding authors, as the transaction has not yet been committed to the database.

This means that if there are two authors with the same username, for example, in the spreadsheet, then CakePHP will not associate the second with the first, but rather create two. I have made up some code for this post:

/*
 * Foobar user (not in database) entered twice, whereas Existing user 
 * (in database) is associated
 */

$spreadsheet_rows = array(
    array(
      'title' => 'New post',
      'author_username' => 'foobar',
      'content' => 'New post'
    ),
    array(
      'title' => 'Another new post',
      'author_username' => 'foobar',
      'content' => 'Another new post'
    ),
    array(
      'title' => 'Third post',
      'author_username' => 'Existing user',
      'content' => 'Third post'
    ),
    array(
      'title' => 'Fourth post', // author_id in this case would be NULL
      'content' => 'Third post'
    ),

);


$posts = array();

foreach ($spreadsheet_rows as $row) {

    /*
     * This query doesn't pick up the authors
     * entered automatically (see comment 2.)
     * within the db transaction by CakePHP,
     * so creates duplicate author names
     */

    $author = $this->Author->find('first', array('conditions' => array('Author.username' => $row['author_username'])));

    $post = array(
        'title' => $row['title'],
        'content' => $row['content'],
    );

    /*
     * Associate post to existing author
     */

    if (!empty($author)) {
        $post['author_id'] = $author['Author']['id'];
    } else {

        /*
         * 2. CakePHP creates and automatically
         * associates new author record if author_username is not blank
         * (author_id is NULL in db if blank)
         */

        if (!empty($ow['author_username'])) {            
             $post['Author']['username'] = $row['author_username'];
        }
    }

    $posts[] = $post;
}


$this->Post->saveMany($posts, array('deep' => true));

Is there any way that this can be achieved, while also keeping transactions?

  • Could you show a sample of the generated data to save? Also why are you saving via the `Author` model when the main model data belongs to posts? ps. please always mention your exact CakePHP version and tag your question accordingly! – ndm Nov 10 '14 at 11:54
  • You are inserting new rows because you don't have a primaryKey for the main model to which you are saving. Data with an id is an edit, without is an insert. – David Yell Nov 10 '14 at 12:42
  • @ndm Saving via the Author model was a typo. I am saving new rows, but it should only happen once. After the (just inserted) user is inserted, the query should pick up that the user already exists in the database, and use the ID instead of duplicating the user. I believe I need nested transactions, but am not sure whether there is a better way. –  Nov 10 '14 at 12:58
  • So you have an `Author hasMany Posts` association? And what you actually want to do, is save many posts, where multiple posts can belong to a single (not necessarily yet existing) user? – ndm Nov 10 '14 at 13:28

3 Answers3

0

Update

You new requirement to save also posts that have no associated authors changes the situation a lot, as mentioned in the comments, CakePHPs model save methods are not ment to be able to save data from different models at once if it's not an association, if you need to do this in a transaction, then you'll need to handle this manually.

Save authors and their posts instead of posts and their authors

I would suggest that you save the data the other way around, that is save authors and their associated posts, that way you can easily take care of the duplicate users by simply grouping their data by using the username.

That way around CakePHP will create new authors only when neccessary, and add the appropriate foreign keys to the posts automatically.

The data should then be formatted like this:

Array
(
    [0] => Array
        (
            [username] => foobar
            [Post] => Array
                (
                    [0] => Array
                        (
                            [title] => New post
                        )
                    [1] => Array
                        (
                            [title] => Another new post
                        )
                )
        )
    [1] => Array
        (
            [id] => 1
            [Post] => Array
                (
                    [0] => Array
                        (
                            [title] => Third post
                        )
                )
        )
)

And you would save via the Author model:

$this->Author->saveMany($data, array('deep' => true));

Store non associated posts separately and make use of transactions manually

There is no way around this if you want to use the CakePHP ORM, just imagine what the raw SQL query would need to look like if it would need to handle all that logic.

So just split this into two saves, and use DboSource::begin()/commit()/rollback() manually to wrap it all up.

An example

Here's a simple example based on your data, updated for your new requirements:

$spreadsheet_rows = array(
    array(
      'title' => 'New post',
      'author_username' => 'foobar',
      'content' => 'New post'
    ),
    array(
      'title' => 'Another new post',
      'author_username' => 'foobar',
      'content' => 'Another new post'
    ),
    array(
      'title' => 'Third post',
      'author_username' => 'Existing user',
      'content' => 'Third post'
    ),
    array(
      'title' => 'Fourth post',
      'content' => 'Fourth post'
    ),
    array(
      'title' => 'Fifth post',
      'content' => 'Fifth post'
    ),
);

$authors = array();
$posts = array();
foreach ($spreadsheet_rows as $row) {
    // store non-author associated posts separately
    if (!isset($row['author_username'])) {
        $posts[] = $row;
    } else {
        $username = $row['author_username'];

        // prepare an author only once per username
        if (!isset($authors[$username])) {
            $author = $this->Author->find('first', array(
                'conditions' => array(
                    'Author.username' => $row['author_username']
                )
            ));

            // if the author already exists use its id, otherwise
            // use the username so that a new author is being created
            if (!empty($author)) {
                $authors[$username] = array(
                    'id' => $author['Author']['id']
                );
            } else {
                $authors[$username] = array(
                    'username' => $username
                );
            }
            $authors[$username]['Post'] = array();
        }

        // group posts under their respective authors
        $authors[$username]['Post'][] = array(
            'title' => $row['title'],
            'content' => $row['content'],
        );
    }
}

// convert the string (username) indices into numeric ones
$authors = Hash::extract($authors, '{s}');

// manually wrap both saves in a transaction.
//
// might require additional table locking as
// CakePHP issues SELECT queries in between.
//
// also this example requires both tables to use
// the default connection
$ds = ConnectionManager::getDataSource('default');
$ds->begin();

try {
    $result =
        $this->Author->saveMany($authors, array('deep' => true)) &&
        $this->Post->saveMany($posts);

    if ($result && $ds->commit() !== false) {
        // success, yay
    } else {
        // failure, buhu
        $ds->rollback();
    }
 } catch(Exception $e) {
    // failed hard, ouch
    $ds->rollback();
    throw $e;
}
ndm
  • 59,784
  • 9
  • 71
  • 110
  • My system needs author_id to also be NULL. Does this solution allow that? i.e. $authors[]['Post'] = array('title' => 'foo', 'content' => 'bar'); –  Nov 10 '14 at 14:50
  • @Hal9k I don't really get what you mean, when the foreign key is NULL, then there is no association, but you _want_ to save associations... what situation exactly are you referring to? – ndm Nov 10 '14 at 15:11
  • @Hal9k Are you saying that there might be rows with no `author_username` field, ie posts are not necessarily required to have an associated author? – ndm Nov 10 '14 at 15:23
  • Thanks for your comments. You are correct, there could be posts with no author_username. I have modified the code in my original post to reflect this more clearly. –  Nov 10 '14 at 15:30
  • @Hal9k Now that changes the situation a lot, CakePHPs save methods are not ment to be able to save different (non-associated) models at once. If you need this to be done in a transaction, then you'll have to take care of that manually. I'll update my answer shortly. – ndm Nov 10 '14 at 17:06
  • The models would be associated, it is just that author_id could be null. If CakePHP's saveMany is called with deep and the model association simply does not exist for that record, then NULL will be entered into the database. The crux of the problem is that the SELECT is not finding records that are waiting to be inserted as part of the overall transaction. This results in duplicates. The situation should rather be: "I entered this author already earlier in the saveMany array, let's associate him rather than adding him again". –  Nov 10 '14 at 17:18
  • @Hal9k I might have choosen the wrong wording, with "non-associated models" I wanted to refer to the actual data being saved, ie the models are associated, but you are not saving an association, thus this requires saving via different model objects. – ndm Nov 10 '14 at 17:30
  • @Hal9k I'll need some time till I can come back to this, my test setup is going totally crazy right now, not inserting anything anymore >( – ndm Nov 10 '14 at 18:01
  • Ok, I had a very hard time here, had to restore my DB from a backup, and then I stumbled over a duplicate HTTP request problem while testing the code for you, which was finally solved hours of trial and error and a final call for help on IRC... now let me update my answer. – ndm Nov 10 '14 at 23:09
  • I will put your example into a unit test and post back/accept your answer. –  Nov 12 '14 at 15:44
  • Thanks for the work you took on this. I have added the code that I went with, following your use manual transactions advice. –  Jan 05 '15 at 16:25
0

You need to use saveAll, which is a mix between saveMany and saveAssociated (you will need to do both of them here). Plus, you need to change the structure of each post.

Here is an example of the structures you will need to create inside the loop.

<?php
  $posts = array();

  //This is a post for a row with a new author
  $post = array (
    'Post' => array ('title' => 'My Title', 'content' => 'This is the content'),
    'Author' => array ('username' => 'new_author')
  );
  $posts[] = $post;

  //This is a post for a row with an existing author
  $post = array (
    'Post' => array ('title' => 'My Second Title', 'content' => 'This is another content'),
    'Author' => array ('id' => 1)
  );
  $posts[] = $post;

  //This is a post for a row with no author
  $post = array (
    'Post' => array ('title' => 'My Third Title', 'content' => 'This is one more content')
  );
  $posts[] = $post;


  $this->Post->saveAll($posts, array ('deep' => true));

?>
sepelin
  • 49
  • 6
0

Following the "use transactions manually" bit suggested by ndm, this piece of code (written in a unit test!) seemed to do the trick:

public function testAdd() {
    $this->generate('Articles', array());

    $this->controller->loadModel('Article');
    $this->controller->loadModel('Author');

    $csv_data = array(
        array(
            'Article' => array(
                'title' => 'title'
            )),
        array(
            'Article' => array(
                'title' => 'title'
            ),
            'Author' => array(
                'name' => 'foobar'
            ),

        ),
        array(
            'Article' => array(
                'title' => 'title2'
            ),
            'Author' => array(
                'name' => 'foobar'
            )
        ),
        /* array( */
        /*     'Article' => array( */
        /*         'title' => '' */
        /*     ), */
        /*     'Author' => array( */
        /*         'name' => '' // this breaks our validation */
        /*     ) */
        /* ), */
    );

    $db = $this->controller->Article->getDataSource();

    $db->begin();

    /*
     * We want to inform the user of _all_ validation messages, not one at a time
     */

    $validation_errors = array();

    /*
     * Do this by row count, so that user can look through their CSV file
     */

    $row_count = 1;

    foreach ($csv_data as &$row) {

        /*
         * If author already exists, don't create new record, but associate to existing
         */

        if (!empty($row['Author'])) {                
            $author = $this->controller->Author->find('first', 
                array(
                    'conditions' => array(
                        'name' => $row['Author']['name']
                    )
                ));

            if (!empty($author)) {
                $row['Author']['id'] = $author['Author']['id'];
            }
        }

        $this->controller->Article->saveAssociated($row, array('validate' => true));

        if (!empty($this->controller->Article->validationErrors)) {
            $validation_errors[$row_count] = $this->controller->Article->validationErrors;
        }            
        $row_count++;
    }


    if (empty($validation_errors)) {
        $db->commit();            
    } else {
        $db->rollback();
        debug($validation_errors);
    }

    debug($this->controller->Article->find('all'));

}