-1

I have a comment table and a comment_edit table, as well as olddb_edit. Simplified, the relevant table looks like this:

CREATE TABLE `olddb_edit` (
    edit_id INT NOT NULL,
    edit_time INT NOT NULL,
    edit_text TEXT NOT NULL,

    PRIMARY KEY (edit_id, edit_time)

) ENGINE=InnoDB;

And now I want to migrate content from another DB into the edit-table, but skip some table-rows like test-comments. I'm using CakePHP for this (Phinx, actually).

Usually, this suffices:

$skippable = array(
    12345, 23456, 34567, 45678,
);
$rows = $this->getQueryBuilder()
    ->select('*')
    ->from('olddb_comment')
    ->where(array(
        'comment_id NOT IN' => $skippable,
    ))
    ->execute()
    ->fetchAll('assoc')
;

But a simple NOT IN clause obviously doesn't work with compound-primary-keys.

I was thinking, the $skippable array should look like this:

$skippable = array(
    array('id' => 707969,   'time' => 1434462225),
    array('id' => 707969,   'time' => 1434462463),
    array('id' => 707969,   'time' => 1434462551),
);

And then I'll run the where clause through a for-loop or something. But to be honest, I don't even know how to do it in vanilla-MySQL.

There might be a solution posted on SO already, but I couldn't find any (other than ones specific to other applications). The algorithm is not my friend, I guess.

WoodrowShigeru
  • 1,418
  • 1
  • 18
  • 25
  • 1
    Please ask 1 specific researched non-duplicate question re where you are stuck. For code questions give a give a [mre]--cut & paste & runnable code, including smallest representative example input as code; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL, DBMS & DDL (including constraints & indexes) & input as code formatted as a table. PS Keys are not needed to query. Joins are on conditions, not keys. IN for subquery takes a row, not a key. Whether it is multi-column is DBMS-specific. Typically EXISTS is a better choice. – philipxy May 16 '20 at 19:11

2 Answers2

0

Never mind, I figured it out myself while formulating the question. I'll post the answer anyway, for others with a similar question.


First, vanilla-MySQL. It's as intuitive as you would think if you break up the NOT IN (imo):

SELECT * FROM olddb_edit WHERE
NOT (
    (edit_id = 707969 AND edit_time = 1434461454)
OR  (edit_id = 707969 AND edit_time = 1434461503)
OR  (edit_id = 707969 AND edit_time = 1434461925)
);

And with the CakePHP / Phinx query-builder, you use an anonymous function, a for-loop and a not-or construct:

$qb = $this->getQueryBuilder()
    ->select('*')
    ->from('olddb_edit')
    ->where(array(
        'edit_some_other_optional_condition = 1',
    ))

    // also skip skippables.
    ->where(function($exp) use ($skippable) {
        $ORed = array();
        foreach ($skippable as $edit) {
            array_push($ORed, array(
                'edit_id'   => $edit['id'],
                'edit_time' => $edit['time'],
            ));
        }
        return $exp->not($exp->or_($ORed));
    })
;

UPDATE: Based on @ndm's comment, I present a satisfying solution using TupleComparison. — (@ndm, feel free to post your answer still if you want to. I will delete/edit mine and choose yours. You deserve the credit).

// remove the keys from my previous solution.
$skippable = array(
    array(707969,   1434462225),
    array(707969,   1434462463),
    array(707969,   1434462551),
);

$qb = $this->getQueryBuilder()
    ->select('*')
    ->from('olddb_edit')
    ->where(array(
        'edit_some_other_optional_condition = 1',
    ))

    // also skip skippables.
    ->where(new Cake\Database\Expression\TupleComparison(
        array('edit_id', 'edit_time'),
        $skippable,
        array('integer', 'integer'),
        'NOT IN'
    ))
;
WoodrowShigeru
  • 1,418
  • 1
  • 18
  • 25
  • 1
    In SQL, instead of unrolling the `NOT IN`, tuples would work too. Like: `(edit_id, edit_time) NOT IN ((707969, 1434461454), (707969, 1434461503), ...)` – sticky bit May 16 '20 at 19:11
  • @stickybit I find `OR NOT AND` very confusing, so I might be mistaken, but I *think* yours would be a logically different query (and, quite probably the one that the OP intended). – Strawberry May 16 '20 at 19:19
  • @Strawberry: Hmm, I don't know... First note that I reference the SQL in this answer, not anything from question and not any PHP stuff, if that wasn't clear. Then here's my reasoning: Everything in the parenthesis after `NOT` can be expressed as an `IN` with tuples. The `AND`s are covered by using tuples and then it's a bunch of `OR`s with the same left operand which can be expressed as an `IN`. So we get `NOT (... IN ...)`. And that can be simplified to `NOT IN ...`. But of course I might be wrong there somewhere. Maybe you can construct a counter example? – sticky bit May 16 '20 at 19:36
  • I didn't know that `NOT IN` supported tuples, or that this was a thing (outside of `INSERT` statements). I couldn't (and still can't) find any official docs about this on dev.mysql.com … I will look into this and see if there's a CakePHP thing for this. – WoodrowShigeru May 16 '20 at 22:14
  • @Strawberry I don't see the difference that you're mentioning. – WoodrowShigeru May 16 '20 at 22:15
  • 1
    There's an expression for tuple comparisons: **https://stackoverflow.com/questions/37919141/query-builder-in-clause-with-composite-columns/37924543#37924543** – ndm May 18 '20 at 10:39
  • @ndm, That is *exactly* what I was searching for, and it works like a charm. Do you want to write a proper answer for this so that I can accept it? – WoodrowShigeru May 21 '20 at 16:22
0

Your query evaluates as follows; are you sure that's what you want?

select edit_id 
     , edit_time
     , edit_text 
  from olddb_edit 
 where 
     (
       (
         (edit_id <> 707969) 
      or (edit_time <> 1434461454)
       ) 
   and (
         (edit_id <> 707969) 
      or (edit_time <> 1434461503)
       ) 
   and (
         (edit_id <> 707969) 
      or (edit_time <> 1434461925)
        )
      );
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • It's an odd way of looking at it, but essentially: yes. That is what I want. Those table-rows that are – at the same time – the negation of each individual skippable row. That is what a `NOT IN` is, after all. – WoodrowShigeru May 16 '20 at 22:02