1

How can query table1 with a list of values for colA and get back the the list of the values that were not found?

i.e. all_values = ["a","b","c","d"]

Table1 already has rows with colA for "b","c"

I want to create a query that can I send in the entire all_values list and get back new_values = ["a","d"]

Currently I have to do this by first getting all existing_values = SELECT colA FROM table1, then do new_values = all_values - existing_values at the application layer (ruby). But this can be an expensive routine if all_values grows very large. I want to leverage the db power here to tell me which values in my list don't exist already without needing to return first such a large list to the application layer.

[Solution] from David Faber's checked answer below, here is my solution from the application layer perspective (ruby on rails) in case anyone is interested. I've written the method so it can be included in any model you wish to use it on and accepts the column and list of values as arguments: [Update] parameterized the temp table name and setting datatype appropriately

    def self.unfound_values_by_column values, column
      quoted_values = values.collect {|value| "('#{value}')"}.join(",")
      conn = self.connection # get this model's connection if different than base
      temp_table_name = "temp_values_for_unfound_lookup_table"
      datatype = self.columns_hash[column].sql_type
      begin
        conn.execute(
          "DROP TEMPORARY TABLE IF EXISTS #{temp_table_name}"
        )
        conn.execute(
          "CREATE TEMPORARY TABLE #{temp_table_name} (`#{column}` #{datatype})"
          )
        conn.execute(
          "INSERT INTO #{temp_table_name} (#{column}) VALUES #{quoted_values}"
        )
        unfound = conn.select_values(
          "SELECT temp.`#{column}` FROM `#{temp_table_name}` temp 
            WHERE NOT EXISTS ( 
              SELECT 1 FROM `#{self.table_name}` my
                WHERE my.`#{column}` = temp.`#{column}`
            )"
        )
      ensure
        conn.execute(
          "DROP TEMPORARY TABLE IF EXISTS #{temp_table_name}"
        )
      end
     unfound
    end
Streamline
  • 2,040
  • 4
  • 37
  • 56

3 Answers3

1

Your best option is to create a temporary table (which will be visible only to the current session and dropped when that session ends), inserting all the values in all_values, then comparing that to the existing table. For example,

CREATE TEMPORARY TABLE `all_values`
(
    `myvalue` VARCHAR(30)
);

-- insert values

SELECT av.`myvalue`
  FROM `all_values` av
 WHERE NOT EXISTS ( SELECT 1 FROM `table1` t
                     WHERE t.`ColA` = av.`myvalue` );

I personally find WHERE EXISTS syntax easier than a LEFT JOIN with key = NULL, but YMMV.

David Faber
  • 12,277
  • 2
  • 29
  • 40
  • Perfect - I like and agree with your use of the WHERE [NOT] EXISTS. I added an edit to my post for my full solution at the ruby level based on your answer. – Streamline Jan 16 '15 at 21:37
0

You can use a derived table:

SELECT * FROM (
  SELECT "a" AS colA
  UNION ALL SELECT "b"
  UNION ALL SELECT "c"
  UNION ALL SELECT "d"
) all_values
WHERE all_values.colA COLLATE utf8_general_ci NOT IN (SELECT colA FROM table1);

I use COLLATE to avoid a collation conflict between table1 and values from all_values.

aimstone
  • 106
  • 7
  • Its not clear to me what you are doing with the UNION ALL SELECTs for each value in the list of all_values ... if that list was 50,000 values long then it would create a much much larger command string by adding UNION ALL SELECT to each each one... no? Is there a way to compress that? – Streamline Jan 16 '15 at 18:37
  • Unfortunately MySQL does not have an easy way of doing that. – David Faber Jan 16 '15 at 19:03
  • I tried to avoid a temporary table, which is the easiest way. For a long list, as you said, it could be an awkward solution :) – aimstone Jan 16 '15 at 19:21
0

I build a small example using LEFT JOIN. Table DATA hold your test data and table DATA_LIST all all posible data. The SELECT give all column that not in table DATA.

not_found_values: 'C', 'c'

/* your values */
CREATE TABLE data (
  col VARCHAR(10) NOT NULL
);


INSERT INTO data1 VALUES
( 'A'), ('a'), ('B'), ('b' );

/* hole data list */
CREATE TABLE data_list (
  col VARCHAR(10) NOT NULL
);


INSERT INTO data_list VALUES
( 'A'), ('a'), ('B'), ('b' ), ('C'),('c');

/* show all values that not found */
SELECT dl.col AS not_found_values
FROM
  data_list dl
LEFT JOIN data d
ON d.col = dl.col
WHERE
  d.col IS NULL
 ORDER BY dl.col;
  • Are one of these tables 'data' or 'data_list' supposed to a temporary table created on the fly? The list provided by the application should be considered temporary... no? – Streamline Jan 16 '15 at 18:40
  • Yes this is possible. In MySQL you can create temporary table on the fly. The are dropped if your session is gone. http://stackoverflow.com/questions/5859391/create-a-temporary-table-in-a-select-statement-without-a-separate-create-table –  Jan 16 '15 at 18:47
  • MySQL documentation about CREATE TEMPORARY TABLE https://dev.mysql.com/doc/refman/5.1/en/create-table.html –  Jan 16 '15 at 18:50