1

Say I have a movie table and a genre table. linked through a many to many relationship.

TABLE movie
id
name

_

TABLE movie_genre
movie_fk
genre_fk

_

TABLE genre
id
name

So nice and normalised. But I'm also also importing a CSV file into a table, which is of the format:

TABLE csv
name, genres 
'Die Hard', 'action~drama'

Now I want to check for changes, whether the csv has genres listed I don't have or vice versa so I display the changes to the user and later sync them

I'm doing it like so :

SELECT * FROM movie 
JOIN movie_genre ON movie.id = movie_genre.movie_fk 
JOIN genre ON genre.id = movie_genre.genre_fk 
WHERE 
FIND_IN_SET(genre.name, REPLACE(csv.genres, '~', ',')) = 0 

Issue is this will only flag changes one way. E.g. If in my database I have Die Hard related to genres: action, drama

and the CSV contains action,drama, horror

because each of the genres in my database are included in the csv data it won't get marked as change.

Please note the genres listed in the csv may not be listed in any particular order.

Hopefully I've explained that thoroughly enough.

How can I achieve what I'm trying to do ? Can it be done using REGEX or a custom MySQL function ?

Alan
  • 13
  • 3
  • Parse your CSV column to separate tokens (genres) then compare. The applicable parsing method is critically depends on precise version of your MySQL server. – Akina Oct 29 '20 at 09:53
  • Maybe you can find out the exact version of the server and let us know? – Akina Oct 29 '20 at 10:28
  • MySQL Version 5.7.27 – Alan Oct 29 '20 at 10:34
  • Too ancient... use iterative SP, parse your CSV to separate values into temporary table then select needed data. In MySQL 8+ it may be performed by one query - see [fiddle](https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=12bf372ba80415137de31820d884911f). – Akina Oct 29 '20 at 11:04

2 Answers2

0

FIND_IN_SET is to find whether an element is inside a set. You intend to find the elements that are missing from another table as well. So, instead of the not normalized CSV import result, I would modify the csv table to the form of

csv(movie_id, genre_id)

and generate import code which would insert it. So, you could check for differences as follows (untested code):

(
select movie_id, genre_id, "missing" as status
from csv
where not exists (select 1 from movie_genre where movie_genre.movie_fk = csv.movie_id and movie_genre.genre_fk = csv.genre_id)
)
union
(
select movie_fk as movie_id, genre_fk as genre_id, "surplus" as status
from movie_genre
where not exists (select 1 from csv where movie_genre.movie_fk = csv.movie_id and movie_genre.genre_fk = csv.genre_id);
)

You can perform changes as well:

insert into movie_genre(movie_fk, genre_fk)
select movie_id, genre_id
from csv
where not exists (select 1 from movie_genre where movie_genre.movie_fk = csv.movie_id and movie_genre.genre_fk = csv.genre_id)

and

delete
from movie_genre
where where not exists (select 1 from csv where movie_genre.movie_fk = csv.movie_id and movie_genre.genre_fk = csv.genre_id);
Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
  • unfortunately at this stage it's not really possible to change the structure of the csv table. in hindsight yes that would've probably worked better. – Alan Oct 29 '20 at 10:02
  • @Alan then you can proceed and solve half of the problem with the approach you have already outlined and implement a stored procedure which inserts normalized values into a temporary table and use that to solve the second half of the problem. – Lajos Arpad Oct 29 '20 at 10:40
  • @Alan further read: https://gist.github.com/duanehutchins/c6000b367b3032b0b495c46b3bc370c8 – Lajos Arpad Oct 29 '20 at 10:42
  • Also: https://www.sisense.com/blog/splitting-comma-separated-values-in-mysql/ – Lajos Arpad Oct 29 '20 at 10:43
  • And also: https://forums.mysql.com/read.php?10,635524,635529 – Lajos Arpad Oct 29 '20 at 10:44
0

If anyone's interested, not the most elegant solution but I solved this by doing the following:

  • firstly I updated the import code to ensure that the imported genre list column gets sorted into alphabetical order.
  • then I updated my query to do a direct string comparison on the result of a group_concat

_

SELECT * FROM movie 
LEFT JOIN 
( 
SELECT GROUP_CONCAT(genre.name ORDER BY genre.name SEPERATOR "~") AS genres, movie_genre.movie_fk FROM genre 
JOIN movie_genre ON genre.id = movie_genre.genre_fk 
GROUP BY movie_genre.movie_fk
) AS sub_genres ON movie.id = sub_genres.movie_fk
WHERE sub_genres.genres != csv.genres
Alan
  • 13
  • 3