1

I'm curious if joining tables on string fields should be done with the string fields being trimmed?

Let's say you have two identical strings in two different tables. You want to join the two tables on those two strings. So you will do something like this:

SELECT * 
  FROM table_primary
  LEFT 
  JOIN table_seconday 
    ON table_primary.string_one = table_seconday.string_two

My questions is, even though the strings are identical, should the string fields first be trimmed, and then compared with each other?

Are there best practices when joining on strings, or it's the usual comparison as with joining on integers?

Strawberry
  • 33,750
  • 13
  • 40
  • 57
escplat12
  • 2,191
  • 4
  • 22
  • 34
  • 2
    "Best practices" would avoid joining on strings - IOWs don't use strings as keys. But, to answer your question, it depends. If you consider leading/trailing spaces as something that doesn't matter, then use trim. If they are significant, don't use trim. – Sloan Thrasher Mar 01 '18 at 19:52
  • Generally, joining on string values indicates a bad design, unless they are the primary keys or some code (e.g. Student ID). In the later case, you should not modify the values in any way otherwise the query will not use the indexes and the performance will suffer. – Racil Hilan Mar 01 '18 at 19:54
  • 1
    Wouldn't it be a better idea to trim the values on that column before inserting them? Or trim them if needed all at once. Like `UPDATE table_primary SET string_one = TRIM(string_one)`. – Anthony Mar 01 '18 at 20:04
  • As with any join, it would be good to have indexes on the columns. Don't use functions in the join, such as `trim(str1) = trim(str2)`, because that wouldn't make use of indexes anymore. Simply join the strings as you are doing and it should be fast. Yes, numeric keys may be a tad faster, but usually you shouldn't notice much of a difference, at least according to my experience. (I'm usually using Oracle, though, not MySQL.) – Thorsten Kettner Mar 01 '18 at 20:21
  • As to alphanumeric keys: there is nothing wrong with them. If you are using natural keys, ISO codes such as GB, US, and FR make a great key for a country table for instance. – Thorsten Kettner Mar 01 '18 at 20:25
  • Functions cannot use indexes , so this is never a good idea – Strawberry Mar 01 '18 at 22:19
  • @Strawberry *Functions cannot use indexes*, they actually can, if and only if the indexes are built on the exact same functions. Yes, it is still generally not a good idea to use functions and function-based indexes for joins and other critical query parts, but there are rare exceptions, so I wouldn't say *"never"*. – Racil Hilan Mar 02 '18 at 06:27
  • @RacilHilan interesting- can you provide an example? – Strawberry Mar 02 '18 at 06:33
  • @Strawberry It's not easy in comments. You can open a question if you want and I will answer it (or Gordon will beat me to it :)). Or you can easily google it. You can do that easily in some dbs like Oracle (simply use the function when creating the index), but other dbs like MySQL and SQL Server don't support that, so you'll need to use generated/calculated columns. Here is a related [question](https://stackoverflow.com/questions/10595037/is-it-possible-to-have-function-based-index-in-mysql) asking how it can be done in MySQL similar to Oracle. – Racil Hilan Mar 02 '18 at 06:50

3 Answers3

3

Generally, joining on string values indicates a bad design, unless they are the primary keys or some code (e.g. Student ID). In the later case, you should not modify the values in any way otherwise the query will not use the indexes and the performance will take a big hit.

If this is just a one off case and the performance isn't an issue, then you can technically do whatever gets you the required results. So if those strings may have some extra spaces, then you can trim them. Again, that indicates some data integrity issues. It is much better to validate and sanitize the strings before you insert them to the database in the first place. So for example, if you trim the strings before you insert them to the database, then you will not need to worry about it now and the performance will be better even if it still not using indexes because at least you will save the time that the trim function will take on the two sides.

Racil Hilan
  • 24,690
  • 13
  • 50
  • 55
1

Now that depend on your use case.

Yes if you have any chance of getting noisy data from applications.

No if you'll have proper data without any noise from applications.

so if you unnecessarily use trim your are decreasing performance i know that is negligible but still it's hit to server resources. and if you avoid and you have some noisy data you'll not get expected output.

Mihir Dave
  • 3,954
  • 1
  • 12
  • 28
0

You could try adding a checksum column in each of the tables and then joining on that.

tbean
  • 7
  • 2