2

I have a set of data in a MySQL database. I'm retrieving a list of results ordered by a field called login. When I retrieve this set, two of the rows come back in the following order.

cco1@blah.com
cco10.test@blah.com

However, when I compare them in Java (necessary because of the post-processing needed to merge objects application side), cco10.test@blah.com has a value less than cco1@blah.com. In other words, the String comparison would expect them to be in the following order.

cco10.test@blah.com
cco1@blah.com

Everything else is returned in the correct order. I assume the difference is most likely in the way Java and MySQL string comparisons treat certain characters. How do I get these to sort in a consistent order? (I'm OK with either order; I just need consistency.)

Query I'm running:

select t0.id as envUserId , t0.environment_id as envId, t0.environment_name as envName,   t0.customer_name as customerName, t0.version version, t0.user_id as userId, t0.login as userLogin,   t0.sso_granted_roles as sso_granted_roles,   t1z_.role_name as defaultRole, t3.id as customRoleId, t3.name as customRoleName
  from environment_user t0   
  left join ( 
    select distinct eu.id, eu.login       
    from  environment_user eu                 
    left join environment_user_role eur on eu.id = eur.environment_user_id   
    left join environment_user_custom_role eucr on eu.id = eucr.environment_user_id   
    left join custom_role cr on eucr.custom_role_id = cr.id        
    where eu.environment_id = '5a83069a-70d2-4d0e-9847-c709725281c5'             
    and (eur.role_name in ('Role1','Role2') 
        or cr.name in ('Role1','Role2'))       
    order by eu.login limit 0, 200) f on t0.id = f.id   
  left outer join environment_user_role t1z_ on t1z_.environment_user_id = t0.id   
  left outer join environment_user_custom_role ct1z_ on ct1z_.environment_user_id = t0.id   
  left outer join custom_role t3 on t3.id = ct1z_.custom_role_id   
  where t0.environment_id = '5a83069a-70d2-4d0e-9847-c709725281c5'     
  and t0.id = f.id
  order by userLogin asc

What I'm getting back (extra lines above and below have been removed for clarity):

'c2ad9f82-e0d5-4f8d-a5fe-a2d72d901b98', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '649ea0bc-dab7-4ad2-a534-546f9817e252', 'c9ca3e83-ccc6-4108-aee4-1bc41e6294ff@searchtestdomain.com', '0', 'Role1', NULL, NULL
'83313002-49a3-45f2-9013-e8dab15789d5', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '40d5c22a-33f8-4a37-a4db-63e3709cfae7', 'ccc@searchtestdomain.com', '0', 'Role1', NULL, NULL
'5ba69c88-a773-4d5b-835d-c88688867d6a', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '91a7609a-4809-4e27-9d6f-448ff62b38b3', 'cccc@searchtestdomain.com', '0', 'Role1', NULL, NULL
'6833a699-b5ca-46aa-8a53-23a6ef41e1f8', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '718808fa-3799-457f-9cdb-88ef887e0492', 'cco1@searchtestdomain.com', '0', 'Role1', NULL, NULL
'c466c478-8a32-4926-9cde-06a40071ac85', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '6282739d-76ea-4dbb-be5e-b7d64d3b3f3f', 'cco10.test@searchtestdomain.com', '0', 'Role1', NULL, NULL
'5b04d561-6c20-4703-aa96-f17eda0405b6', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', 'fb644427-46ab-42e4-8295-65a397409c0d', 'cd67848c-62fc-4cb3-ab7d-8b2d49709973@searchtestdomain.com', '0', 'Role1', NULL, NULL
'27116bed-a1a6-483c-9e7b-97158786245c', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '246f392d-6d27-402e-837f-98384da0abb6', 'd1072b14-2956-432e-9243-72b68275fbe6@searchtestdomain.com', '0', 'Role1', NULL, NULL
kfan
  • Read MySQL information about how they store / sort Strings and write in Java a `Comparator` that implements that same logic. – SJuan76 Jun 21 '16 at 00:26
  • Can you provide a SQL Fiddle that represents your experience? Because I am unable to reproduce the SQL ordering that you describe (unless you are ordering in descending order): http://sqlfiddle.com/#!9/9eecb7d/65180 – nasukkin Jun 21 '16 at 00:31
  • Can't provide a SQLFiddle without copying over the majority of our schema (which I'm not allowed to do). I added some more information, hopefully that helps. – kfan Jun 21 '16 at 00:57
  • What is returned by your server when you execute `SELECT 'cco10.test@blah.com' < 'cco1@blah.com'`? – Tung Jun 21 '16 at 01:03
  • The order of records from an `ORDER BY` depends on the collation your database is using. Which one are you using? – Andreas Jun 21 '16 at 01:21
  • @Tung I get 1 back. – kfan Jun 21 '16 at 17:01
  • @Andreas Looks like it's utf8_unicode_ci – kfan Jun 21 '16 at 17:02

3 Answers

3

Here is what I found:

1) When I run the query in MySQL, I get the following result:

[image: MySQL query result]

2) The following Test.java sorts the two strings to show their order in Java:

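import java.util.Arrays;

public class Test {
  public static void main(String[] args) {
      // The two logins from the question, in the order MySQL returned them
      String[] arr = {"cco1@blah.com", "cco10.test@blah.com"};
      // Arrays.sort on String[] uses String.compareTo (comparison by char value)
      Arrays.sort(arr);
      System.out.println(Arrays.toString(arr));
  }
}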

and the output is:

[cco10.test@blah.com, cco1@blah.com]

3) To check the collation, I ran the following query:

SELECT table_catalog,
       table_schema,
       table_name,
       column_name,
       collation_name
FROM   information_schema.columns
WHERE  table_schema = 'test'
       AND column_name = 'email'; 

And the output is:

[image: MySQL query result]

So both MySQL and Java sort the two strings in the same order when the MySQL collation is utf8_general_ci.

Sanjeev Saha
0

See the comment on String.compareTo()

/**
     * Compares two strings lexicographically.
     * The comparison is based on the Unicode value of each character in
     * the strings. The character sequence represented by this
     * {@code String} object is compared lexicographically to the
     * character sequence represented by the argument string. The result is
     * a negative integer if this {@code String} object
     * lexicographically precedes the argument string. The result is a
     * positive integer if this {@code String} object lexicographically
     * follows the argument string. The result is zero if the strings
     * are equal; {@code compareTo} returns {@code 0} exactly when
     * the {@link #equals(Object)} method would return {@code true}.
     * <p>
     * This is the definition of lexicographic ordering. If two strings are
     * different, then either they have different characters at some index
     * that is a valid index for both strings, or their lengths are different,
     * or both. If they have different characters at one or more index
     * positions, let <i>k</i> be the smallest such index; then the string
     * whose character at position <i>k</i> has the smaller value, as
     * determined by using the &lt; operator, lexicographically precedes the
     * other string. In this case, {@code compareTo} returns the
     * difference of the two character values at position {@code k} in
     * the two string -- that is, the value:
     * <blockquote><pre>
     * this.charAt(k)-anotherString.charAt(k)
     * </pre></blockquote>
     * If there is no index position at which they differ, then the shorter
     * string lexicographically precedes the longer string. In this case,
     * {@code compareTo} returns the difference of the lengths of the
     * strings -- that is, the value:
     * <blockquote><pre>
     * this.length()-anotherString.length()
     * </pre></blockquote>
     *
     * @param   anotherString   the {@code String} to be compared.
     * @return  the value {@code 0} if the argument string is equal to
     *          this string; a value less than {@code 0} if this string
     *          is lexicographically less than the string argument; and a
     *          value greater than {@code 0} if this string is
     *          lexicographically greater than the string argument.
     */
    public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }

MySQL compares strings in a similar way, but the result depends on the character set and collation of the database. More information here: MYSQL
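To make the comparison above concrete for the two logins in the question, here is a small sketch. The class name `CompareToDemo` and the helper `firstDifference` are made up for illustration; only `String.compareTo`, `charAt` and `length` come from the JDK.

```java
public class CompareToDemo {

    // Hypothetical helper: index of the first position where the strings
    // differ, or -1 if one string is a prefix of the other (or they are equal).
    static int firstDifference(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            if (a.charAt(k) != b.charAt(k)) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String a = "cco10.test@blah.com";
        String b = "cco1@blah.com";

        int k = firstDifference(a, b);
        // The strings first differ at index 4: '0' (48) in a, '@' (64) in b.
        System.out.println("first difference at index " + k);
        System.out.println((int) a.charAt(k) + " vs " + (int) b.charAt(k));

        // compareTo returns the difference of those two char values,
        // so the result is negative and a sorts before b in Java.
        System.out.println(a.compareTo(b));
    }
}
```

Because '0' has a smaller char value than '@', Java puts cco10.test@blah.com first. A collation that weights '@' below the digits would put cco1@blah.com first instead.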

star
  • I understand what lexicographical ordering is and how Java and MySQL use it to compare strings, I'm trying to figure out why I'm getting a difference here in the comparison. Judging by what people have been saying above, it most likely has to do with collation. – kfan Jun 21 '16 at 20:12
0

Found a solution in this Stack Overflow question. It looks like Java's lexicographic sort is not based on the natural-language sort implemented by utf8_unicode_ci: Java compares raw char values, so '0' sorts before '@', whereas utf8_unicode_ci sorts '@' before the digits. The solution is to create a Collator and use its compare method to perform the sort instead.
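A minimal sketch of that approach follows. The class and method names are made up for illustration, and the choice of `Locale.ROOT` with `Collator.PRIMARY` strength is my assumption for approximating a case-insensitive collation; it is not guaranteed to match utf8_unicode_ci for every input, so check it against the database's own ordering on real data.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class LoginOrdering {

    // Hypothetical helper: a comparator meant to approximate a
    // case-insensitive database collation. Locale.ROOT and PRIMARY
    // strength are assumptions, not settings taken from the question.
    static Comparator<Object> loginComparator() {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY); // ignore case and accent differences
        return collator; // Collator implements Comparator<Object>
    }

    // Sorts a copy of the list with the collator-based comparator.
    static List<String> sortWithCollator(List<String> logins) {
        List<String> copy = new ArrayList<>(logins);
        copy.sort(loginComparator());
        return copy;
    }

    // Sorts a copy of the list with String.compareTo, for comparison.
    static List<String> sortNatural(List<String> logins) {
        List<String> copy = new ArrayList<>(logins);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<String> logins = Arrays.asList("cco1@blah.com", "cco10.test@blah.com");
        System.out.println("String.compareTo: " + sortNatural(logins));
        System.out.println("Collator:         " + sortWithCollator(logins));
    }
}
```

Running `main` prints both orderings side by side, which makes it easy to compare them with what `ORDER BY login` returns. Whatever comparator is chosen has to be used everywhere the application merges or re-sorts the rows, otherwise the two orderings can still disagree.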

kfan