2

I need to find the frequency of a string in a column, irrespective of its case and any white spaces.

For example, if my string is My Tec Bits and they occur in my table like this, as shown below :

061  MYTECBITS    12123
102  mytecbits    24324
103  MY TEC BITS  23432
247  my tec bits  23243
355  My Tec Bits  23424
454  My Tec BitS  23432

Then, the output should be 6, because, with whites pace removed and irrespective of case, all those strings are identical.

Is there any grep() equivalent in SQL as there is in R?

sunitprasad1
  • 768
  • 2
  • 12
  • 28
  • 1
    Are you using MySQL or Oracle? You shouldn't add both tags unless you're using both databases. – Barmar Apr 29 '15 at 10:20
  • Note that REGEXP is case insensitive by default in MySQL. – Casimir et Hippolyte Apr 29 '15 at 11:05
  • @CasimiretHippolyte - the case sensitivity depends on the `COLLATION` of the column being tested. – Rick James Apr 29 '15 at 14:07
  • @RickJames: I can only be a problem with binary collations, but in this case you can artificially change it with the `COLLATE` keyword for the string search. But as I said by default `SELECT "a" REGEXP "A";` returns 1 – Casimir et Hippolyte Apr 29 '15 at 14:36
  • Good example. But ... `SET @@collation_connection = utf8_bin; SELECT "a" REGEXP "A";` returns 0. Hence, I claim that it is collation, regexp, that decides whether to case fold. – Rick James Apr 29 '15 at 16:11

6 Answers6

2

If your concern is only with the SPACE and the CASE, then you need two functions:

  • REPLACE
  • UPPER/LOWER

For example,

SQL> WITH DATA AS(
  2  SELECT 'MYTECBITS' STR FROM DUAL UNION ALL
  3  SELECT 'mytecbits' STR FROM DUAL UNION ALL
  4  SELECT 'MY TEC BITS' STR FROM DUAL UNION ALL
  5  SELECT 'my tec bits' STR FROM DUAL UNION ALL
  6  SELECT 'MY TEC BITS' STR FROM DUAL UNION ALL
  7  SELECT 'MY TEC BITS' STR FROM DUAL
  8  )
  9  SELECT UPPER(REPLACE(STR, ' ', '')) FROM DATA
 10  /

UPPER(REPLA
-----------
MYTECBITS
MYTECBITS
MYTECBITS
MYTECBITS
MYTECBITS
MYTECBITS

6 rows selected.

SQL>

Then, the output should be 6

So, based on that, you need to use it in the filter predicate and COUNT(*) the rows returned:

SQL> WITH DATA AS(
  2  SELECT 'MYTECBITS' STR FROM DUAL UNION ALL
  3  SELECT 'mytecbits' STR FROM DUAL UNION ALL
  4  SELECT 'MY TEC BITS' STR FROM DUAL UNION ALL
  5  SELECT 'my tec bits' STR FROM DUAL UNION ALL
  6  SELECT 'MY TEC BITS' STR FROM DUAL UNION ALL
  7  SELECT 'MY TEC BITS' STR FROM DUAL
  8  )
  9  SELECT COUNT(*) FROM DATA
 10  WHERE UPPER(REPLACE(STR, ' ', '')) = 'MYTECBITS'
 11  /

  COUNT(*)
----------
         6

SQL>

NOTE The WITH clause is only to build the sample table for demonstration purpose. In our actual query, remove the entire WITH part, and use your actual table_name in the FROM clause.

So, you just need to do:

SELECT COUNT(*) FROM YOUR_TABLE 
  WHERE UPPER(REPLACE(STR, ' ', '')) = 'MYTECBITS'
/
Lalit Kumar B
  • 47,486
  • 13
  • 97
  • 124
  • I think using `WITH DATA AS` overcomplicates your answer. – Eilidh Apr 29 '15 at 10:34
  • @Eilidh So what to do when OP doesn't provide any sample data. `WITH` clause here is only to build the table to demonstrate. Don't get confused. We use the WITH clause for demonstrations when the sample data is not available. Is this the reason you have downvoted? – Lalit Kumar B Apr 29 '15 at 10:35
  • I am not confused, I know why you used the `WITH` clause, but the OP stated they are very new to SQL and showing this in your answer is unnecessary to the OP's question and obfuscates the important parts. – Eilidh Apr 29 '15 at 10:37
  • @Eilidh The important parts have been mentioned as bullet points on top. Anyway, I am adding a NOTE in the end about the WITH clause, if that makes you comfortable. – Lalit Kumar B Apr 29 '15 at 10:39
  • Apologies for not mentioning the table names and creating a mess. Put it in a very lucid way. Also @christian-barron 's answer has the exact thing that I was looking for.. – sunitprasad1 Apr 29 '15 at 11:06
  • @sunitprasad1 No problem, in future help us with some table data :-) Community learning is great. Yes, christian Barron updated his answer and now seems similar. – Lalit Kumar B Apr 29 '15 at 11:07
  • `SELECT ... FROM ( SELECT ... UNION ... ) AS x WHERE ...` is a MySQL-friendly way to do this instance of `WITH`. (And it is nearly an identical number of keystrokes.) – Rick James Apr 29 '15 at 14:10
  • @RickJames I understand. Well, it is not only applicable in MySQL, but also in Oracle. It's just that WITH clause looks subtle and easily understood. – Lalit Kumar B Apr 29 '15 at 14:13
1

You could use something like

UPPER(REPLACE(userString, ' ', ''))

to check for upper case only and to remove white space.

Coloco
  • 353
  • 4
  • 14
1

You could cast your statements to LOWER() before comparing them eg.

LOWER(column_name) = LOWER(variable)

more specific:

LOWER(First_name) = LOWER('JoHn DoE')

would become first name = 'john doe'

For the spacing you should use replace, the format for that is:

REPLACE(yourstring, ' ' , '')

' ' = a space character replace it by an empty string = ''

So you would do

WHERE LOWER(REPLACE(fieldname, ' ', '') = 'mytecbits'
Jeremy C.
  • 2,405
  • 1
  • 12
  • 28
  • If the `COLLATION` is "..._ci", the `LOWER()` is unnecessary, and would defeat the use of an index. – Rick James Apr 29 '15 at 14:12
  • To be honest I had to look up what a collation is exactly, but that would mean you wouldn't be able to check case sensitivity right (I'm going to make an educated guess here and assume ci stands for case insensitive) which you might want to do later on, or is the collation something you can set for every single query? – Jeremy C. Apr 29 '15 at 14:23
  • Collation is _usually_ defaulted for the column, then forgotten about. You _can_ have a `COLLATION ...` clause on a comparison, but that is very rare. My point to you (and everyone else in this thread) is that the collation of `fieldname` _probably_ obviates the need for `LOWER()`. _Most_ people confuse "character set" and "collation". – Rick James Apr 29 '15 at 16:16
1

You need to use count to bring back the number affected, Lower will place the data into lower case so that when you make a comparison you can make it lower case.

To remove spaces you then use Replace and replace the space with an empty string for your comparison:

Select COUNT(ColumnA) 
from table 
where Lower(Replace(ColumnB, ' ', '')) = 'mytecbits'
Christian Barron
  • 2,695
  • 1
  • 14
  • 22
1

If you are looking for the number of instances of one specific string, irrespective of case / whitespace, then you need to do the following -

  • ignore whitespace

  • ignore case

  • count the number of instances of the string

So you want a query like the following -

SELECT 
    COUNT(field)
FROM
    table
WHERE
    UPPERCASE(REPLACE(field, ' ', '')) = UPPERCASE(REPLACE(userstring, ' ', ''))

This counts the number of rows in your table where field is the same as the userstring, when case is ignored (all set to the same case using UPPERCASE, so it is effecitvely ignored), and spaces are ignored (spaces are removed from the field and the userstring using REPLACE)

Eilidh
  • 1,270
  • 1
  • 13
  • 33
1

Since REGEXP is case insensitive, you can obtain a match by making the spaces optional, example:

SELECT count(field) FROM yourtable WHERE field REGEXP "MY *TEC *BITS";

Note: if needed, you can add a space or a [[:<:]] (word boundary) before "MY" and a space or a [[:>:]] after "BITS" to avoid false positive.

Casimir et Hippolyte
  • 88,009
  • 5
  • 94
  • 125