0

I'm still learning regexp in oracle and stuck with below error. Below is my sample code

SELECT DISTINCT COALESCE(TO_NUMBER(regexp_substr(USERNAME, '[^.]+', 1, 2)), ID) ID , 
                COALESCE(regexp_substr(USERNAME, '[^.]+', 1, 1), USERNAME) AS USERNAME 
  FROM logs;
ORA-01722: invalid number 
01722. 00000 -  "invalid number"
*Cause:    The specified number was invalid.
*Action:   Specify a valid number.

Table Data

Username                ID
Ravi.1234              1234
Krishna.12345          12345
Ravi.Krishna.1234567   1234567
R.Krishna.987          987
Ravi.K.567890          567890
R.Krish                123
Ravi                   456

Expected Output

ID             Username
1234            Ravi
12345           Krishna
1234567         Ravi.Krishna
987             R.Krishna
567890          Ravi.K

How to reframe the query to get the output needed. Can substr be used instead of regexp will it give desired output? This is used in oracle database not in sql. Thanks in advance.

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
  • Let me help you clarify your question. So: in the table data, you have two string columns. Username is made of one or more "parts" separated by dot. ID is a single "string" (no dots) - perhaps made up entirely of digits, but this may or may not be important. (For you to clarify!) Then: in the query, you want to return only those rows where the last "part" of username matches the ID. For those rows, you want to return the ID, and separately, the portion of the Username after removing the ID. Is all of this correct? –  Dec 09 '20 at 18:39
  • Then: Yes, all of this can be done with standard string functions and conditions (avoiding all regular expressions). Is that what you need? Or are you also, at the same time, using this as an opportunity to learn more about regular expressions? Obviously, the optimal solution (which does NOT use regular expressions!) will not help you with this last goal, of learning more about regexp. –  Dec 09 '20 at 18:41
  • Yes with are without regexp i want the data to be extracted. Thanks for your help. Let me try this. – What's In A Name Dec 09 '20 at 19:03

2 Answers2

0

If I understood your assignment correctly (see my comments under your question), here is how you can do this with standard string functions and conditions:

with
  table_data (username, id) as (
    select 'Ravi.1234'           , '1234'    from dual union all
    select 'Krishna.12345'       , '12345'   from dual union all
    select 'Ravi.Krishna.1234567', '1234567' from dual union all
    select 'R.Krishna.987'       , '987'     from dual union all
    select 'Ravi.K.567890'       , '567890'  from dual union all
    select 'R.Krish'             , '123'     from dual union all
    select 'Ravi'                , '456'     from dual
  )
select id, substr(username, 1, instr(username, '.', -1) - 1) as username
from   table_data
where  username like '%.' || id
;

ID      USERNAME            
------- --------------------
1234    Ravi                
12345   Krishna             
1234567 Ravi.Krishna        
987     R.Krishna           
567890  Ravi.K  

In the LIKE condition in the WHERE clause, % is a wildcard for "any string of any length, including zero"; that must be followed by a literal dot and then by the ID, and that must be the whole USERNAME string. In select, instr(username, '.', -1) finds the position of the "first" dot in username, but counting from the end and moving left - that is what the minus sign means.

With regular expression functions and conditions:

select id, regexp_substr(username, '^(.*)\.' || id || '$', 1, 1, null, 1) as username
from   table_data
where  regexp_like(username, '\.' || id || '$')
;

The sixth argument to regexp_substr means "the first substring enclosed in parentheses" (first "capture group" is the technical term).

0

I think REGEXP_REPLACE() would suit well for your case while filtering out the values having at least one digit. In the current case, you're trying to convert the second portions of the Username strings to number, but not all of them are numeric, the error raises due to this. Moreover, you can also extract the ID column from Username column. e.g. no need to hold seperate ID column within your original table.

Thus, consider using

SELECT TO_NUMBER( REGEXP_REPLACE(Username, '[^0-9]+') ) AS ID,
       RTRIM( REGEXP_REPLACE(Username, '[^.]+$'),'.') AS "Username"
  FROM logs
 WHERE REGEXP_LIKE(Username,'[0-9]')

the following option would be an alternative to above one by using less Regular expression

SELECT TO_NUMBER( SUBSTR( Username, INSTR(Username, '.',-1)+1, LENGTH( Username ) )) AS ID,
       SUBSTR( Username, 1, INSTR(Username, '.',-1)-1 ) AS "Username"
  FROM logs
 WHERE REGEXP_LIKE(Username,'[0-9]') 

Demo

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
  • This approach makes many implicit assumptions about the data (for example, that the username except for the ID part does not contain digits, etc.) It would be helpful if you could state all the conditions you need for this to work - conditions that, apparently, are all satisfied in the sample data, but may not be necessarily be satisfied in the real-life data. –  Dec 09 '20 at 19:13
  • Just as another example: your solution assumes that '`Ravi.123'` with id `'456'` should be returned (with id 456 and user name Ravi). I got the impression that the "123" must match the "456", otherwise the row should not produce any output. –  Dec 09 '20 at 19:15