Unexpected behavior with Oracle ltrim

Question

select ltrim('PREPROCESSINGLIST_AOD', 'PREPROCESSINGLIST_') as node_code from dual;

The expected result is AOD, but when I run by changing the first string then getting unexpected behavior. Below are the trails I tried and run by changing the character after _ like BOD, COD...ZOD and below is weird result.

When    AOD Result==>   AOD
When    BOD Result==>   BOD
When    COD Result==>   D
When    EOD Result==>   D
When    FOD Result==>   FOD
When    GOD Result==>   D
When    HOD Result==>   HOD
When    IOD Result==>   D
When    JOD Result==>   JOD
When    KOD Result==>   KOD
When    LOD Result==>   D
When    MOD Result==>   MOD
When    NOD Result==>   D
When    OOD Result==>   D
When    POD Result==>   D
When    QOD Result==>   QOD
When    ROD Result==>   D
When    SOD Result==>   D
When    TOD Result==>   D
When    UOD Result==>   UOD
When    VOD Result==>   VOD
When    WOD Result==>   WOD
When    XOD Result==>   XOD
When    YOD Result==>   YOD
When    ZOD Result==>   ZOD

And... My question why is this weird behavior?

Lasse V. Karlsen · Answer 1 · 2018-06-29T13:03:10.127

4

It isn't weird behavior. It is documented behavior.

According to the documentation of LTRIM:

LTRIM removes from the left end of char all of the characters contained in set.

So the second parameter, though in a string type, is not a string as such, it is a set of characters to trim away.

So this:

ltrim('PREPROCESSINGLIST_COD', 'PREPROCESSINGLIST_')

will end up returning D because both C and O is in PREPROCESSINGLIST_:

PREPROCESSINGLIST_
     ^^
     here

_AOD, however, A is not in the set so the trimming stops there.

If you test you can see that you get exactly the same behavior from this:

ltrim('PREPROCESSINGLIST_COD', 'CEGILNOPRST_')

This should be the same characters as in PREPROCESSINGLIST_.

edited Jun 29 '18 at 13:03

answered Jun 29 '18 at 12:57

Lasse V. Karlsen

380,855
102
628
825

Thanks, if at all I just want the substring after `_`, what should be best to use? I know the answer mentioned by @Frank to use `regexp_replace`, but how efficient is to use `regexp_replace` – Suryaprakash Pisay Jun 29 '18 at 13:12
The reason I didn't post about regex_replace is that I'm no Oracle expert *at all*, I simply know from experience with similar functions in other systems what likely was the problem and reading the documentation told me it was, but I couldn't write an efficient way to do what you want if I was in a wet paper bag. Well, that came out wrong, but what I'm trying to say is that you should ask [@FrankSchmitt](https://stackoverflow.com/users/610979/frank-schmitt) about that :) – Lasse V. Karlsen Jun 29 '18 at 13:18

score 3 · Answer 2 · answered Jun 29 '18 at 13:01

3

The question why your code behaves this way has already been answered by Lasse Vågsæther Karlsen. To achieve what you want, you can use a regular expression:

select regexp_replace('PREPROCESSINGLIST_COD', 
                      '^PREPROCESSINGLIST_', 
                      '') as node_code 
from dual;

This will remove a leading PREPROCESSINGLIST from the string, but leave it there if it's in the middle of the string (because of the ^ anchor for beginning-of-string).

answered Jun 29 '18 at 13:01

Frank Schmitt

30,195
12
73
107

You can copy all the contents of my answer and I can delete mine, here I think it is actually more important to tell him how to get the behavior he wants instead of just explaining why his current code doesn't work. – Lasse V. Karlsen Jun 29 '18 at 13:04
2

@LasseVågsætherKarlsen I think it's important to first explain why the original code isn't working, so I'd prefer if we keep both answers (that's not entirely altruistic - I still want to get the Sportsmanship badge one day :-) ) – Frank Schmitt Jun 29 '18 at 13:10
@FrankSchmitt, Thanks I'm afraid to use `regexp_replace` because if at all there is the performance hit while using it. So I found `replace` clause will do the same job as well, Please suggest which one to choose. – Suryaprakash Pisay Jun 29 '18 at 13:18
1

@SuryaprakashPisay - the basic `replace` will match in the middle of the string; as Frank mentioned the `^` anchor in the regex pattern means that only matches at the start. SO which you use partly depends on whether that matters. You could also use `instr`/`substr` but that's not as simple (but may be faster). – Alex Poole Jun 29 '18 at 13:40

score 3 · Answer 3 · answered Jun 29 '18 at 14:05

if at all I just want the substring after _, what should be best to use?

It isn't quite clear if you want to only remove the exact string 'PREPROCESSINGLIST_', and if so whether that should only be matched at the start of the string or anywhere; or you want to remove anything up to the first underscore, or anything up to any underscore.

Depending on your actual data and the result you want to get, you can use regexp_replace() as @FrankScmitt showed (with or without an anchor), or a plain replace(), or a combination of instr() and substr().

With some made-up data with various patterns provided in a CTE:

with t (str) as (
            select 'PREPROCESSINGLIST_AOD' from dual
  union all select 'PREPROCESSINGLIST_BOD' from dual
  union all select 'PREPROCESSINGLIST_COD' from dual
  union all select 'PREPROCESSINGLIST_DOD' from dual
  union all select 'XYZ_PREPROCESSINGLIST_EOD' from dual
  union all select 'XYZ_FOD' from dual
  union all select 'ABC_XYZ_GOD' from dual
  union all select 'HOD' from dual
)
select str,
  regexp_replace(str, '^PREPROCESSINGLIST_', null) as anchor_regex,
  regexp_replace(str, 'PREPROCESSINGLIST_', null) as free_regex,
  replace(str, 'PREPROCESSINGLIST_', null) as free_replace,
  case when instr(str, '_') > 0 then substr(str, instr(str, '_') + 1) else str end
    as first_underscore,
  case when instr(str, '_') > 0 then substr(str, instr(str, '_', -1) + 1) else str end
    as last_underscore
from t;

STR                       ANCHOR_REGEX              FREE_REGEX  FREE_REPLAC FIRST_UNDERSCORE      LAST_UNDERS
------------------------- ------------------------- ----------- ----------- --------------------- -----------
PREPROCESSINGLIST_AOD     AOD                       AOD         AOD         AOD                   AOD        
PREPROCESSINGLIST_BOD     BOD                       BOD         BOD         BOD                   BOD        
PREPROCESSINGLIST_COD     COD                       COD         COD         COD                   COD        
PREPROCESSINGLIST_DOD     DOD                       DOD         DOD         DOD                   DOD        
XYZ_PREPROCESSINGLIST_EOD XYZ_PREPROCESSINGLIST_EOD XYZ_EOD     XYZ_EOD     PREPROCESSINGLIST_EOD EOD        
XYZ_FOD                   XYZ_FOD                   XYZ_FOD     XYZ_FOD     FOD                   FOD        
ABC_XYZ_GOD               ABC_XYZ_GOD               ABC_XYZ_GOD ABC_XYZ_GOD XYZ_GOD               GOD        
HOD                       HOD                       HOD         HOD         HOD                   HOD

If you can get the result you need in more than one way then it is generally more efficient to avoid regular expressions, but sometimes they are the only (sane) choice. As always it's best to test the options yourself against your actual data to see what is most efficient - or at least efficient enough.

Unexpected behavior with Oracle ltrim

3 Answers3