I have a set of strings on which I am performing some operations to retrieve only the part of the string that I want for further operations.
I have used various substr, instr, regexp_replace etc and achieved for some of the strings in a huge list. I initially found that the string is separated using '-'(underscore) characters and the part of string I want starts after third instance of the delimeter.
But, when I researched further the format became clumsy and the operations I did are not giving the expected output for all the type of strings in the columns.
Please find the attached Oracle(11g) SQL code below and help me in optimizing it to retrieve the expected output every time regardless of the string format.
SELECT 'ABCD-012345-EFG-10vXRI47HU-1' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-012345-EFG-10vXRI47HU-1',INSTR('ABCD-012345-EFG-10vXRI47HU-1','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10vXRI47HU' AS EXPECTED_OUTPUT FROM DUAL
UNION ALL
SELECT 'ABCD-012345-EFG-10zSD0U/2' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-012345-EFG-10zSD0U/2',INSTR('ABCD-012345-EFG-10zSD0U/2','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10zSD0U' AS EXPECTED_OUTPUT FROM DUAL
UNION ALL
SELECT 'ABCD-012345-EFG-10ZsE8h -1' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-012345-EFG-10ZsE8h -1',INSTR('ABCD-012345-EFG-10ZsE8h -1','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10ZsE8h' AS EXPECTED_OUTPUT FROM DUAL
UNION ALL
SELECT 'ABCD-012345-EFG- 10zSe9K ' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-012345-EFG- 10zSe9K ',INSTR('ABCD-012345-EFG- 10zSe9K ','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10zSe9K' AS EXPECTED_OUTPUT FROM DUAL
UNION ALL
SELECT 'ABCD-012345-EFG-.10zSe9K' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-012345-EFG-.10zSe9K',INSTR('ABCD-012345-EFG-.10zSe9K','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10zSe9K' AS EXPECTED_OUTPUT FROM DUAL
UNION ALL
SELECT 'ABCD-012345-EFG-10zSe9K_2' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-012345-EFG-10zSe9K_2',INSTR('ABCD-012345-EFG-10zSe9K_2','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10zSe9K' AS EXPECTED_OUTPUT FROM DUAL
UNION ALL
SELECT 'ABCD-012345-EFG-10zSe9K.' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-012345-EFG-10zSe9K.',INSTR('ABCD-012345-EFG-10zSe9K.','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10zSe9K' AS EXPECTED_OUTPUT FROM DUAL
UNION ALL
SELECT 'ABCD-012345--EFG-10zSe9K' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-012345--EFG-10zSe9K',INSTR('ABCD-012345--EFG-10zSe9K','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10zSe9K' AS EXPECTED_OUTPUT FROM DUAL
UNION ALL
SELECT 'ABCD-012345-EFG-10zSe9K-' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-012345-EFG-10zSe9K-',INSTR('ABCD-012345-EFG-10zSe9K-','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10zSe9K' AS EXPECTED_OUTPUT FROM DUAL
UNION ALL
SELECT 'ABCD-012345-EFG-10zSe9K-//1/23h' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-012345-EFG-10zSe9K-//1/23h',INSTR('ABCD-012345-EFG-10zSe9K-//1/23h','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10zSe9K' AS EXPECTED_OUTPUT FROM DUAL
UNION ALL
SELECT 'ABCD-EFGH-HIJK-012345-10zSe9K' AS MAIN_STRING,
(TRIM(REGEXP_REPLACE(SUBSTR('ABCD-EFGH-HIJK-012345-10zSe9K',INSTR('ABCD-EFGH-HIJK-012345-10zSe9K','-',1,3)+1),'[^0-9A-Za-z]',''))) AS FUNCTION_OUTPUT,
'10zSe9K' AS EXPECTED_OUTPUT FROM DUAL
The output of the above query looks as below:
MAIN_STRING FUNCTION_OUTPUT EXPECTED_OUTPUT
ABCD-012345-EFG-10vXRI47HU-1 10vXRI47HU1 10vXRI47HU
ABCD-012345-EFG-10zSD0U/2 10zSD0U2 10zSD0U
ABCD-012345-EFG-10ZsE8h -1 10ZsE8h1 10ZsE8h
ABCD-012345-EFG- 10zSe9K 10zSe9K 10zSe9K
ABCD-012345-EFG-.10zSe9K 10zSe9K 10zSe9K
ABCD-012345-EFG-10zSe9K_2 10zSe9K2 10zSe9K
ABCD-012345-EFG-10zSe9K. 10zSe9K 10zSe9K
ABCD-012345--EFG-10zSe9K EFG10zSe9K 10zSe9K
ABCD-012345-EFG-10zSe9K- 10zSe9K 10zSe9K
ABCD-012345-EFG-10zSe9K-//1/23h 10zSe9K123h 10zSe9K
ABCD-EFGH-HIJK-012345-10zSe9K 01234510zSe9K 10zSe9K
You can observe there are differences between function_output and expected_output in some cases.
Note: The expected output might not always start with -10. It can be any alpha numeric value.
The only rule for this assignment is to extract only the expected output from the strings. The string formats can get too clumsy. I am trying to find a way to get the expected output everytime irrespective of the string format and delimeters available.
I have asked if the expected output can be of only certain number of characters everytime but I still did not get the answer for that.
But based on the data I have seen, it can start with 7 alpha numeric characters and go upto 12 alpha numeric characters. For that reason, I could not use a range of characters in the substring function.
So, I could not limit the substr function to certain length.
I would like to remove the differences and make the function_output match with expected_output every time.
Any help to achieve the expected output every time is highly appreciated.
Thanks In Advance.
Please let me know if you need anymore details.