0

I'm having trouble with invalid chars that are created when concatinating two cells into one in Oracle 10g.

The string is then inserted into an xml file, and if an invalid char is inserted it breaks the xml file. (The XML can be upto 8gb, and is a pain to fix, so I need to automate this to run before the xml is created daily.

What I've been doing to find one version of this is the following.

select * from (select ar_journal.jnlinpnum,debtor_ref,dump(narration) as dn,narration from ENERGYDB.ar_journal) where dn like '%31%' and dn not like '%=31:%';

However, this only brings back one char, and it occurs for a lot of chars, chars 0 - 31 and 226 - 256 need to be removed.

so I tried this,

create table tmp_charprefix(
charf1 Varchar2 (10), charf2 varchar2 (10)) ;


INSERT INTO tmp_charprefix (charf1, charf2) VALUES ('%27%', '%=27%');
INSERT INTO tmp_charprefix (charf1, charf2) VALUES ('%28%', '%=28%');
INSERT INTO tmp_charprefix (charf1, charf2) VALUES ('%29%', '%=29%');
INSERT INTO tmp_charprefix (charf1, charf2) VALUES ('%30%', '%=30%');
INSERT INTO tmp_charprefix (charf1, charf2) VALUES ('%31%', '%=31%');
INSERT INTO tmp_charprefix (charf1, charf2) VALUES ('%60%', '%=60%');
INSERT INTO tmp_charprefix (charf1, charf2) VALUES ('%62%', '%=62%');

The above are my values to look for.

create table tmp_charfix as
select aj.jnlinpnum, aj.type_jnl, ajt.jnl_descr, ajei.reason_no, ar.reason_descr, aj.narration
from ar_journal aj, ar_jnl_type ajt, ar_jnl_extra_info ajei, ar_reason ar
where ajt.type_jnl = aj.type_jnl
and ajei.jnlinpnum = aj.jnlinpnum
and ajei.reason_no = ar.reason_no
and aj.jnlinpnum in ( select jnlinpnum from (
select ar_journal.jnlinpnum, debtor_ref, dump(narration) as dn, narration from energydb.ar_journal)
where dn like (select charf1 from tmp_charprefix tmo)
and dn not like (select charf2 from tmp_charprefix tmt)
and substr(tmo.charf1,2,8) = substr(tmt.charf2,3,8) ;

-- trying to call a substring of charf1 and match it against a substring of charf2

I'm getting the following error, but i'm not sure if i'm doing something thats even possible on the last line

SQL Error: ORA-00907: missing right parenthesis
00907. 00000 -  "missing right parenthesis"
*Cause:    
*Action:

Any help would be greatly appreciated, i'm stuck on this for the last few hours.

Thanks,

Ben

Ben Coughlan
  • 555
  • 3
  • 12
  • 23
  • you might want to consider using a strem editor for automatically postprocessing the generated xml, eg. [sed](http://www.gnu.org/software/sed/). for simple substitutions based on regular expressions this solution might give you the edge performance-wise. – collapsar Apr 12 '13 at 17:28

2 Answers2

0

This is a suggestion rather than an answer, but it's too long to fit in the comments so I have to present it as an answer.

Suggestion: Indent your code. It will make it a lot easier to spot the problem.

create table tmp_charfix as
  select
    aj.jnlinpnum,
    aj.type_jnl,
    ajt.jnl_descr,
    ajei.reason_no,
    ar.reason_descr,
    aj.narration
  from
    ar_journal aj,
    ar_jnl_type ajt,
    ar_jnl_extra_info ajei,
    ar_reason ar
  where ajt.type_jnl = aj.type_jnl
    and ajei.jnlinpnum = aj.jnlinpnum
    and ajei.reason_no = ar.reason_no
    and aj.jnlinpnum in ( -- OPEN PAREN HERE WITH NO CLOSING PAREN
      select jnlinpnum from (
        select ar_journal.jnlinpnum, debtor_ref, dump(narration) as dn, narration
        from energydb.ar_journal
      ) -- this parenthesis is also trouble, coming between FROM and WHERE
      where dn like (select charf1 from tmp_charprefix tmo)
        and dn not like (select charf2 from tmp_charprefix tmt)
        and substr(tmo.charf1,2,8) = substr(tmt.charf2,3,8)
    -- Oracle is still waiting for a closing parenthesis here...
Ed Gibbs
  • 25,924
  • 4
  • 46
  • 69
0

to select the relevant records use

    select jnlinpnum
         , debtor_ref
         , dn
         , narration
      from (
                select ar_journal.jnlinpnum
                     , debtor_ref
                     , dump(narration) as dn
                     , narration
                     , regexp_instr ( dn, '([^[:digit:]=]|^)(27|28|29|30|31|60|62)([^:[:digit:]]|$)' ) as dn_occurrence
                  from ENERGYDB.ar_journal
           )
     where dn_occurrence > 0
         ;

the actual replacement can be performed with the regexp_replace function. check the performance, however, 8gb output might be prohibitive for using the regexp_class of functions.

collapsar
  • 17,010
  • 4
  • 35
  • 61