
I'm selectively fixing some elements and attributes. Unfortunately, our input files contain both single- and double-quoted attribute values. Also, some attribute values contain quotes (within a value).

Using XML::Twig, I cannot figure out how to preserve whatever quotes exist around attribute values.

Here's sample code:

```perl
use strict;
use XML::Twig;

my $file=qq(<file>
  <label1 attr='This "works"!' />
  <label2 attr="This 'works'!" />
</file>
);

my $fixes=0; # count fixes
my $twig = XML::Twig->new( twig_handlers => { 
                             '[@attr]' => sub {fix_att(@_,\$fixes);} },
                             # ...
                           keep_atts_order => 1,
                           keep_spaces => 1,
                           keep_encoding => 1, );
#$twig->set_quote('single');

$twig->parse($file);
print $twig->sprint();

sub fix_att {
  my ($t,$elt,$fixes) =@_;
  # ...
}
```

The above code returns invalid XML for label1:

```
<label1 attr="This "works"!" />
```

If I add:

```perl
$twig->set_quote('single');
```

then the output contains invalid XML for label2 instead:

```
<label2 attr='This 'works'!' />
```

Is there an option to preserve existing quotes? Or is there a better approach for selectively fixing twigs?

ALF
  • The problem still exists in XML::Twig 3.44. As a workaround, I added an extra twig_handler to change all double quotes inside attribute values to single quotes: `'*' => sub {my ($t,$elt) =@_; foreach (keys %{$elt->atts}) {${$elt->atts}{$_} =~ s/\"/\'/g;}},` – ALF Jun 06 '13 at 17:51
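The workaround from the comment above can be written out as a standalone handler. This is a minimal sketch, not the commenter's exact code: it uses the documented `_all_` trigger instead of `'*'`, reads and writes values through `att`/`set_att` instead of poking the `atts` hash directly, and drops `keep_atts_order` (which needs Tie::IxHash) for brevity. Note that it changes the attribute values themselves, so `This "works"!` becomes `This 'works'!` in the output.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = qq(<file>\n  <label1 attr='This "works"!' />\n  <label2 attr="This 'works'!" />\n</file>\n);

my $twig = XML::Twig->new(
    keep_spaces   => 1,
    keep_encoding => 1,
    twig_handlers => {
        # _all_ fires for every element once it has been completely parsed
        _all_ => sub {
            my ( $t, $elt ) = @_;
            for my $name ( keys %{ $elt->atts } ) {
                # Replace embedded double quotes with single quotes so the
                # double-quoted output written under keep_encoding stays well-formed
                ( my $value = $elt->att($name) ) =~ tr/"/'/;
                $elt->set_att( $name => $value );
            }
        },
    },
);

$twig->parse($xml);
my $out = $twig->sprint;
print $out;
```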

1 Answer


Is there any specific reason for you to use `keep_encoding`? Without it, the quote inside the attribute value is properly escaped.
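A quick way to check this is to parse the question's sample with no options at all and then re-parse the output: `parse` dies on XML that is not well-formed, so a successful round trip that returns the original attribute value shows the quotes survived. This is a minimal sketch; the other options from the question are left out to isolate the effect of `keep_encoding`.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = qq(<file>\n  <label1 attr='This "works"!' />\n  <label2 attr="This 'works'!" />\n</file>\n);

# Same input as the question, but without keep_encoding
my $twig = XML::Twig->new();
$twig->parse($xml);
my $out = $twig->sprint;
print $out, "\n";

# Re-parse the output: this dies if the quotes broke well-formedness
my $check = XML::Twig->new->parse($out);
print $check->root->first_child('label1')->att('attr'), "\n";
```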

`keep_encoding` is used to preserve the original encoding of the file, but there are other ways to do this. It was used mostly in the pre-Perl 5.8 era, when encodings didn't work as smoothly as they do now.

mirod
  • I do need to preserve the original encoding. Mirod, can you suggest an alternative method to preserve the encoding? Thanks. – ALF Jun 07 '13 at 16:43
  • Try doing `binmode STDOUT, sprintf( ":encoding(%s)", $twig->encoding );` before printing the twig. This should set the encoding of STDOUT to the proper value. Test it though, since I am not 100% sure that all XML encodings are supported by Perl. – mirod Jun 07 '13 at 19:28
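Putting the two suggestions together gives roughly the following. This is a sketch under assumptions: the ISO-8859-1 declaration is just an example input, it uses the standard PerlIO `:encoding(NAME)` layer syntax, and it falls back to UTF-8 (an assumption, not something from the thread) because `$twig->encoding` returns undef when the document has no encoding declaration.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = qq(<?xml version="1.0" encoding="ISO-8859-1"?>\n<file>\n  <label1 attr='This "works"!' />\n</file>\n);

# Parse WITHOUT keep_encoding so attribute quotes are escaped on output
my $twig = XML::Twig->new();
$twig->parse($xml);

# encoding() comes from the XML declaration; it is undef if none was given
my $enc = $twig->encoding || 'UTF-8';

# Re-encode everything printed to STDOUT into the document's own encoding
binmode STDOUT, ":encoding($enc)" or die "cannot set :encoding($enc) on STDOUT: $!";

my $out = $twig->sprint;
print $out;
```

Because the XML declaration is kept in the output, the encoding it names and the bytes actually written to STDOUT stay in agreement.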