I have been trying, unsuccessfully, to remove a right single quotation mark from data submitted to a Perl CGI script through an HTML form. If I paste the text ( Can’t Be Dodged ) into the form, the right single quotation mark is never removed. I've tried different methods of encoding and escaping the Unicode character, and nothing seems to work.

Below is what I'm working with.

#!/usr/bin/perl
use strict;
use CGI::Carp qw( fatalsToBrowser carpout);
use CGI '-utf8';
my $q = CGI->new;
my $buffer = $q->param( 'q' );
print "Content-Type: text/html; charset=UTF-8", "\n\n";
$buffer =~ s/[\'\`\.]//g;
$buffer =~ s/’//sg;
print "$buffer";
Nathan
  • In your example, what is initially in $buffer? – Charles Ma Jul 30 '11 at 11:49
  • Nothing is in $buffer. All data coming into $buffer is from an HTML form. – Nathan Jul 30 '11 at 11:51
  • Then what is an example of the output of $q->param( 'q' )? – Charles Ma Jul 30 '11 at 12:03
  • In other words, what's your input, what do you expect to be the output, and what is the actual output? The 3 things that make every question much easier to answer :) – Charles Ma Jul 30 '11 at 12:04
  • Just clarifying **[Charles Ma](http://stackoverflow.com/users/11708/charles-ma)**'s question: What does `$buffer` contain when the substitution operator acts on it? – Alan Haggai Alavi Jul 30 '11 at 12:05
  • When I submit "Can’t Be Dodged" into the script from an HTML form it becomes $buffer from the q param and returns the output: "Can�t Be Dodged" – Nathan Jul 30 '11 at 12:09

3 Answers

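I think you might like Text::Demoroniser, a CPAN module that converts Microsoft "smart" characters, such as curly quotes, into plain equivalents.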

larsen

So, the trick is to figure out what the character is. One solution is to do something like this:

for my $c (split //, $buffer) {
    # Pass $c as an argument so a literal '%' cannot corrupt the format string
    printf "[%s]: %x\n", $c, ord $c;
}

Once you know what the character is, removing it is simple.
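
As a minimal sketch of that last step, assuming the character turns out to be U+2019 (RIGHT SINGLE QUOTATION MARK) and that `$buffer` already holds decoded characters, the removal could look like the following. The helper name `strip_code_points` is hypothetical and not part of the original answer. Writing the character as a `\x{...}` escape means the pattern does not depend on how the script file itself is encoded.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): removes every
# occurrence of the given code points from a decoded character string.
sub strip_code_points {
    my ($text, @code_points) = @_;
    for my $cp (@code_points) {
        my $char = chr $cp;
        $text =~ s/\Q$char\E//g;
    }
    return $text;
}

# Assumed input: the example text with U+2019 written as an escape.
my $input   = "Can\x{2019}t Be Dodged";
my $cleaned = strip_code_points($input, 0x2019, 0x27, 0x60);

binmode STDOUT, ':encoding(UTF-8)';
print "$cleaned\n";   # prints "Cant Be Dodged"
```

The same call also covers the apostrophe (0x27) and backtick (0x60) that the question's first substitution targets.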

Chas. Owens
  • This is what I get back. I'm not familiar with how to remove fffd as a character: `[C]: 43 [a]: 61 [n]: 6e [�]: fffd [t]: 74 [ ]: 20 [B]: 42 [e]: 65 [ ]: 20 [D]: 44 [o]: 6f [d]: 64 [g]: 67 [e]: 65 [d]: 64` – Nathan Jul 30 '11 at 12:26
  • `s/\x{fffd}//`, but `0xFFFD` is the replacement character, not a single quote character. It sounds like you have an encoding problem. – Chas. Owens Jul 30 '11 at 12:37
  • Thanks, that does it, but maybe I should try to determine the encoding problem, which I'm honestly not sure where it would come from. – Nathan Jul 30 '11 at 12:46
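
One possible source of the `fffd`, offered as a guess rather than a diagnosis: if the browser submits the form in Windows-1252, the curly quote arrives as the single byte 0x92, which is not valid UTF-8, so decoding it as UTF-8 substitutes U+FFFD. The sketch below uses the core `Encode` module to compare both decodings. The helper name `code_points_for` and the 0x92 input are assumptions made for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decodes raw bytes with the named encoding and
# returns each resulting code point as a lowercase hex string.
sub code_points_for {
    my ($encoding, $bytes) = @_;
    my $chars = decode($encoding, $bytes);
    return map { sprintf '%x', ord($_) } split //, $chars;
}

# Assumption: the form sent the quote as the Windows-1252 byte 0x92.
my $raw = "Can\x92t";

print join(' ', code_points_for('UTF-8',  $raw)), "\n";  # 43 61 6e fffd 74
print join(' ', code_points_for('cp1252', $raw)), "\n";  # 43 61 6e 2019 74
```

If this matches what the script receives, serving the page containing the form as UTF-8, or adding `accept-charset="UTF-8"` to the form element, would likely let the quote arrive as U+2019 instead.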

I changed the substitution line to:

$buffer =~ s/[\'\’\.]//g;

This is my result from the command line:

$ ./test.pl q="( Can’t Be Dodged )"
Content-Type: text/html; charset=UTF-8

( Cant Be Dodged )
$ 

The same string but using an escaped left single quote produced the same result using the original code.

Edit

I've cleaned up the output: the command prompt is now just '$', and a newline has been added to the end of the output, which makes the relevant output easier to see.

richj
  • I'm guessing it has something to do with how the data is being passed from the form to the script. But not exactly sure what. I'm sure it's something with how it's being encoded but not familiar enough to identify the cause. – Nathan Jul 30 '11 at 12:30