The raw string looks like this:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\froman\fcharset0 Times New Roman;}{\f1\fnil\fcharset0 MS Shell Dlg 2;}}
\viewkind4\uc1\pard\sb100\sa100\f0\fs24\u30340?\u27494?\u35013?\u20998?\u23376?\u65292?23\u26085?\u22312?\u33778?\u24459?\u23486?\u21335?\u37096?\u30340?\u39532?\u20140?\par
\pard\f1\fs17\par
by: lena (11/26/09)\par
\par
}

What regex pattern would replace every RTF tag (everything that follows a backslash) with an empty string "", except the \u escapes followed by numbers? The result should look like:

\u30340?\u27494?\u35013?\u20998?\u23376?\u65292?23\u26085?\u22312?\u33778?\u24459?\u23486?\u21335?\u37096?\u30340?\u39532?\u20140?
by: lena (11/26/09)

I tried "\\\\\\w+|\\{.*?\\}|\\}", which removes everything that follows a backslash and all curly braces, including the \u escapes I want to keep. What is missing is a way to exclude those, something like \\!(\\\\u)

Zombo
val

1 Answer

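Try matching the tags you want to keep first, in a capture group, and replacing every match with that group. The kept tags are replaced with themselves and everything else is replaced with nothing.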

# php
$str = preg_replace('/(\\\u[\d]+)|\\\+[\w\?]+|{.*?}/', '$1', $str);

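# perl (no string-literal unescaping here, so \\ is one literal backslash;
# \\\u would be read as a case escape and \\\+ as a literal plus sign)
$str =~ s/(\\u[\d]+)|\\+[\w\?]+|\{.*?\}/$1/g;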
Rob
  • I meant replacing them with themselves. The first group `(\\\u[\d]+)` matches the \u tags you want to keep, and the replacement $1 puts them back. – Rob Nov 27 '09 at 03:42
  • Sorry, I'm a bit confused: I'm coding in C++ and am not that familiar with PHP or Perl. – val Nov 27 '09 at 03:42
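
For the C++ case raised in the last comment, the same idea can be sketched with `std::regex` (C++11 and later). This is a rough adaptation, not code from the answer: the function name `strip_rtf_keep_unicode` is hypothetical, the brace group is written as `\{[^}]*\}` rather than `{.*?}`, and the final `[{}]` alternative is an added assumption so that leftover braces are removed as in the question's own attempt.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original answer): strips RTF control
// words, brace groups and stray braces, but keeps \uNNNN escapes.
std::string strip_rtf_keep_unicode(const std::string& rtf) {
    // Alternatives are tried left to right at each position:
    //   (\\u\d+)    a \uNNNN escape, captured as group 1 so it can be kept
    //   \\+[\w?]+   any other control word, such as \par or \fs24
    //   \{[^}]*\}   a brace group, up to the first closing brace
    //   [{}]        braces left over after the groups are removed (assumption)
    static const std::regex pattern(
        R"re((\\u\d+)|\\+[\w?]+|\{[^}]*\}|[{}])re");

    // "$1" expands to group 1, which is empty whenever another
    // alternative matched, so only the \uNNNN escapes survive.
    return std::regex_replace(rtf, pattern, "$1");
}
```

Calling `strip_rtf_keep_unicode(raw)` on the string from the question should leave the \u escapes and the "by: lena" line, along with blank lines where a line held only control words. Trimming those would be a separate step.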