I work with Notepad++ and Excel. I have data that contains text in English and Chinese.

The data structure is as follows:

<p> chinese text</p>
<p> english text</p>
<p> chinese text</p>
<p> english text</p>
<p> chinese text</p>
<p> english text</p>

How can I delete all the English text, including the symbols, between <p> and </p>?

Only the Chinese text between <p> and </p> should be left.

The result should look like this:

<p> chinese text</p>
<p> chinese text</p>
<p> chinese text</p>

I tried to delete the English text by removing ASCII characters with a regex, but some English text was missed.

See this pic: PIC. Please help me, thanks.

elevaku
  • Data Example: `

    坐在马车上的叶天,通过窗户看向远处雄伟的天炎城,一时间不由苦笑。

    Ye Tian, ​​sitting in the carriage

    至于原因,小糯米跟纪小小根本受不了天炎城那一百多度的高温,此时已经热的快要中暑了。

    As for the reason, Xiaomi and Ji Xiaoxiao couldn't stand has a heatstroke.

    小金吼倒是没事,趴在叶天脚下睡的深沉。

    Xiao Jin’s squatting is fine, and he is sleeping deep at the foot of Ye Tian.

    “少爷,我受不了啦!”小糯米眼见马车每前行一步,温度就会上升几度,连向叶天求救。

    ""Young master, I can't stand it!"" .

    `
    – elevaku Apr 21 '20 at 05:38

4 Answers

You should be able to do this using Notepad++:

  • replace <p>[a-zA-Z"].*$ with an empty string (regular expression mode)
  • replace \n\n with \n (extended mode)
  • replace <p>|</p> with an empty string (regular expression mode)
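
The three replacements above can be sketched outside Notepad++ as well. This is a minimal Python version, assuming each paragraph sits on its own line and that no space follows the opening tag; the function name `strip_english_paragraphs` and the sample text are made up for illustration. It collapses runs of blank lines with `\n{2,}` rather than a single `\n\n` pass, which is a small deviation from the steps as written.

```python
import re

def strip_english_paragraphs(text: str) -> str:
    # Step 1: blank out every <p> line whose content starts with an
    # ASCII letter or a double quote (MULTILINE makes $ match at line ends).
    text = re.sub(r'<p>[a-zA-Z"].*$', '', text, flags=re.MULTILINE)
    # Step 2: collapse the empty lines left behind by step 1.
    text = re.sub(r'\n{2,}', '\n', text)
    # Step 3: strip the remaining <p> and </p> tags.
    text = re.sub(r'<p>|</p>', '', text)
    return text.strip()

sample = (
    '<p>坐在马车上的叶天</p>\n'
    '<p>Ye Tian, sitting in the carriage</p>\n'
    '<p>小金吼倒是没事</p>\n'
    '<p>Xiao Jin is fine</p>'
)
result = strip_english_paragraphs(sample)
print(result)
```

Because `.*$` runs to the end of the line, step 1 would also swallow any Chinese paragraph that follows an English one on the same line.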
Patrick Artner
  • It works on paragraphs where there are spaces after the <p> tag. But if the sentence with the <p> tag and the next sentence have no space between them, it only works on the first sentence.
    – elevaku Apr 21 '20 at 06:23

Try this: https://regex101.com/r/TGrW27/1

This regex matches and removes each English paragraph:

  • <p>: matches the opening tag
  • (\w|"|'): matches a word character (letter, digit, or underscore) or " or '
  • .+: matches any character, one or more times
  • <\/p>: matches the closing tag
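
For anyone trying the same pattern in a script, here is a hedged Python sketch. Two changes are assumptions on my part, not part of the pattern above: `re.ASCII` is set because Python's `\w` otherwise also matches Chinese characters, and the greedy `.+` is swapped for a lazy `.*?` so that each match stops at the nearest closing tag when several paragraphs share a line. The names `ENGLISH_PARAGRAPH` and `remove_english` are illustrative.

```python
import re

# re.ASCII limits \w to [a-zA-Z0-9_]; without it, \w would match
# Chinese characters too and every paragraph would be removed.
# .*? is lazy, so a match ends at the first </p> it reaches.
ENGLISH_PARAGRAPH = re.compile(r'<p>(?:\w|"|\').*?</p>', re.ASCII)

def remove_english(text: str) -> str:
    # Delete every paragraph whose first character is an ASCII word
    # character or a straight quote.
    return ENGLISH_PARAGRAPH.sub('', text)

one_line = (
    '<p>小金吼倒是没事</p><p>Xiao Jin is fine</p>'
    '<p>少爷</p><p>"Young master"</p>'
)
cleaned = remove_english(one_line)
print(cleaned)
```

Like the original pattern, this only inspects the first character after `<p>`, so a paragraph that begins with a space or a curly quote is left alone.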
adelriosantiago
  • It works on paragraphs where there are spaces after the <p> tag. But if the sentence with the <p> tag and the next sentence have no space between them, it only works on the first sentence. Check https://regex101.com/r/TGrW27/2
    – elevaku Apr 21 '20 at 06:24

Most of the solutions above only work on the first paragraph when several paragraphs sit on a single line. When I tried them, they did not work for paragraphs that share one line.
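
One way around the shared-line problem, sketched in Python rather than Notepad++, is to split the text into individual `<p>...</p>` blocks first and keep only the blocks that contain at least one Chinese character. This assumes "Chinese" can be approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF; the names `PARAGRAPH`, `CJK` and `keep_chinese_paragraphs` are illustrative.

```python
import re

# Lazy match so each <p>...</p> block is captured separately, even when
# several blocks share one line; DOTALL lets a block span line breaks.
PARAGRAPH = re.compile(r'<p>.*?</p>', re.DOTALL)
# Assumption: one character in U+4E00..U+9FFF marks a paragraph as Chinese.
CJK = re.compile(r'[\u4e00-\u9fff]')

def keep_chinese_paragraphs(text: str) -> str:
    kept = [p for p in PARAGRAPH.findall(text) if CJK.search(p)]
    return '\n'.join(kept)

mixed = '<p>叶天</p><p>Ye Tian</p>\n<p>小糯米</p><p>Xiaomi and Ji Xiaoxiao</p>'
only_chinese = keep_chinese_paragraphs(mixed)
print(only_chinese)
```

A paragraph that mixes English with even one Chinese character is kept under this rule, so it would need tightening for mixed-language lines.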

elevaku

If your data always alternates, with Chinese on the first line and English on the second, you can solve this problem with the technique below.
Find what: (.*\n?)(.*\n?)
Replace with: $1 → keeps the Chinese
Or:
Replace with: $2 → keeps the English

https://regex101.com/r/VIPS0s/1
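
The same pairing idea can be sketched in Python by taking every other line. This assumes the lines strictly alternate (Chinese, then English) once blank lines are dropped; the function name `split_alternating` is illustrative.

```python
def split_alternating(text: str):
    """Split strictly alternating lines into (chinese, english) lists."""
    # Drop blank lines first so they do not shift the odd/even pairing.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Lines 1, 3, 5, ... are Chinese; lines 2, 4, 6, ... are English.
    return lines[0::2], lines[1::2]

sample = (
    '<p>小金吼倒是没事</p>\n'
    '<p>Xiao Jin is fine</p>\n'
    '\n'
    '<p>少爷</p>\n'
    '<p>Young master</p>\n'
)
chinese, english = split_alternating(sample)
print('\n'.join(chinese))
```

A single missing translation throws every later pair off, so this is only safe when the alternation is guaranteed.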

Haji Rahmatullah