Regex: remove English text from mixed Chinese-English sentences using Notepad++ and Excel?

Question

I work with Notepad++ and Excel. I have data that contains text in English and Chinese.

The data structure is as follows:

<p> chinese text</p>
<p> english text</p>
<p> chinese text</p>
<p> english text</p>
<p> chinese text</p>
<p> english text</p>

How can I delete all the English text, including symbols, between <p> and </p>?

So that only the Chinese text between <p> and </p> is left.

So the result is like this:

<p> chinese text</p>
<p> chinese text</p>
<p> chinese text</p>

I tried to delete the English text by removing ASCII characters with a regex, but some English text was missed.

Please help me, thanks.

Data Example: `
坐在马车上的叶天，通过窗户看向远处雄伟的天炎城，一时间不由苦笑。

Ye Tian, sitting in the carriage

至于原因，小糯米跟纪小小根本受不了天炎城那一百多度的高温，此时已经热的快要中暑了。

As for the reason, Xiaomi and Ji Xiaoxiao couldn't stand has a heatstroke.

小金吼倒是没事，趴在叶天脚下睡的深沉。

Xiao Jin’s squatting is fine, and he is sleeping deep at the foot of Ye Tian.

“少爷，我受不了啦！”小糯米眼见马车每前行一步，温度就会上升几度，连向叶天求救。

""Young master, I can't stand it!"" .
` — elevaku, Apr 21 '20 at 05:38

score 0 · Accepted Answer · answered Apr 21 '20 at 05:54

You should be able to do this using Notepad++:

Replace [a-zA-Z"].*$ with an empty string (regular expression search mode)
Replace \n\n with \n (extended search mode)
Replace | with an empty string (regular expression search mode)
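
As a rough sketch of the first two steps outside Notepad++, the same replacements can be written in Python. The function name strip_english_lines is hypothetical, and \n{2,} is used in place of \n\n so that runs of blank lines collapse in a single pass. It assumes the Chinese lines contain no ASCII letters or straight double quotes, because everything from the first such character to the end of the line is removed.

```python
import re


def strip_english_lines(text: str) -> str:
    """Hypothetical helper mirroring the first two replace steps above."""
    # Step 1: delete from the first ASCII letter or straight double quote
    # to the end of each line (MULTILINE makes $ match at every line end).
    text = re.sub(r'[a-zA-Z"].*$', '', text, flags=re.MULTILINE)
    # Step 2: collapse the blank lines left behind into single newlines.
    text = re.sub(r'\n{2,}', '\n', text)
    return text.strip()


if __name__ == "__main__":
    sample = (
        "小金吼倒是没事，趴在叶天脚下睡的深沉。\n\n"
        "Xiao Jin's squatting is fine.\n\n"
        "“少爷，我受不了啦！”\n\n"
        "\"Young master, I can't stand it!\" .\n"
    )
    print(strip_english_lines(sample))
```

A Chinese line that contains a Latin name or a straight quote would be cut off at that character, so it is worth checking the output against the source.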

Patrick Artner

It works on paragraphs where the <p> tag is followed by spaces. But if a sentence with the <p> tag and the next sentence have no space between them, it only works on the first sentence. – elevaku Apr 21 '20 at 06:23

score 0 · Answer 2 · answered Apr 21 '20 at 06:03

Try this: https://regex101.com/r/TGrW27/1

This regex will basically match and remove:

<p>: Match the starting tag
(\w|"|'): Match any English letter or " or '
.+: Followed by any characters, one or more times
<\/p>: And match the closing tag
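
A minimal Python sketch of the same idea, under a few assumptions that are not in the answer: the class [A-Za-z"'] stands in for \w (in Python 3, \w also matches Chinese characters), \s* allows whitespace after the opening tag, and the lazy .*? stops at the first closing tag so that several paragraphs on one line are handled one at a time. The names ENGLISH_P and drop_english_paragraphs are hypothetical.

```python
import re

# Assumed pattern: a <p> whose first non-space character is an ASCII letter
# or a straight quote, up to the nearest </p>, plus trailing whitespace.
ENGLISH_P = re.compile(r'<p>\s*[A-Za-z"\'].*?</p>\s*', flags=re.DOTALL)


def drop_english_paragraphs(html: str) -> str:
    """Remove <p>...</p> blocks that start with English text."""
    return ENGLISH_P.sub('', html)


if __name__ == "__main__":
    # Paragraphs run together on one line, with no space between them.
    line = '<p> 中文一</p><p> english one</p><p> 中文二</p>'
    print(drop_english_paragraphs(line))
```

Only the first visible character of each paragraph is inspected, so a Chinese paragraph that opens with a Latin word would also be removed.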

adelriosantiago

It works on paragraphs where the <p> tag is followed by spaces. But if a sentence with the <p> tag and the next sentence have no space between them, it only works on the first sentence. Check https://regex101.com/r/TGrW27/2 – elevaku Apr 21 '20 at 06:24

score 0 · Answer 3 · answered Apr 21 '20 at 06:35

Most of the solutions above work only on the first paragraph when paragraphs extend beyond one line. When I tried them, they did not work for such paragraphs.

elevaku

Haji Rahmatullah · Answer 4 · 2021-04-20T10:23:57.760

If your data always has Chinese on the first line and English on the second, you can solve this problem with the technique below.
Find what: (.*\n?)(.*\n?)
Replace with: $1 → returns the Chinese lines
Or:
Replace with: $2 → returns the English lines

https://regex101.com/r/VIPS0s/1
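
For reference, the same find-and-replace can be sketched in Python, where the backreferences are written \1 and \2 rather than $1 and $2. The function name keep_alternate_lines and its keep_first parameter are hypothetical, and the sketch assumes the lines strictly alternate with no blank lines between them.

```python
import re

# Each match consumes a pair of lines: group 1 is the first, group 2 the second.
PAIR = re.compile(r'(.*\n?)(.*\n?)')


def keep_alternate_lines(text: str, keep_first: bool = True) -> str:
    """Keep the first line of every pair (or the second if keep_first is False)."""
    replacement = r'\1' if keep_first else r'\2'
    return PAIR.sub(replacement, text)


if __name__ == "__main__":
    data = "中文一\nEnglish one\n中文二\nEnglish two\n"
    print(keep_alternate_lines(data))                    # first line of each pair
    print(keep_alternate_lines(data, keep_first=False))  # second line of each pair
```

If the pairs are separated by blank lines, as in the sample data, the blank lines would need to be removed first or the pairing shifts.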

edited Apr 20 '21 at 10:23

answered Apr 20 '21 at 08:33

Haji Rahmatullah

