I have a document containing hundreds of strings enclosed in square brackets ([]). I want to highlight those strings and copy them into a spreadsheet.

I have tried the Find tool but cannot figure out the right regular expression.

The final goal is to copy the information in one go into a new file, or to highlight it and copy it into an Excel spreadsheet.

The text file looks something like this:

>X_343435353.3 words like foo bar [Wanted text]
TGATGATGCCATGCTAGCCATCGACTAGCGACTAGCATCGACTAGCATCAGCTACGACTAGCATCGACTACGA
>XP_543857836.3 other information [Text that I want]
TAGCATCGACTAGCTACTACCTGAGCGAGAAATTTTGGCTATCGACATCGACTATCGAGCACAGCTAGGAATT
>NP_3843875938.2 interesting words [Third desired text]
ATCGCATAGCGCGCTTAGAAGGCCTTAGAGGCATCATCTATCGAGCGACGATATCGCGAGGCAGCGCTATACC

The output I desire is as follows:

Wanted text
Text that I want
Third desired text

I am not sure whether this is possible in Notepad++ or whether a cmd/shell tool is needed. I am using Windows 10. My thought was that a regex might highlight all of the desired text so that it can then be copied elsewhere.

3 Answers

To match just the text and not the brackets:

(?<=\[).*?(?=\])

Example:

(Screenshot: Notepad++ search example with the OP's example text)
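
If a script is an option, the same lookaround expression can be tried outside Notepad++. The following is a minimal Python sketch, assuming Python's `re` module treats this pattern the same way Notepad++ does for this input; the `sample` and `matches` names are only for illustration, and the text is the example from the question.

```python
import re

# Example text from the question (FASTA-style header and sequence lines).
sample = """>X_343435353.3 words like foo bar [Wanted text]
TGATGATGCCATGCTAGCCATCGACTAGCGACTAGCATCGACTAGCATCAGCTACGACTAGCATCGACTACGA
>XP_543857836.3 other information [Text that I want]
TAGCATCGACTAGCTACTACCTGAGCGAGAAATTTTGGCTATCGACATCGACTATCGAGCACAGCTAGGAATT
>NP_3843875938.2 interesting words [Third desired text]
ATCGCATAGCGCGCTTAGAAGGCCTTAGAGGCATCATCTATCGAGCGACGATATCGCGAGGCAGCGCTATACC
"""

# Lookbehind for "[" and lookahead for "]" so the brackets are not part of the match.
bracketed = re.compile(r"(?<=\[).*?(?=\])")

matches = bracketed.findall(sample)
print("\n".join(matches))
# Wanted text
# Text that I want
# Third desired text
```

Because `.` does not cross line breaks by default, each match stays within its own header line.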

To delete everything in a document and leave just the wanted text on each line:

  1. Set the cursor at the start of the document.
  2. Macro, Start recording.
  3. Press Ctrl-F (Find), enter .*?\[ as the search text, and select Regular expression and . matches newline.
  4. Click Find Next and close the dialog.
  5. Press Delete to remove the highlighted text.
  6. Press Ctrl-F (Find), enter \] as the search text, and keep Regular expression and . matches newline selected.
  7. Click Find Next and close the dialog.
  8. Press Enter to replace the highlighted ] with a line break.
  9. Macro, Stop recording.
  10. Macro, Run a macro multiple times, select Until end of file.
  11. Click Run.

Result:

Wanted text
Text that I want
Third desired text

You'll need to delete the last bit after the final match (if any) once the macro completes.
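
For the stated goal of getting the text into a spreadsheet, a script can skip the macro entirely. This is a rough Python sketch, not a replacement for the Notepad++ approach: it assumes header lines start with `>` as in the question, and the function names `extract_labels` and `labels_to_csv` are made up for this example.

```python
import csv
import io
import re

# Captures the text between "[" and "]" without the brackets.
LABEL = re.compile(r"\[([^\]]*)\]")


def extract_labels(lines):
    """Collect bracketed text from header lines (those starting with '>')."""
    labels = []
    for line in lines:
        if line.startswith(">"):
            labels.extend(LABEL.findall(line))
    return labels


def labels_to_csv(labels):
    """Return the labels as CSV text, one label per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label in labels:
        writer.writerow([label])
    return buffer.getvalue()


sample_lines = [
    ">X_343435353.3 words like foo bar [Wanted text]",
    "TGATGATGCCATGCTAGCCATCGACTAGCGACTAGCATCGACTAGCATCAGCTACGACTAGCATCGACTACGA",
    ">XP_543857836.3 other information [Text that I want]",
    "TAGCATCGACTAGCTACTACCTGAGCGAGAAATTTTGGCTATCGACATCGACTATCGAGCACAGCTAGGAATT",
    ">NP_3843875938.2 interesting words [Third desired text]",
    "ATCGCATAGCGCGCTTAGAAGGCCTTAGAGGCATCATCTATCGAGCGACGATATCGCGAGGCAGCGCTATACC",
]

labels = extract_labels(sample_lines)
csv_text = labels_to_csv(labels)
print(csv_text, end="")
```

For a real file, an open file handle can be passed to `extract_labels` in place of `sample_lines`, and `csv_text` can be written to a `.csv` file that Excel opens directly.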

Mark Tolonen

Maybe this expression,

.*\[(.*?)\][\s\S]+?([\r\n]|$)

with a replacement of $1\n also might work.

The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
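
As a quick check outside Notepad++, here is a small Python sketch of the same find-and-replace. It assumes `\n` line endings and uses Python's `re` module, whose engine differs from the one in Notepad++, so results there may not be identical; note that Python writes the replacement as `\1\n` rather than `$1\n`.

```python
import re

# Example text from the question, with "\n" line endings.
text = (
    ">X_343435353.3 words like foo bar [Wanted text]\n"
    "TGATGATGCCATGCTAGCCATCGACTAGCGACTAGCATCGACTAGCATCAGCTACGACTAGCATCGACTACGA\n"
    ">XP_543857836.3 other information [Text that I want]\n"
    "TAGCATCGACTAGCTACTACCTGAGCGAGAAATTTTGGCTATCGACATCGACTATCGAGCACAGCTAGGAATT\n"
    ">NP_3843875938.2 interesting words [Third desired text]\n"
    "ATCGCATAGCGCGCTTAGAAGGCCTTAGAGGCATCATCTATCGAGCGACGATATCGCGAGGCAGCGCTATACC\n"
)

# Header line up to "]", then lazily everything up to the next line break or the end.
pattern = re.compile(r".*\[(.*?)\][\s\S]+?([\r\n]|$)")

# Keep only the captured bracket contents, one per line.
result = pattern.sub(r"\1\n", text)
print(result, end="")
# Wanted text
# Text that I want
# Third desired text
```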

Emma

This one works fine for me:

Find what: >.*?\[(.*?)\]\n.*
Replace with: $1
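
The same replacement can be sketched in Python for anyone who prefers a script. This assumes the text uses `\n` line endings (in Notepad++, a file with Windows line endings may need `\r?\n` in place of `\n`), and Python spells the replacement `\1` instead of `$1`.

```python
import re

# Example text from the question, with "\n" line endings.
text = (
    ">X_343435353.3 words like foo bar [Wanted text]\n"
    "TGATGATGCCATGCTAGCCATCGACTAGCGACTAGCATCGACTAGCATCAGCTACGACTAGCATCGACTACGA\n"
    ">XP_543857836.3 other information [Text that I want]\n"
    "TAGCATCGACTAGCTACTACCTGAGCGAGAAATTTTGGCTATCGACATCGACTATCGAGCACAGCTAGGAATT\n"
    ">NP_3843875938.2 interesting words [Third desired text]\n"
    "ATCGCATAGCGCGCTTAGAAGGCCTTAGAGGCATCATCTATCGAGCGACGATATCGCGAGGCAGCGCTATACC\n"
)

# Match a header line plus the sequence line after it; capture the bracket contents.
pattern = re.compile(r">.*?\[(.*?)\]\n.*")

# Replace each header/sequence pair with the captured text only.
result = pattern.sub(r"\1", text)
print(result, end="")
# Wanted text
# Text that I want
# Third desired text
```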

Haji Rahmatullah