How can I use EmEditor to find and extract regex matches while keeping them on their original lines, with or without a delimiter?

When I extract the matched strings, EmEditor writes each match to a new line. My goal is instead to keep the matches from each line together on that line, removing the values I don't want.

For example

Starting with:

dog cat food
prince dog food

I would like to end up with

dog food
prince food

Or

with a delimiter

dog, food
prince, food

But using EmEditor

  1. Ctrl+F
  2. Entering (\b\w+\b)$|^\w+, then selecting Regular expressions and Extract > display matched strings only

the output is

dog
food
prince
food

Can this be accomplished using EmEditor or through a macro?
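
The flat output above can be reproduced outside EmEditor. The following is a minimal sketch in plain JavaScript (not an EmEditor macro; the variable names are made up for illustration). It suggests why the result loses its line structure: collecting all matches over the whole text yields one flat list, with no record of which line each match came from.

```javascript
// Plain JavaScript illustration, not an EmEditor macro.
const sampleText = "dog cat food\nprince dog food";

// Same pattern as in step 2; the m flag makes ^ and $ apply to each line.
const pattern = /(\b\w+\b)$|^\w+/gm;

// With the g flag, match() returns every match in a single flat array.
const flatMatches = sampleText.match(pattern);

// Prints dog, food, prince, food, each on its own line.
console.log(flatMatches.join("\n"));
```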

MMsmithH

3 Answers

1

I hope I understand the task correctly: of the three values on each line, the first and third should remain.

Solution approach 1, based on your example (the result is written to a new document):

Replace Dialog

  • The search term is analogous to your attempt

    ^(\b\w+\b) \b\w+\b (\b\w+\b)$

  • Replace with:

    \1 \2
    (the delimiter is a space in this case; put a comma or whatever you like between \1 and \2)

  • Extract (Button)

Please check whether a setting in "Advanced" is preventing the desired result; otherwise press Reset. Please use the latest version of EmEditor.

[Screenshot: result in the new document]

Solution approach 2: delete the middle of the three values in place. In the same dialog as above, click "Replace All" instead of using the Extract function. If you do not want to change the original document, work on a copy.
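
The search and replace terms from the first approach can be tried outside EmEditor as well. The sketch below is plain JavaScript rather than anything EmEditor-specific, and keepFirstAndThird is a hypothetical helper name. Note that JavaScript writes group references as $1 and $2 where the Replace dialog uses \1 and \2.

```javascript
// Plain JavaScript illustration of the search and replace terms above.
// keepFirstAndThird is a hypothetical name, not an EmEditor function.
function keepFirstAndThird(text, delimiter) {
  // Group 1 is the first word and group 2 the third; the middle word is
  // matched but not captured. The m flag anchors ^ and $ to each line.
  const pattern = /^(\b\w+\b) \b\w+\b (\b\w+\b)$/gm;
  return text.replace(pattern, "$1" + delimiter + "$2");
}

const replaced = keepFirstAndThird("dog cat food\nprince dog food", ", ");
console.log(replaced);
```

Because the pattern has to describe the whole line, it only fits lines made of exactly three space-separated words; other lines are left unchanged.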

TM1
  • The answer solves my example case. However, if you try replacing (\b\w+\b)$|(^\w+) with \1\2, EmEditor adds a new line. When the case becomes more complex, say the string involves other characters, it gets harder, because you would then have to write a regex that matches everything in the string instead of only the desired part. – MMsmithH May 30 '23 at 13:04
  • Another approach I am trying to avoid is a "match everything except" regex, because such a regex is much harder to create. For example, it is easier to come up with match everything except (cat)|[^c]+(?:c(?!at)[^c]*)*|(?:c(?!at)[^c]*)+[^c]*) than it is to write match everything except an email. So I am looking for a scalable solution that works in other use cases, even though my example is oversimplified. – MMsmithH May 30 '23 at 13:08
  • >>>The answer solves my example case<<< [how-to-ask](https://stackoverflow.com/help/how-to-ask) I am aware that you can't post the original data here. But if your sample data is much simpler than the real data, you can't expect the solution to handle cases that are much more complex. Sample data should be as close as possible to the real data. Regex can do a lot, but it can't do mind reading or magic, although with EmEditor's capabilities it comes pretty close. – TM1 May 31 '23 at 05:40
  • >>>Emeditor adds a new line<<< When you use "Find - Extract", EmEditor puts each value on a separate line (I added the result screenshot). If you use "Replace - Extract" (as in my first example), the result is line by line, with no additional lines, similar to Mr. Yutaka's Filter variant. – TM1 May 31 '23 at 05:54
  • From the original question, the only recognizable problem was that each value is written to its own line in the output. That problem has been solved ("The answer solves my example case."). – TM1 May 31 '23 at 06:02
  • If there are problems with the regex for the search term, please use sample data that is closer to the real data. In cases where I can't find a solution with regex, I use my own scripts in EmEditor that solve those cases specifically. It also depends on the size of the source data, because built-in functions are usually more performant than scripts. I don't know of any other editor on Windows that comes close to EmEditor's feature set. – TM1 May 31 '23 at 06:11
  • Since the answer solved the example case, I upvoted it. But the better answer, which solves all cases, was selected as the best answer. – MMsmithH Jun 01 '23 at 14:26
1

Use the Filter toolbar instead of the Find dialog.

  1. In the Filter toolbar, click the Use Regular Expressions button, and enter a regular expression, for instance, ^\w+|\w+$, in the Filter drop-down list box.

  2. Click the Extract All button in the Filter toolbar, then select Extract Options in the popup menu.

  3. In the Filter Extract Options dialog box, select Extract all matched strings, and enter \t or , as a Delimiter. Click OK.

  4. Click the Extract All button again in the Filter toolbar, then select Extract Matched Strings in the popup menu.

If you record this procedure to a macro, you will get a macro like this:

document.Filter("^\\w+|\\w+$",0,eeFindReplaceRegExp,0,0,0,0,0);
editor.ExecuteCommandByID(4084);  // Extract Matched Strings

If you need to run this macro against many files in a folder, please see: Emeditor: run a macro for all file inside a folder?
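
To make the intended result of the Filter steps concrete, here is a minimal sketch in plain JavaScript. It does not call the EmEditor macro API; extractPerLine is a hypothetical helper, and the handling of lines without any match (they become empty lines here) is an assumption, not a statement about what EmEditor does.

```javascript
// Plain JavaScript illustration, not the EmEditor Filter implementation.
// extractPerLine is a hypothetical helper name.
function extractPerLine(text, pattern, delimiter) {
  return text
    .split("\n")
    // Collect all matches of one line and join them with the delimiter.
    // match() returns null when nothing matches, hence the fallback to [].
    .map((line) => (line.match(pattern) || []).join(delimiter))
    .join("\n");
}

// No m flag is needed because each line is matched on its own.
const extracted = extractPerLine("dog cat food\nprince dog food", /^\w+|\w+$/g, ", ");
console.log(extracted);
```

The pattern only has to describe the wanted parts, not the whole line, which is what makes this approach easier to reuse with other patterns.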

Yutaka
  • Can this be done via a macro on multiple files? – MMsmithH May 31 '23 at 23:12
  • Can I reorder the extracted strings? In the above example, instead of 1. dog, food and 2. prince, food, I get 1. food, dog and 2. food, prince. I was thinking that if I capture two groups with the regex (^\w+)|(\w+$), the output could be written in the order I want: \2, \1. – MMsmithH Jun 01 '23 at 19:48
  • Not with Filter, but you can add another step (Replace All). – Yutaka Jun 01 '23 at 21:11
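
A follow-up Replace All step of the kind mentioned in the last comment could look roughly like the sketch below. It is plain JavaScript, swapPair is a hypothetical name, and the pattern assumes exactly two values per line separated by a comma and a space. In EmEditor's Replace dialog the group references would be written \2 and \1 instead of $2 and $1.

```javascript
// Plain JavaScript illustration of a reordering replace step.
// swapPair is a hypothetical helper name.
function swapPair(text) {
  // Capture the two comma-separated values on each line and emit them
  // in reverse order. The m flag anchors ^ and $ to each line.
  return text.replace(/^(\w+), (\w+)$/gm, "$2, $1");
}

const reordered = swapPair("food, dog\nfood, prince");
console.log(reordered);
```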
0

EmEditor has a powerful but rarely mentioned feature that other text editors do not have: the \J mode, which lets you use JavaScript functions or methods in replacement expressions and can compensate for the shortcomings of regular expressions in certain situations. For example, this question can be handled with the following expressions.

Find: ^.+$

Replace: \J "\0".replace(/ cat | dog /g,",")

˽cat˽ and ˽dog˽ (˽ marks a space) are the keywords to be replaced; change them according to your requirements. After you click the Replace All button, the wanted strings are left on the same line.
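
The effect of the replacement expression can be checked in plain JavaScript before running it in EmEditor. In the sketch below, removeKeywords is a hypothetical wrapper; the line argument plays the role of "\0", which is assumed here to stand for the text matched by ^.+$, that is, one whole line.

```javascript
// Plain JavaScript illustration of the replacement expression above.
// removeKeywords is a hypothetical name; "line" stands in for "\0".
function removeKeywords(line) {
  // Replace each keyword, together with its surrounding spaces, by a comma.
  return line.replace(/ cat | dog /g, ",");
}

const cleaned = ["dog cat food", "prince dog food"].map(removeKeywords);
console.log(cleaned.join("\n"));
```

A keyword at the very start or end of a line is not touched, because the pattern requires a space on both sides; that is why the leading "dog" in the first line survives.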

David.Cao