2

I have an XML file like this:

<fruit><apple>100</apple><banana>200</banana></fruit>
<fruit><apple>150</apple><banana>250</banana></fruit>

Now I want to delete all the text in the file except the text inside the apple tags. That is, the file should contain:

100
150

How can I achieve this?

Test
  • Please, let me know whether the approach described in my answer below works for you. – ib. Apr 02 '12 at 08:30

3 Answers

5
:%s/.*apple>\(.*\)<\/apple.*/\1/

That should do what you need. Worked for me.

Basically, the pattern matches everything up to and including the opening apple tag, captures everything between the opening and closing apple tags in a group, and then matches the rest of the line. The whole match is replaced with the first backreference, which is the text between the apple tags.
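
The pieces described above can be laid out as comments. This is only an annotated sketch of the same command, not a different method; using # as the delimiter and spelling out <apple> in full are my assumptions for readability, since they avoid escaping the slash in the closing tag.

```vim
" Annotated variant of the substitution, with '#' as the delimiter.
"   .*<apple>     everything up to and including the opening tag
"   \(.*\)        capture group 1: the text between the tags
"   </apple>.*    the closing tag and the rest of the line
" The replacement \1 keeps only the captured text.
:%s#.*<apple>\(.*\)</apple>.*#\1#
```

Lines that contain no apple tag are left unchanged by this command.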

Shawn D.
  • 1
    If more than one tag could appear on a line, you might prefer to use a non-greedy match (using \{-\}) and replace all instances, adding newlines between individual results: :%s/.*apple>\(.\{-\}\)<\/apple.*/\1^M/g – Conspicuous Compiler Aug 07 '10 at 08:04
  • @ConspicuousCompiler: Beware: The command you propose does not solve the problem with multiple `<apple>` tags on a line! (Try it for `<fruit><apple>100</apple><apple>200</apple></fruit>`, for example.) To really solve the issue, you can use the technique I describe in [my answer](http://stackoverflow.com/a/9507598/254635). – ib. Feb 29 '12 at 22:27
  • @ib: Just needed a little more non-greediness. This works fine: `:%s/.\{-\}apple>(.\{-\})<\/apple>.\{-\}/\1^M/g` But, yeah, definitely mailer code here. – Conspicuous Compiler Mar 01 '12 at 05:03
  • @ConspicuousCompiler: It seems you do not test the commands you are suggesting. If you run that last command on `<fruit><apple>100</apple><apple>200</apple></fruit>`, you will see that it does not work that well and leaves the closing `</fruit>` tag untouched. – ib. Mar 01 '12 at 06:23
  • 1
    @ib: Hey, not sure if you're trying to intentionally rag on me, but I ran the commands in my local environment, and they worked fine. Don't know what your defaults are, but you're not winning any friends. – Conspicuous Compiler Mar 01 '12 at 07:17
  • @ConspicuousCompiler: Sorry, I do not mean to offend you. Let me explain the testing procedure I use; it is easily reproducible. Start Vim using the default (empty) configuration: `$ vim -u NONE`. In a new buffer, paste a single line, `<fruit><apple>100</apple><apple>200</apple></fruit>`. – ib. Mar 02 '12 at 03:35
  • @ConspicuousCompiler: Then, issue the command `:%s/.\{-\}apple>(.\{-\})<\/apple>.\{-\}/\1^M/g` (typing `^M` as `Ctrl`+`V`, `Enter`, of course). It results in the `E486: Pattern not found` error. Run another version of the command (changed in an attempt to make it work), `:%s/.\{-}apple>\(.\{-}\)<\/apple>.\{-}/\1\r/g`. It works without an error, but leaves the buffer with three lines of text: `100`, `200`, and `</fruit>`, which is definitely *not* the desired result. – ib. Mar 02 '12 at 03:36
  • @ConspicuousCompiler: So, even putting aside the issue with pattern syntax (I cannot see any configuration in which capturing group parentheses would be written as `(`, `)` while the `\{-}` atom is written as `\{-}` and not `{-}`, as in "very magic" `\v` mode, for example), the command does not remove all of the text that should be cleared out. What behavior do you experience when running the test procedure described in the previous comments? – ib. Mar 02 '12 at 03:50
  • @ib: I appreciate you've got some zeal going on, but I'm uninterested in continuing this conversation. Go outside for a walk. Curl some grass between your toes. You might like it. – Conspicuous Compiler Mar 02 '12 at 17:02
  • To resolve our discussion with @ConspicuousCompiler, I encourage anyone who is reading these comments to test that command and report whether it works or not. – ib. Mar 03 '12 at 02:34
0

I personally use this:

%s;.*<apple>\(\d*\)</apple>.*;\1;

Since the text contains '/', which is the default separator, using ';' as the separator makes the command clearer. I also found that the non-greedy match @Conspicuous Compiler mentioned should be

\{-}

instead of "{-}" in Vim. However, after I changed Conspicuous Compiler's solution to

%s/.*apple>(.\{-\})<\/apple.*/\1^M/g

my Vim said it can't find the pattern.
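
The "pattern not found" error can be avoided as follows. This is a minimal sketch, assuming Vim's default 'magic' setting, in which a bare ( matches a literal parenthesis and a capturing group has to be written as \( and \).

```vim
" Fails with E486 on the sample file: ( and ) are taken literally here,
" and the file contains no parentheses.
" :%s/.*apple>(.\{-\})<\/apple.*/\1^M/g

" Works on the sample file: \( and \) delimit capture group 1.
" The trailing ^M and the g flag are dropped, because the pattern
" consumes the whole line and so matches at most once per line.
:%s/.*apple>\(.\{-}\)<\/apple.*/\1/
```

Because the pattern starts with a greedy .*, only one value per line survives, so this variant is adequate only when each line holds a single apple tag, as in the question.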

Allan Ruin
  • The command that @ConspicuousCompiler proposes does not work for me either (in the default Vim configuration, too): the pattern uses literal parentheses `(`/`)` instead of the capturing group ones `\(`/`\)`. However, even after you change the parentheses, the command does not correctly extract the contents of `<apple>` tags if there are several such tags on a line. So, that command is not a complete solution. Unfortunately, @ConspicuousCompiler does not recognize the issue (see the comments to @ShawnD.'s answer). – ib. Mar 04 '12 at 01:28
  • By the way, could you please test the commands [I suggest](http://stackoverflow.com/a/9507598/254635) and report whether they work for you? – ib. Mar 05 '12 at 14:04
-2

In this case, one can use the general technique for collecting pattern matches explained in my answer to the question "How to extract regex matches using Vim".

In order to collect and store all of the matches in a list, run the Ex command

:let t=[] | %s/<apple>\(.\{-}\)<\/apple>\zs/\=add(t,submatch(1))[1:0]/g

The command purposely does not change the buffer's contents, only collects the matched text. To set the contents of the current buffer to the newline-separated list of matches, use the command

:0pu=t | +,$d_
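
For readers unfamiliar with the constructs involved, the two commands above can be read piece by piece. The following is only an annotated restatement of them, a sketch for explanation rather than a different method; the long forms :put and :delete are used in place of :pu and :d.

```vim
" Step 1: collect every <apple> value into the list t.
"   \zs                  start the match after </apple>, so the match is empty
"   \=                   evaluate the replacement as an expression
"   add(t, submatch(1))  append capture group 1 to t and return the list
"   [1:0]                empty slice of that list, so nothing is inserted
:let t=[] | %s/<apple>\(.\{-}\)<\/apple>\zs/\=add(t,submatch(1))[1:0]/g

" Step 2: put the items of t above line 1, one per line, then delete
" the original lines into the black hole register.
:0put =t | +,$delete _
```

Since the substitution is applied with the g flag and a non-greedy group, each apple tag on a line contributes its own item to t.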
ib.
  • Note that this approach correctly handles the case where there are multiple `<apple>` tags on a line. – ib. Apr 02 '12 at 08:31
  • Please, explain reasons when down-voting: The command is working and well tested to accomplish the task in question. – ib. Apr 02 '12 at 08:31