I need to make this exercise about regexes and text manipulation in vim.

So I have this file about the top-scoring soccer players in history, with 50 entries that look like this:

1 Cristiano Ronaldo Portugal 88 121 0.73 03 Manchester United Real Madrid

The fields are separated by tabs (\t).

The fields each correspond to a different category: etc... The last field contains one or more clubs the player has played for (so not a fixed number of clubs).

The question: replace all tabs with a ';', except for the last field, where the clubs need to be separated by a ','.

So I thought: I just replace all of them with a comma, and then I replace the first 7 commas with a semicolon. But how do you do that? Everything - from regex to vim commands - is allowed.

The first part is easy: `:2,$s/\t/,/g`. But the second part, I can't seem to figure out.

Any help would be greatly appreciated.

Thanks, Zeno

zeno dhaene
  • Does it have to be a regex solution, or would using some combination of other vim features (global commands, normal commands, macros, etc.) be okay? – DJMcMayhem Nov 01 '16 at 19:43
  • yes, commands are allowed, I just 'assumed' it would be done with regex, I will change the title accordingly – zeno dhaene Nov 01 '16 at 19:46

5 Answers

This answer is similar to @Amadan's, but it makes use of the ability to provide an expression as the replace string to actually do the difficult bit of changing the first set of tabs to semicolons:

%s/\v(.{-}\t){7}/\=substitute(submatch('0'), '\t', ';', 'g')/|%s/\t/,/g

Broken down, this chains two :s commands with |, and the first of them calls substitute() inside a sub-replace-expression, for three substitutions in total. The first :s looks like this:

%s/\v(.{-}\t){7}/\=substitute(submatch('0'), '\t', ';', 'g')/

What this does is find exactly seven occurrences ({7}) of any run of characters followed by a tab, matched non-greedily ((.{-}\t)). Then we replace this entire match (submatch(0)) with the result of the substitute expression (\=substitute(...)). The substitute expression is simple by comparison, as it just converts all tabs in the match to semicolons.

The last substitute just changes any other tabs on the line to commas.

See :help sub-replace-expression
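For readers who want to check the logic outside of vim, here is a rough Python sketch of the same two-step idea: rewrite only the shortest prefix containing seven tabs, then convert whatever tabs remain. The sample `row` is a hypothetical reconstruction of the line from the question (the real tab positions are an assumption), and `convert` and `fixed_tabs` are illustrative names, not part of the original answer.

```python
import re

# Hypothetical sample row: seven fixed fields followed by two clubs.
# The exact tab positions in the real file are an assumption.
row = "1\tCristiano Ronaldo\tPortugal\t88\t121\t0.73\t03\tManchester United\tReal Madrid"


def convert(line: str, fixed_tabs: int = 7) -> str:
    # Shortest prefix that contains exactly `fixed_tabs` tabs
    # (Python's lazy `.*?` plays the role of vim's `.{-}`).
    prefix = re.compile(r"(?:.*?\t){%d}" % fixed_tabs)
    # Rewrite only that prefix, similar in spirit to
    # \=substitute(submatch(0), '\t', ';', 'g') in the vim command.
    line = prefix.sub(lambda m: m.group(0).replace("\t", ";"), line, count=1)
    # Any tabs left over belong to the club list.
    return line.replace("\t", ",")


print(convert(row))
```

The callback passed to `sub` is the closest Python analogue of a sub-replace-expression: the replacement is computed from the matched text rather than given as a fixed string.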

Randy Morris
  • This seems like it would work! Just one question: am I right that there should be an asterisk after the dot? – zeno dhaene Nov 02 '16 at 11:13
  • No. The `*` would be a greedy match and will eat up everything until the last tab. `{-}` is the non-greedy version. Even with the `{7}`, a greedy `.*` can let the match run on to the last tab of a line that has more than seven tabs, so the non-greedy form is the one you want here. In general I find it best to use the non-greedy match unless you really want greedy. – Randy Morris Nov 02 '16 at 11:55
  • Oh okay thanks! Will accept this answer instead of mine since I think it's better scaled than my solution. – zeno dhaene Nov 02 '16 at 14:53
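
To see how the greedy and non-greedy forms behave on a line with more than seven tabs, here is a small Python sketch. It relies on Python's `re` module rather than vim's regex engine, with `.*?` standing in for `.{-}`; the `row` contents are made up for illustration.

```python
import re

# Hypothetical row with nine tabs: seven fixed fields plus three clubs.
row = "\t".join(
    ["1", "Name", "Country", "88", "121", "0.73", "03", "Club A", "Club B", "Club C"]
)

lazy = re.match(r"(?:.*?\t){7}", row)    # non-greedy, like vim's .{-}
greedy = re.match(r"(?:.*\t){7}", row)   # greedy, like vim's .*

# Count how many tabs each match swallowed.
lazy_tabs = lazy.group(0).count("\t")
greedy_tabs = greedy.group(0).count("\t")
print(lazy_tabs, greedy_tabs)
```

The non-greedy match stops right after the seventh tab, while the greedy one extends to the last tab on the line, which would also turn the club separators into semicolons.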

Here's one way you could do it:

:let @q=":s/\t/;\<cr>"
:2,$norm 7@q
:2,$s/\t/,/g

Explanation:

First, we define a macro 'q' that will replace one tab with a semicolon. Now, on any line we can simply run this macro n times to replace the first n tabs. To automatically do this to every line, we use the norm command:

:2,$norm 7@q

This is essentially the same thing as literally typing 7@q (i.e. "run macro 'q' seven times") on every line in the specified range. From there, we can simply replace every remaining tab with a comma.

:2,$s/\t/,/g
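
The intended effect of the macro approach (replace the first n tabs, then the rest) can be sketched in Python with a count-limited replace. This only illustrates the expected result on a single line, not how vim executes `:norm`; `convert_by_count` and `fixed_tabs` are hypothetical names, and the sample row is an assumption about the file's layout.

```python
def convert_by_count(line: str, fixed_tabs: int = 7) -> str:
    # Replace only the first `fixed_tabs` tabs, which is what running
    # the one-tab macro that many times on a line is meant to do.
    line = line.replace("\t", ";", fixed_tabs)
    # Then turn every remaining tab into a comma.
    return line.replace("\t", ",")


# Hypothetical row: seven fixed fields followed by two clubs.
row = "1\tCristiano Ronaldo\tPortugal\t88\t121\t0.73\t03\tManchester United\tReal Madrid"
print(convert_by_count(row))
```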
DJMcMayhem
  • 1
    Also, just so you know, there is also a [dedicated vim site](http://vi.stackexchange.com) that you could post your questions on in the future. – DJMcMayhem Nov 01 '16 at 19:52
  • seems like a solid response, I will test it out immediately. What is the '\<cr>' used for? – zeno dhaene Nov 01 '16 at 20:04
  • The `"\<cr>"` is the way you describe a literal carriage return. Entering it that way is important so that the substitute command is actually run. Otherwise, you'd end up with something like `s/\t/;:s/\t/;:s/\t/;:s/\t/;:s/\t/;:s/\t/;:s/\t/;` and vim waiting for you to hit enter. – DJMcMayhem Nov 01 '16 at 20:08
  • Okay, I ran the commands and everything seems correct except for one thing: beginning on line 2 (where the tabs should be replaced), only the first tab is replaced by a semicolon. On line 3, there are 2 semicolons and so on. When the 8th line gets reached, there is just the right amount of semicolons (7), and further on, the amount of semicolons stays at 7. What's going on there? – zeno dhaene Nov 01 '16 at 20:12
  • @zenodhaene 0_0 I really have no idea. It works perfectly fine for me. Maybe you're entering it wrong? The only other thing I could think is to upload the file so I could try it and see. – DJMcMayhem Nov 01 '16 at 20:15
  • This is the original document: http://pastebin.com/nv6b77Uh, and this is the modified one: http://pastebin.com/hk6p6iLY. I will re-run the commands to make sure I didn't make a mistake. – zeno dhaene Nov 01 '16 at 20:18
  • Nope, seems like I didn't make a mistake (btw the number should be 8, not 7, but this is trivial), seems like the error occurs on the second command. Rank 1 (line 2) doesn't get changed, which is odd since the tab after the rank has been replaced with a semicolon. Does the 'let' command change anything? – zeno dhaene Nov 01 '16 at 20:23
:2,$s/\t\(.*\t\)\@=/;/g
:2,$s/\t/,
  • Change any tabs where there is a tab later to ;
  • Change any remaining tabs to ,

EDIT: Misunderstood. Here is a fixed version:

:2,$s/\(\(\t.*\)\{7}\)\@<=\t/,/g
:2,$s/\t/;/g
  • Change any tabs that have seven tabs before them to ,
  • Change any remaining tabs to ;
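
As a rough cross-check of the "tab with seven tabs before it" rule: Python's `re` only supports fixed-width lookbehind, so the vim pattern does not translate directly. The sketch below swaps in a different technique, counting tabs in a replacement callback; `convert_by_position` and `fixed_tabs` are hypothetical names and the sample row is an assumption.

```python
import re
from itertools import count


def convert_by_position(line: str, fixed_tabs: int = 7) -> str:
    seen = count(1)  # 1-based index of the tab currently being replaced
    # Tabs up to `fixed_tabs` become semicolons; every later tab
    # (one that has at least `fixed_tabs` tabs before it) becomes a comma.
    return re.sub(
        r"\t",
        lambda m: ";" if next(seen) <= fixed_tabs else ",",
        line,
    )


# Hypothetical row: seven fixed fields followed by two clubs.
row = "1\tCristiano Ronaldo\tPortugal\t88\t121\t0.73\t03\tManchester United\tReal Madrid"
print(convert_by_position(row))
```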
Amadan
  • As I see your answer, the last tab gets replaced by a comma. However, not every line is the same, so a player can have one club: then it needs no comma, or a player can have 3 clubs, and then it needs two commas – zeno dhaene Nov 02 '16 at 09:10

My PatternsOnText plugin has (among others) a :SubstituteSelected command that lets you specify the match positions. With this, you can easily replace the first 8 tabs with semicolons, and then use a regular substitute to change the remaining tabs into commas:

:2,$SubstituteSelected/\t/;/g 1-8
:2,$s/\t/,/g
Ingo Karkat
  • yeah, I've seen your (or others) plugin, but I would rather accomplish it without plugins since the solution should be reproducible by the teacher. I found the answer though, will be answering myself. Thanks for the answer! – zeno dhaene Nov 02 '16 at 10:31

We solved the issue by capturing the first 8 groups manually, with the tab kept outside each group so it is dropped (([^\t]*)\t(...)(...)), separating them with semicolons in the replacement (\1;\2;...;), and then replacing the remaining tabs with commas: 2,$s/\t/,/g
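
The same field-by-field idea can be sketched in Python by splitting on tabs instead of writing out every capture group. This is only an illustration of the logic, not the command we ran in vim; `convert_by_split` and `fixed_fields` are hypothetical names, and the number of fixed fields depends on the actual file (the thread mentions both 7 and 8), so it is passed in explicitly.

```python
def convert_by_split(line: str, fixed_fields: int) -> str:
    fields = line.split("\t")
    # The leading fields are joined with semicolons...
    head = ";".join(fields[:fixed_fields])
    # ...and whatever is left is the club list, joined with commas.
    clubs = ",".join(fields[fixed_fields:])
    return head + ";" + clubs if clubs else head


# Hypothetical row with seven fixed fields followed by two clubs.
row = "1\tCristiano Ronaldo\tPortugal\t88\t121\t0.73\t03\tManchester United\tReal Madrid"
print(convert_by_split(row, 7))
```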

Thanks to everyone trying to help!

zeno dhaene