2

What is the easiest way to find the start position of the nth word in a string that may have multiple spaces between words?

I can do this easily with character-by-character parsing, but I think there may be a faster and easier way using some of the commands available in bash.

Words may be repeated, and a word may also appear as a substring of another word.

The start of the 5th word in this:

' the cat ate  the  bird'

should result in 20 (1-based).

KiloOne
  • http://xyproblem.info/ – Karoly Horvath Dec 08 '15 at 21:00
  • Can you explain why this was an xyproblem? My subject title actually says it all, and you have to assume that since the format of the words was not specified, they could be anything, including repeats and substrings. – KiloOne Dec 08 '15 at 21:27
  • @KiloOne It's X/Y because you're asking for an algorithm without the context of the problem the algorithm solves. – bishop Dec 08 '15 at 21:30
  • :) now if I only knew what that meant :) – KiloOne Dec 08 '15 at 21:33
  • It means "please explain to us which problem this will solve for you, because solving this problem is a waste of time if there are sane approaches to your other problem." – tripleee Dec 10 '15 at 12:38
  • A lot of what our modern society has become is a result of smart people asking dumb questions. Be careful in labeling things with the connotation that time is being wasted. – KiloOne Dec 10 '15 at 13:54

2 Answers

4

Using awk is pretty quick:

$ awk '{ print index($0, $2); }' <<<'foo bar baz'
5

This gives the 1-based character index of the second word. Replace $2 with $1 for the first word, $3 for the third, and so on, or use $NF for the last word. Be careful when the nth word is a substring of one of the preceding words: index() returns the first match, so the input 'foobar bar baz' reports 4 rather than 8.

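Update based on karakfa's clever approach: if your nth word is a substring of a preceding word, you need to be more diligent. Searching for the word with a leading space skips matches in the middle of earlier words, and a match at position 1 is handled separately. This still finds the first matching occurrence, so a word that repeats an earlier word (the third line below) is reported at the earlier position:

$ cat t
foo bar baz
fobaro bar baz
bar bar baz

$ awk '{ print 1 == index($0, $2) ? 1 : index($0, " "$2)+1; }' < t
5
8
1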

$ awk '{ print 1 == index($0, $5) ? 1 : index($0, " "$5)+1; }' <<<' the cat ate  the  bird'
20
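Since the index() approach always lands on the first matching occurrence, a repeated word (such as the 4th word of the example string) needs a different strategy. The following is a minimal sketch, not part of the original answer, that walks the line one word at a time with awk's match() and the RSTART/RLENGTH variables. The function name nth_word_start is hypothetical, and printing 0 when the line has fewer than N words is an assumption of this sketch.

```shell
# Hypothetical helper: print the 1-based start of word N for each input line,
# or 0 if the line has fewer than N words.
nth_word_start() {
    local n=${1:?Which word do you want the position of?}

    awk -v n="$n" '{
        rest = $0      # part of the line not yet scanned
        offset = 0     # number of characters already consumed
        pos = 0
        for (i = 1; i <= n; i++) {
            # Find the next run of non-blank characters in the remainder.
            if (!match(rest, /[^ \t]+/)) { pos = 0; break }
            pos = offset + RSTART
            offset += RSTART + RLENGTH - 1
            rest = substr(rest, RSTART + RLENGTH)
        }
        print pos
    }'
}

printf '%s\n' ' the cat ate  the  bird' | nth_word_start 5   # 20
printf '%s\n' ' the cat ate  the  bird' | nth_word_start 4   # 15
printf '%s\n' 'bar bar baz' | nth_word_start 2               # 5
```

Because the position is computed from how many words have been consumed rather than from the word's text, repeats and substrings do not affect the result.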

Updated based on KiloOne's need for a function:

function position() {
    local n=${1:?For what column do you want position?}

    # Pass the column number with -v rather than splicing it into the awk program.
    awk -v n="$n" '{ print ((index($0, $n) == 1) ? 1 : index($0, " " $n) + 1) }'
}

$ echo 'my cat ate your bird' | position 3
8
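To use the result in a script, with the input string held in one variable and the position stored in another, command substitution is one option. This is a sketch under assumptions: the variable names str, pos and rest are illustrative, and the position function is repeated here (in a variant that passes the column with -v) only so the snippet runs on its own.

```shell
# Variant of the position function, repeated so this snippet is self-contained.
position() {
    local n=${1:?For what column do you want position?}
    awk -v n="$n" '{ print ((index($0, $n) == 1) ? 1 : index($0, " " $n) + 1) }'
}

str='my cat ate your bird'

# $(...) captures whatever the pipeline prints on stdout.
pos=$(printf '%s\n' "$str" | position 3)
echo "word 3 starts at column $pos"

# cut -c counts from 1, so the position can be used directly
# to take everything from that word to the end of the string.
rest=$(printf '%s\n' "$str" | cut -c"$pos"-)
echo "$rest"
```

With the sample string, pos is 8 and rest is "ate your bird".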

Now available on GitHub as a bashworks module.

bishop
  • try "foobar bar baz" as the input. – karakfa Dec 08 '15 at 21:00
  • And what happens *when the nth-word is a substring of one of the preceding words*? – Alvaro Flaño Larrondo Dec 08 '15 at 21:00
  • @karakfa Yes, I am aware. That's why I wrote "Be careful when the nth-word is a substring of one of the preceding words." – bishop Dec 08 '15 at 21:01
  • I think this will fix it `awk '{print index($0, " "$2)}'` since space is not part of any word. – karakfa Dec 08 '15 at 21:02
  • @AlvaroFlañoLarrondo Then the answer is unexpected. The OP did not provide enough detail for me to consider clarifying a better solution. – bishop Dec 08 '15 at 21:02
  • @karakfa That is extremely clever! I think you're right. – bishop Dec 08 '15 at 21:02
  • Except for the first word! But it's trivial anyway. – karakfa Dec 08 '15 at 21:03
  • Remove `-1` since it's pointing to space for 0 based indexing. – karakfa Dec 08 '15 at 21:04
  • Yes, words can be repeated, so even this does not work: `awk '{ print index($0, " "$4) }' <<<' the cat ate the bird'` – KiloOne Dec 08 '15 at 21:07
  • @KiloOne If words can be repeated in *columns*, how is a computer to know which one you meant *other than by the start position*? – bishop Dec 08 '15 at 21:11
  • I want the start of the nth group of characters – KiloOne Dec 08 '15 at 21:13
  • @KiloOne Well, give my latest edit a whirl using your latest example and if it doesn't fit the bill update your post with further requirements. – bishop Dec 08 '15 at 21:16
  • That looks like it works, now can you put that result into a variable with the input string as another variable for use in a script? I am new to all this redirecting. – KiloOne Dec 08 '15 at 21:20
  • @KiloOne Added a function, to simplify how the method is called. Enjoy! – bishop Dec 08 '15 at 21:25
  • bishop, dare I ask for more, or should I fear sanctions for not asking for it all in the beginning? :) – KiloOne Dec 08 '15 at 21:47
  • @KiloOne You can always ask more. Questions lead to questions. On that note, consider posting a new question instead of changing this one! – bishop Dec 08 '15 at 21:49
  • Done: http://stackoverflow.com/questions/34167473/bash-placing-a-string-into-a-variable-in-a-script-that-is-the-nth-word-to-the-e – KiloOne Dec 08 '15 at 22:48
1

awk to the rescue!

If this is an XY problem and you actually want to extract the nth field after finding its position, you can try the following, for example with n=4:

$ echo "this is a   long    string  with     non-uniform    spacing"  | awk '{print $4}'

long

or

$ echo ... | tr -s ' ' '\t' | cut -f4

long
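One difference between the two pipelines shows up when the line starts with a space, as the string in the question does: awk ignores leading blanks when splitting fields, while tr turns them into a leading tab, so cut counts an empty first field. A minimal sketch of the effect and one possible workaround follows. The helper names nth_field_awk and nth_field_cut are hypothetical, and stripping leading spaces with sed is an assumption of this sketch rather than part of the answer.

```shell
# Hypothetical helpers: both print field N of each input line.
nth_field_awk() {
    awk -v n="$1" '{ print $n }'
}

nth_field_cut() {
    # Strip leading spaces first; otherwise tr turns them into a leading tab
    # and cut treats the empty string before it as field 1.
    sed 's/^ *//' | tr -s ' ' '\t' | cut -f"$1"
}

s=' the cat ate  the  bird'
printf '%s\n' "$s" | nth_field_awk 4            # the
printf '%s\n' "$s" | nth_field_cut 4            # the
printf '%s\n' "$s" | tr -s ' ' '\t' | cut -f4   # ate (off by one without the sed step)
```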
karakfa