Ruby: getting character[n] from each element of an array of strings, strictly with array notation

Question

Suppose I have this:

x = %w(greater yellow bandicooot)

And I want to get a specific letter of each string as a string. Of course, I can do something like this (to get the first letter):

x.map { |w| w[0] }.join  # => 'gyb'

But I'd like to know whether or not there's a way to do it using just array notation. I've tried this:

x[0][0]..x[-1][0]

Which returns, in this case, the not-so-helpful "g".."b". I could also use array notation like this in this case:

x[0][0] + x[1][0] + x[2][0]

But I'm looking for a non-case-specific solution that doesn't require iteration.

Is there a way to do this strictly with array notation, or is it necessary to do some sort of iteration? And if you can't do it with array notation, is there a better way to do it than using map and join?

I guess you could use a regex, but I don't know if that's really a better way. — max pleaner, Mar 13 '19 at 20:49
@maxpleaner Yeah, I thought about that too, but I don't see a way to use a regex that doesn't involve iterating the array. Do you? — BobRodes, Mar 13 '19 at 20:53

score 3 · Answer 1 · answered Mar 13 '19 at 21:10

Here's a fancy regex way to do it, if you have the words combined in a single space-delimited string:

string = "greater yellow bandicooot"
string.gsub(/([^ ])[^ ]* */, '\1')
# => "gyb"

Explanation of the regex:

([^ ]): match group - single nonspace char
[^ ]* *: optional sequence of nonspace chars, followed by any optional sequence of space chars.

As you can read about here: Ruby regex - gsub only captured group, gsub replaces everything the regex matches, regardless of whether it's inside a match group. So you need to use the backreference \1 in the replacement string to refer to the first match group, which is the part you want to keep as the output. (Write it in a single-quoted string, or as "\\1" in a double-quoted one.)
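As a minimal sketch of that last point, the following compares three ways of referring to the first match group. The variable names are only illustrative and are not part of the original answer.

```ruby
string = "greater yellow bandicooot"
pattern = /([^ ])[^ ]* */   # group 1 is the first letter of each word

# Single-quoted replacement: \1 reaches gsub unchanged.
single_quoted = string.gsub(pattern, '\1')

# Double-quoted replacement: the backslash itself has to be escaped.
double_quoted = string.gsub(pattern, "\\1")

# Block form: $1 holds group 1 for the current match.
block_form = string.gsub(pattern) { $1 }

puts single_quoted, double_quoted, block_form   # prints "gyb" three times
```

All three forms still scan the whole string, so none of them avoids iteration; they only hide it inside gsub.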

max pleaner

Mighty interesting. I'm getting a little better at regex all the time, and I knew you could use the `\1` in a block with `gsub`, but I didn't know you could use it this way. Very cool. I'll study it a bit and see if I can adapt it to get any specific set of contiguous character locations in each word as well, which is in the end what I'm looking for. – BobRodes Mar 13 '19 at 21:19
After your idea of using a regex on the string, I came up with this one: `string.scan(/^\w|(?<=\W)\w/).join`. I used `\W` because some of the words were hyphenated instead of spaced. – BobRodes Mar 14 '19 at 00:25
Cool. I'm not really that good at regex; never got around to learning the lookaheads. – max pleaner Mar 14 '19 at 01:19
I'm just getting started with it. That one is actually a lookbehind, from what I'm told. So, as I understand it, the `\w` following the captured string `(?<=\W)` is compared to the captured string. The captured string is asking whether the immediately preceding character matches `\W`, or a non-alphanumeric character. (I have since realized that this also matches apostrophes, which doesn't suit my needs, so I have changed it to `(?<=[ -])` since I only want to match spaces or hyphens.) Are you familiar with https://rubular.com, by the way? I've been learning a lot from tinkering there. – BobRodes Mar 14 '19 at 17:11
Yeah, rubular is nice. By the way you can feel free to accept your own answer on this one, so it doesn't appear unsolved. – max pleaner Mar 14 '19 at 18:01

score 2 · Answer 2 · answered Mar 13 '19 at 21:55

And if you can't do it with array notation, is there a better way to do it than using map and join

Well, the short (and wrong :)) answer is "this is impossible" - to get each nth character of each string in an array you obviously have to iterate (and yes, regexp is iteration too - most probably less performant than array iteration).

But let's imagine you have a real app where you should perform this operation very very often and the list of strings is huge (so iterations are painful). On the other hand, the list of strings is rarely changed and you almost never need the original strings back.

In this case, you could borrow the idea behind column storages and transform the original array into something like

transposed_x = ["gyb", "rea", "eln", "ald", "toi", "ewc", "r_o", "__o", "__o", "__t"]

where each nth element is just a concatenation of the nth char of each original string (here I replaced each "missing" character with _ for clarity). With this data model you can perform the original task 1) with just array notation and 2) in O(1). As an obvious tradeoff, you will have to iterate for every other operation (fetching the original string back, adding/removing/updating etc)...
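One possible way to build such a transposed array is sketched below, assuming `_` is an acceptable filler for missing characters (as in the example above). The names `width` and `rows` are hypothetical helpers introduced only for this illustration.

```ruby
x = %w(greater yellow bandicooot)

# Pad every word to the length of the longest one so the rows are rectangular.
width = x.map(&:length).max
rows = x.map { |word| word.ljust(width, "_").chars }

# Swap rows and columns, then glue each column back into a string.
transposed_x = rows.transpose.map(&:join)

first_letters = transposed_x[0]   # "gyb"
third_letters = transposed_x[2]   # "eln"
```

Building `transposed_x` still iterates over every character once; only the later lookups are plain array indexing.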

Very interesting. I'll keep it in mind for large datasets. Thanks for sharing. — BobRodes, Mar 14 '19 at 00:21

Cary Swoveland · Accepted Answer · 2019-03-27T18:37:17.557

2

I don't believe there is a way to do what you want, but you could do the following.

str = %w(greater yellow bandicooot).join(' ')
  #=> "greater yellow bandicooot"

str.gsub(/(?<=\S)./, '')
  #=> "gyb"

The regular expression matches any character that is preceded by a non-whitespace character; that is, it matches all characters other than the first character of the string and characters preceded by a whitespace character.

If one is given the string and there could be multiple spaces between words, one could write:

str.squeeze(' ').gsub(/(?<=\S)./, '')
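To illustrate why `squeeze` is needed, here is a small sketch using a made-up input with runs of spaces (the string `spaced` is an assumption for demonstration only). Without squeezing, any space that follows another space is not preceded by a non-whitespace character, so it survives the substitution.

```ruby
spaced = "greater  yellow   bandicooot"   # two and three spaces between words

# Only the first space of each run is preceded by a non-whitespace character,
# so the remaining spaces are left in place.
without_squeeze = spaced.gsub(/(?<=\S)./, '')            # "g y  b"

# Collapsing each run to a single space first removes them all.
with_squeeze = spaced.squeeze(' ').gsub(/(?<=\S)./, '')  # "gyb"
```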

edited Mar 27 '19 at 18:37

answered Mar 24 '19 at 17:45

Cary Swoveland

Interesting. Now, if you use lowercase `\s`, it matches the two characters preceded by spaces but not the first character. So why do neither `\s` nor `\S` match the first character? Either it is or isn't preceded by a whitespace character, one would think. Is there some special characteristic of the first character in a string from a regex perspective? – BobRodes Mar 25 '19 at 21:21
Suppose I wrote the positive lookbehind as `(?<=\p{Alpha})`. Then `"g"` in `"greater"` is not preceded by a letter, so it's not a match and therefore is not converted to an empty string. The remaining characters are all preceded by a space or a letter. The characters that match the regex are those preceded by a letter. That's all spaces (since there is only one between words) and all letters other than the first of each word. I wrote `(?<=\S)` rather than `(?<=\p{Alpha})` so that all characters preceded by anything other than a whitespace character would be converted to an empty string. – Cary Swoveland Mar 25 '19 at 22:13
Oh, I see! You're matching every character of any type that is preceded by an alpha character. Therefore, characters preceded by a space or by nothing at all will not match. Perfect. Thanks for explaining. – BobRodes Mar 27 '19 at 17:27
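The behaviour discussed in these comments can be checked with `scan`, as in the rough sketch below. The negative lookbehind on the last line is not from the thread; it is added here only to show a pattern that does include the first character.

```ruby
str = "greater yellow bandicooot"

# Characters preceded by whitespace: the "g" has nothing before it, so it is absent.
after_space = str.scan(/(?<=\s)./)               # ["y", "b"]

# Characters preceded by non-whitespace: again the "g" is absent.
after_non_space = str.scan(/(?<=\S)./).join      # "reater ellow andicooot"

# Characters NOT preceded by non-whitespace: this includes the very first one.
not_after_non_space = str.scan(/(?<!\S)./).join  # "gyb"
```

A positive lookbehind needs an actual preceding character to inspect, which is why the first character of the string matches neither `(?<=\s)` nor `(?<=\S)`.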

score 1 · Answer 4 · answered Dec 29 '21 at 11:15

I don't think this is possible (using no iteration). Similar to the suggestion made by @Konstantin Strukov, the closest thing I can think of would be something like this:

array = %w(greater yellow bandicoot)  #=>  ["greater", "yellow", "bandicoot"]
string = array.join(" ")  #=>  "greater yellow bandicoot"
array_2 = string.chars.slice_after(" ").to_a  #=>  [["g", "r", "e", "a", "t", "e", "r", " "], ["y", "e", "l", "l", "o", "w", " "], ["b", "a", "n", "d", "i", "c", "o", "o", "t"]]
array_3 = array_2[0].zip(*array_2[1..-1])  #=>  [["g", "y", "b"], ["r", "e", "a"], ["e", "l", "n"], ["a", "l", "d"], ["t", "o", "i"], ["e", "w", "c"], ["r", " ", "o"], [" ", nil, "o"]]
result = array_3[0].join  #=>  "gyb"

It doesn't use any blocks, but unless I'm mistaken, slice_after and zip both iterate, so I'm still breaking your rules.
