APL - How can I find the longest word in a string vector?

Question

I want to find the longest word in a string vector. In APL, I know that the shape function returns the length of a string, e.g.:

⍴ 'string' ⍝ returns 6

The reduce operator (/) lets me apply dyadic functions along a vector, but since shape is monadic, this will not work. How can I apply the shape function to each word in this case? For example:

If the vector is defined as:

lst ← 'this is a string'

I want to get the result of applying ⍴ to each word separately:

⍴'this'
⍴'is'
⍴'a'
⍴'string'

The "longest word" can be 2 or 3 ... – RosLuP Jun 28 '19 at 06:59

score 3 · Accepted Answer · answered Apr 04 '19 at 11:41

The "typical" approach would be to treat it as a segmented (or: separated) string and prefix it with the separator (a blank) and pass it to a dfn for further analysis:

{}' ',lst

The fn then looks for the separator and uses it to build the vectors of words:

      {(⍵=' ')⊂⍵}' ',lst
┌─────┬───┬──┬───────┐
│ this│ is│ a│ string│
└─────┴───┴──┴───────┘
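As an aside, this is why the blank is prepended. A sketch, assuming Dyalog APL with the default ⎕ML (where dyadic ⊂ is partitioned enclose, which discards any items that come before the first 1 in the left argument) and ]box on for the boxed display: without the prefix, the first word is lost.

```apl
      lst ← 'this is a string'
      {(⍵=' ')⊂⍵}lst       ⍝ no separator in front: 'this' precedes the first 1 and is dropped
┌───┬──┬───────┐
│ is│ a│ string│
└───┴──┴───────┘
```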

Let's remove the blanks:

      {1↓¨(⍵=' ')⊂⍵}' ',lst
┌────┬──┬─┬──────┐
│this│is│a│string│
└────┴──┴─┴──────┘

And then you "just" need to compute the length of each vector:

      {≢¨1↓¨(⍵=' ')⊂⍵}' ',lst
4 2 1 6
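Since the question is about the longest word, here is a sketch of the last step once the lengths are known, assuming Dyalog APL with the default ⎕IO←1. The name lens is just an illustrative variable: ⌈/ gives the maximum length and ⍳ finds the position of the first word with that length.

```apl
      lst ← 'this is a string'
      lens ← {≢¨1↓¨(⍵=' ')⊂⍵}' ',lst
      lens
4 2 1 6
      ⌈/lens             ⍝ length of the longest word
6
      lens⍳⌈/lens        ⍝ position of the first word of that length
4
```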

This is a direct implementation of your request. However, if you're not interested in the substrings themselves but only in the lengths of the non-blank segments, a more "APLy" solution might be to work with booleans (usually the most efficient approach):

      lst=' '
0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0

So the ones mark the positions of the separators. Where exactly do they occur?

      ⍸lst=' '
5 8 10

But we need a trailing blank, too; otherwise we miss the end of the last word:

      ⍸' '=lst,' '
5 8 10 17

So these positions, minus the position of the preceding blank (and minus 1 for the blank itself), should give the lengths of the segments:

      {¯1+⍵-0,¯1↓⍵}⍸' '=lst,' '
4 2 1 6

This is still somewhat naive and can be expressed in a more advanced way. I leave that as an "exercise for the reader" ;-)
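One possible take on that exercise, following the pairwise-difference idea also suggested in the comments: put a virtual blank at position 0, then let 2-/ (N-wise reduction) subtract each blank position from the next. A sketch, assuming Dyalog APL 16.0 or later (for ⍸):

```apl
      lst ← 'this is a string'
      0,⍸' '=lst,' '           ⍝ blank positions, with a virtual blank at position 0
0 5 8 10 17
      2-/0,⍸' '=lst,' '        ⍝ each position minus the next one
¯5 ¯3 ¯2 ¯7
      ¯1-2-/0,⍸' '=lst,' '     ⍝ ¯1-d is (next - previous) - 1, i.e. the segment length
4 2 1 6
```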

MBaas

Thank you - hadn't thought of using the boolean method you described. Elegant! – awyr_agored Apr 04 '19 at 11:54
Yes, booleans are so useful in APL - they can't be overestimated! – MBaas Apr 04 '19 at 11:57
Or just ≢¨(' '≠lst)⊆lst – Paul Mansour Apr 04 '19 at 12:58
Yes! But that's "advanced" ;-) – MBaas Apr 04 '19 at 13:21
Nice! So I just checked how @Paul Mansour's example works: (i) the boolean output of (' '≠lst) marks the non-blank characters; (ii) ⊆ uses it to split the sentence into a nested list of words; (iii) the tally function is then applied to each word. Thanks! This works too: ⍴¨(' '≠lst)⊆lst – awyr_agored Apr 04 '19 at 14:15
Indeed. Important to note that tally always returns a simple scalar, whereas shape (rho) returns a vector. So tally applied with each returns a simple vector, while shape applied with each returns a nested vector. – Paul Mansour Apr 04 '19 at 14:18
@Paul Mansour Ah ok! I would get a domain error if I tried to use the vector output as a scalar – awyr_agored Apr 04 '19 at 14:19
`{¯1+⍵-0,¯1↓⍵}⍸` can be just `¯1-2-/0,⍸` – Adám Apr 04 '19 at 16:22
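The tally/shape difference described in the comments can be checked with ≡ (depth). A sketch, assuming Dyalog APL 16.0 or later with the default ⎕ML: ≢¨ gives a simple vector (depth 1), while ⍴¨ gives a vector of one-element vectors (depth 2), which ∊ (enlist) flattens back into a simple vector.

```apl
      lst ← 'this is a string'
      ≡≢¨(' '≠lst)⊆lst     ⍝ tally of each: simple numeric vector
1
      ≡⍴¨(' '≠lst)⊆lst     ⍝ shape of each: nested vector of 1-element vectors
2
      ∊⍴¨(' '≠lst)⊆lst     ⍝ enlist flattens it to a simple vector again
4 2 1 6
```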

Answer 2 · answered Apr 05 '19 at 04:43 · Adám

While MBaas has already answered thoroughly, I thought it might be interesting to learn about the idiomatic Dyalog "train" ≠⊆⊢, derived from Paul Mansour's comment. It forms a dyadic function which splits its right argument on occurrences of the left argument:

      Split ← ≠⊆⊢
      ' ' Split 'this is a string'
┌────┬──┬─┬──────┐
│this│is│a│string│
└────┴──┴─┴──────┘
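To see how the fork evaluates, it can be spelled out by hand: ⍺ (f g h) ⍵ computes (⍺ f ⍵) g (⍺ h ⍵), so ' ' (≠⊆⊢) X is (' '≠X)⊆(' '⊢X), and ' '⊢X is simply X. X is just an example variable; this assumes Dyalog APL 16.0 or later with ]box on.

```apl
      X ← 'this is a string'
      ' '≠X                ⍝ left tine: 1 for every non-blank character
1 1 1 1 0 1 1 0 1 0 1 1 1 1 1 1
      ' '⊢X                ⍝ right tine: ⊢ returns its right argument unchanged
this is a string
      (' '≠X)⊆X            ⍝ middle tine: one segment per run of 1s
┌────┬──┬─┬──────┐
│this│is│a│string│
└────┴──┴─┴──────┘
```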

You can extend this function train to do the whole job:

      SegmentLengths ← ≢¨Split
      ' ' SegmentLengths 'this is a string'
4 2 1 6

Or even combine the definitions in one go:

      SegmentLengths ← ≢¨≠⊆⊢
      ' ' SegmentLengths 'this is a string'
4 2 1 6

If you are used to the idiomatic expression ≠⊆⊢ then it may actually read more clearly than any well-fitting name you can give for the function, so you might as well just use the expression in-line:

      ' ' (≢¨≠⊆⊢) 'this is a string'
4 2 1 6
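One property of ⊆ worth noting: items where the left argument is 0 are dropped entirely, so leading, trailing and repeated separators produce no empty segments (whereas the position-difference approach in the accepted answer would report a zero-length segment for each extra blank). A quick check, assuming Dyalog APL 16.0 or later; the input strings are made up for illustration:

```apl
      ' ' (≢¨≠⊆⊢) '  this  is a string '   ⍝ extra blanks are ignored
4 2 1 6
      ',' (≢¨≠⊆⊢) 'ab,c,,def'             ⍝ any separator character works
2 1 3
```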

Thank you for taking the time to explain. APL is cool! Many thanks! – awyr_agored, Apr 05 '19 at 00:23

score 0 · Answer 3 · answered Jun 28 '19 at 07:15

To find the longest word in a string, I would use the following function in NARS APL:

f←{v/⍨k=⌈/k←≢¨v←(⍵≠' ')⊂⍵}

Example of use:

  f  'this is a string thesam'
string thesam

Explanation:

{v/⍨k=⌈/k←≢¨v←(⍵≠' ')⊂⍵}
            v←(⍵≠' ')⊂⍵  split the string at the spaces and assign the result to v
        k←≢¨v             find the length of each element of v; the result is a vector
                          of the same length as v, saved in k
      ⌈/k                 find the maximum of k
    k=                    for each element of k, 1 if it equals the maximum, 0 otherwise
 v/⍨                      select the elements of v whose length is the maximum
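Note that in Dyalog APL with the default ⎕ML, dyadic ⊂ is partitioned enclose, which would start a new segment at every non-blank character, so the function above would not split the string into words there. A sketch of the same function for Dyalog, assuming version 16.0 or later (where ⊆, partition, is available) and ]box on, simply swaps ⊂ for ⊆:

```apl
      f←{v/⍨k=⌈/k←≢¨v←(⍵≠' ')⊆⍵}    ⍝ same as above, but with ⊆ (partition)
      f 'this is a string thesam'
┌──────┬──────┐
│string│thesam│
└──────┴──────┘
```

Like the original, it returns every word of maximal length, not just the first one.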
