
I have the following vector Vec: ACGTTGCA and would like to divide it into a nested vector in which the i-th item is the length-4 subsegment of Vec that starts at the i-th position of Vec.

For example, Vec[(⍳¯3+⍴Vec)∘.+¯1+⍳4] returns:

ACGT
CGTT
GTTG
TTGC
TGCA

But the problem with the above output is that it is a character matrix, whereas I would like to get the following output:

┌──────────────────────────┐
│┌────┬────┬────┬────┬────┐│
││ACGT│CGTT│GTTG│TTGC│TGCA││
│└────┴────┴────┴────┴────┘│
└──────────────────────────┘
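For readers less familiar with APL, here is a rough Python analogue of the two shapes being discussed. It is an illustration only, not APL semantics. The index expression (⍳¯3+⍴Vec)∘.+¯1+⍳4 is an outer sum of start positions and in-window offsets. Indexing the vector with it gives a character matrix, and turning each row into its own item gives the nested vector that is wanted. The function names are hypothetical, and indices are shifted from APL's ⎕IO=1 to Python's 0-based indexing.

```python
def index_matrix(n: int, width: int) -> list[list[int]]:
    """Outer sum of start positions and in-window offsets (0-based).

    Mirrors (⍳¯3+⍴Vec)∘.+¯1+⍳4 for width 4, shifted to 0-based indexing.
    """
    starts = range(n - width + 1)
    offsets = range(width)
    return [[s + o for o in offsets] for s in starts]


def char_matrix(vec: str, width: int) -> list[list[str]]:
    """Index the vector with the index matrix: a 2-D grid of characters."""
    return [[vec[i] for i in row] for row in index_matrix(len(vec), width)]


def split_rows(matrix: list[list[str]]) -> list[str]:
    """Rough analogue of APL split (↓): each row becomes its own item."""
    return ["".join(row) for row in matrix]


if __name__ == "__main__":
    m = char_matrix("ACGTTGCA", 4)
    print(m[0])           # ['A', 'C', 'G', 'T']
    print(split_rows(m))  # ['ACGT', 'CGTT', 'GTTG', 'TTGC', 'TGCA']
```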

For the string vec←'Hy, only testing segmenting vec into pieces of 4', the result I'm looking for would be:

┌→────────────────────────────────────────┐
│ ┌→───┐ ┌→───┐ ┌→───┐ ┌→───┐             │
│ │Hy, │ │y, o│ │, on│ │ onl│ (and so on) │
│ └────┘ └────┘ └────┘ └────┘             │
└∊────────────────────────────────────────┘

Also, is there a way to convert such a vector into a single vector in which each subsequent line contains 4 characters?

Example: for the character vector 'foobartesting', the result would be:

foob
ooba
obar
bart
arte
rtes
test
esti
stin
ting
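One reading of this request, following the later comment that asks for foob\nooba\nobar\n(...)\nting, is a single character vector with a newline between consecutive windows. A minimal Python sketch of that reading follows. The helper names and the choice of "\n" as separator are assumptions made for illustration.

```python
def windows(vec: str, width: int) -> list[str]:
    """All length-`width` substrings, one per starting position."""
    return [vec[i:i + width] for i in range(len(vec) - width + 1)]


def as_single_vector(vec: str, width: int, sep: str = "\n") -> str:
    """Join the windows into one string, one window per line (hypothetical helper)."""
    return sep.join(windows(vec, width))


if __name__ == "__main__":
    print(as_single_vector("foobartesting", 4))
```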
syntagma

3 Answers


To return to your original question: you only need to add a leading "split" (↓) to turn your matrix result into the vector of vectors that you are (were) looking for. Note that although it may not be as elegant, the "classical" solution based on generating a matrix of indices may be much more efficient, because the windowed (N-wise) reduction used in 4 ,/ x isn't on the list of cases that most APL interpreters optimise.

In Dyalog APL v14.0/64 running on an Intel Core i5 @ 1.60 GHz:

x←'foobartesting'

(4 ,/ x) executes in about 9.3 microseconds

(↓4 {⍵[(0,⍳-⍺-⍴⍵)∘.+⍳⍺]} x) clocks in at around 2.3 microseconds

As the vector length increases, the efficiency gap grows; by the time you reach an argument of length 10,000 the windowed reduction is almost 10x slower (7 vs 0.7 milliseconds).

In Dyalog APL, the efficiency of the "classical" approach is enhanced by the availability of 1-byte and 2-byte integer types; your mileage may vary if you are using other APL interpreters.

Morten Kromberg

I tested this in GNU APL, but I don't expect it to behave any differently in Dyalog. My solution is as simple as this:

      4 ,/ 'foobartesting'
 foob ooba obar bart arte rtes test esti stin ting
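Here 4 ,/ is an N-wise (windowed) reduction. Catenate (,) is applied across every run of 4 consecutive items, so each window of characters is joined back into a 4-character item. Below is a minimal Python sketch of the general idea, with Python's string + standing in for catenate. The names apl_reduce and nwise_reduce are hypothetical. The fold runs right to left, as APL's f/ does. This only matters for non-associative functions such as subtraction.

```python
from functools import reduce
from operator import add, sub


def apl_reduce(f, items):
    """Right-to-left fold, matching APL's f/: a f (b f (c ...))."""
    return reduce(lambda acc, x: f(x, acc), reversed(items))


def nwise_reduce(f, n, items):
    """Hypothetical analogue of APL's n f/ items for positive n:
    fold f over every run of n consecutive items."""
    return [apl_reduce(f, items[i:i + n]) for i in range(len(items) - n + 1)]


if __name__ == "__main__":
    # Catenation over a string: each 4-character window is rebuilt.
    print(nwise_reduce(add, 4, "foobartesting"))
    # Numeric case, like 2 +/ 1 2 3 4 5 in APL.
    print(nwise_reduce(add, 2, [1, 2, 3, 4, 5]))  # [3, 5, 7, 9]
```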
Elias Mårtenson
  • Thanks, I knew there must be an easier (and faster) solution. – syntagma Jul 28 '14 at 17:27
  • BTW, is there a way to do the similar thing, i.e. create a single vector looking like that: `foob\nooba\nobar\n(...)\nting` (edited my question to show what I am asking for)? – syntagma Jul 28 '14 at 17:29
  • Just use the monadic ⊃. This will take a list of arrays and create a two-dimensional array from it, i.e., simply doing `⊃ 4 ,/ 'foobartesting'` should do this. Should I update the answer to cover this? – Elias Mårtenson Jul 29 '14 at 06:05
  • It would be great if you'd update your answer because I can't make it work. – syntagma Jul 29 '14 at 07:38
  • I remember seeing that Dyalog mixed up the ⊃ and ↑ symbols (or something like that). It was also affected by the ⎕ML setting. I don't use Dyalog so I can't test myself, but please try replacing ⊃ with ↑ or ↓. – Elias Mårtenson Jul 29 '14 at 07:49
  • Great, it works with ↑, i.e.: `↑4 ,/ 'foobartesting'`. Thanks again. – syntagma Jul 29 '14 at 09:46
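The ↑ that works here is Dyalog's mix under its default migration level (with ⎕ML set to 2 or higher, mix is spelled ⊃ instead, which matches the ⎕ML remark above). Mix stacks a vector of vectors into a matrix and pads shorter items with the fill element, which is a blank for characters. A rough Python analogue follows. The function name mix and the list-of-lists representation of a matrix are illustrative assumptions.

```python
def mix(items: list[str], fill: str = " ") -> list[list[str]]:
    """Rough analogue of APL mix for a vector of character vectors:
    each item becomes a row, right-padded with `fill` to the longest length."""
    width = max((len(s) for s in items), default=0)
    return [list(s) + [fill] * (width - len(s)) for s in items]


if __name__ == "__main__":
    rows = mix(["foob", "ooba", "obar"])
    print(len(rows), len(rows[0]))  # 3 4
    print(mix(["ab", "cde"]))       # [['a', 'b', ' '], ['c', 'd', 'e']]
```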

I'm not sure I understand your description correctly, but as I understand it, you have a vector:

vec←'Hy, only testing segmenting vec into pieces of 4'

Oh, and we need to set the migration level for this exercise ;-)

⎕ml←3

Modified answer, now that I understand the question ;-) :

      display 4{⍺↑¨(0,⍳(⍴⍵)-⍺)↓¨⊂⍵}'ACGTTGCA'
┌→───────────────────────────────────┐
│ ┌→───┐ ┌→───┐ ┌→───┐ ┌→───┐ ┌→───┐ │
│ │ACGT│ │CGTT│ │GTTG│ │TTGC│ │TGCA│ │
│ └────┘ └────┘ └────┘ └────┘ └────┘ │
└∊───────────────────────────────────┘
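This function takes a different route from the index matrix. 0,⍳(⍴⍵)-⍺ produces the drop counts 0 through n-⍺. Each count is dropped from the enclosed argument, and then ⍺ items are taken from each result. A Python sketch of the same three steps follows. The function name is hypothetical, and the comments map each line to the APL fragment it imitates.

```python
def windows_by_drop_take(vec: str, width: int) -> list[str]:
    """Mirror of ⍺↑¨(0,⍳(⍴⍵)-⍺)↓¨⊂⍵ (hypothetical helper)."""
    drop_counts = range(len(vec) - width + 1)  # 0,⍳(⍴⍵)-⍺  -> 0 .. n-width
    dropped = [vec[d:] for d in drop_counts]   # (...)↓¨⊂⍵  -> drop d leading items
    return [s[:width] for s in dropped]        # ⍺↑¨        -> take first `width`


if __name__ == "__main__":
    print(windows_by_drop_take("ACGTTGCA", 4))
```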
MBaas