I am working on a game that uses the Key operator (⌸) to build simple trees of parent nodes connected to their children, like (1 3 2 7 11 12), with 1 as the parent node and 3 2 7 11 12 as its children. The array holds all the information Key needs to create the nested array, and of course it's extremely fast. But I actually need 2 or 3 more levels of depth. I can build a different tree construction on the same array, shown below. This different encoding (1 2 1 1 2 3 1 3 3 ...) allows arbitrary nesting depth and works perfectly with just a simple array.

There may be enough information in the Key transformation of the array, plus more code to connect the child nodes, to reach the needed depth. Are there existing or similar APL/Co-dfns functions for (1) transforming the array into the tree and (2) transforming it back? I am new to APL and focusing on rectangular arrays; tree wrangling is down the road. I need speed close to that of Key, because the arrays and their nested arrays are very long.

   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 
   1 2 1 1 2 3 1 3 3  3  1  1  2  7  8  9 16  4 

Using Key:
{⊂⍵}⌸1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16  4
(1 3 4 7 11 12) (2 5 13) (6 8 9 10) (14) (15) (16) (17) (18) 

Using maybe Key and something else....

     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1.   1 2 1 1 2 3 1 3 3  3  1  1  2  7  8  9 16  4

2. (1 3 4 (7 14) 11 12) (2 5 13) (6 (8 15) (9 (16 17)) 10) (,18)




1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 2 2 2 2 2 1 7 8  3 10 11 10 10 10 15  9  
(a different array using the same tree encoding)

(1(7(8 (9 17)))) (2 3 4 5 6) (10(11 12) 13 14 (15 16))



({⊂⍵}⌸⍠ 2) 1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4

Perhaps a Variant (⍠) option on Key, as in the hypothetical expression above, could do this down the road?


creatural
  • It would be nice if you could replace your screenshots with copyable text. Simply put ``` on lines before and after the code to have it formatted as code. – Adám Jan 28 '23 at 17:38
  • Source for the nice display functions: {(⍎⍵⎕NS⍬).⎕CY ⍵}'dfns', then dfns.displays (1 3 4 (7 14) 11 12) (2 5 13) (6 (8 15) (9 (16 17)) 10) (,18) – creatural Feb 01 '23 at 17:23

1 Answer

There are some ways to do this, but the best method will depend on what you want to do with the results. If you really do have very large arrays, then producing the "nested children" representation of arrays is going to be expensive no matter how you compute them, because the underlying representation is expensive (though, no more expensive than the same sort of representation in another language).

Section 3.2 of (Hsu 2019) discusses this in detail:

"A data parallel compiler hosted on the GPU". Hsu, Aaron W. https://scholarworks.iu.edu/dspace/handle/2022/24749

Generally speaking, if you intend to work with the data in some way, it is almost always faster and easier to work directly with the parent vector or depth vector representation instead of first converting to a record-type style representation.
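As a minimal sketch of working on the parent vector directly, per-node depths can be computed by following the parent links of all nodes at once. This assumes ⎕IO←0 and a parent vector in which each root is its own parent (the same p that appears later in this answer). The name depth is only illustrative and is not taken from Co-dfns or the thesis.

```apl
      ⎕IO←0
      p←0 1 0 0 1 2 0 2 2 2 0 0 1 6 7 8 15 3
      ⍝ ⍺: parent vector, ⍵: the current ancestor of every node.
      ⍝ Stop once every node has reached its root; otherwise count 1 for each
      ⍝ node still below its root and move all nodes up one level.
      depth←{⍵≡⍺[⍵]:0×⍵ ⋄ (⍵≠⍺[⍵])+⍺∇⍺[⍵]}
      p depth ⍳≢p
0 0 1 1 1 2 1 2 2 2 1 1 1 2 3 3 4 2
```

Note that this yields one depth per node in storage order. It only coincides with the depth vector representation if the nodes are stored in depth-first preorder, which is not the case for this p.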

One technique is to query the data in parent vector form first, to identify the relevant nodes over which you intend to work, and only then to extract the children nodes for that limited set using primitives like membership (∊) or where (⍸).
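A small sketch of that query-then-extract pattern, assuming ⎕IO←0 and the same self-rooted parent vector as the example further down; the node numbers are chosen only for illustration:

```apl
      ⎕IO←0
      p←0 1 0 0 1 2 0 2 2 2 0 0 1 6 7 8 15 3
      ⍸p=2            ⍝ children of node 2
5 7 8 9
      ⍸p∊2 7          ⍝ children of any node in the set 2 7
5 7 8 9 14
      (⍸p=0)~0        ⍝ children of root 0 (a root is its own parent, so drop it)
2 3 6 10 11
```

Each query is a single flat pass over p, so no nested structure has to be built for nodes you never look at.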

If you can describe the sort of operations you intend to perform over these nested representations, there might be a better algorithm that does not require the conversion.

If you do wish to simply create the full record-type representation, there is some conversion code in (Hsu 2019). You can also look at the P2D and D2P functions in the Co-dfns compiler:

https://github.com/Co-dfns/Co-dfns/blob/master/src/codfns/P2D.aplf

https://github.com/Co-dfns/Co-dfns/blob/master/src/codfns/D2P.aplf

These may give you some additional help in converting between the formats.

If you need to convert directly between the parent and record-type representation, you can use something akin to this:

kids←{0=≢k←⍸p=⍵:⍵ ⋄ ⍵,∇¨k~⍵}¨

And apply it to the root nodes of your tree like this:

kids ⍸p=⍳≢p

where p is your parent vector, with each root node recorded as its own parent (which is what p=⍳≢p tests for).

There is also the child vector representation:

      p
0 1 0 0 1 2 0 2 2 2 0 0 1 6 7 8 15 3
      ⊢k←(p=⍳≢p)↓¨k⊣k[p],←⍳≢p⊣k←(≢p)⍴⊂⍬
 2 3 6 10 11  4 12  5 7 8 9  17      13  14  15              16     
      0⊃k
2 3 6 10 11
      k[0⊃k]
 5 7 8 9  17  13     

I hope this helps!

arcfide
  • Thanks for the information. The main use is incrementally storing the parent vector (1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4 ...) in native or component files and then periodically restoring it rapidly to the tree. I could just search the parent vector in RAM, but the tree would be much faster in use. The parent vector and the tree are created simultaneously in RAM for their utility. It's much more practical to file the parent vector than a large tree in, say, a component file. Adding large trees incrementally to component files is too slow. – creatural Feb 12 '23 at 18:21
  • Can you clarify what's slow for you? I don't really understand what's going wrong on your end performance-wise. When you say "restore to the tree," do you mean restoring to the nested representation? Ideally, you should never have to return to that representation for anything. You say that the nested representation is *faster*? That shouldn't be the case, nor should it be more convenient for most problems. I'd say that I would need more information to fully understand what's proving to be inadequate about just using the parent vector all the time. – arcfide Feb 14 '23 at 14:47
  • The parent vector is filed incrementally in a native file, or perhaps a component file: each row in a column array. Done this way, the time is tiny (say 15 microseconds) compared to the time to file that very large tree (say 1-5 seconds). The only reasons to restore the parent vector to the tree are a system crash, a memory change, or a larger change like a switch to a server. That is the key. There could be a number of trees, so the time to restore from file to RAM is important. Just a guess: one million integers in a parent vector, from file to RAM to the tree, in 50 milliseconds. – creatural Feb 14 '23 at 21:00
  • Dfns has a red/black tree, which is useful for a game leaderboard. It's pretty fast as a dfn, so using Co-dfns would be quite a bit faster. For a leaderboard tree, a restore from file that takes minutes is fine. I will try out Co-dfns for the other use, parent vector to tree. – creatural Feb 14 '23 at 23:02
  • I'm sorry, I still don't understand what the issue is here. You say, "restore to the tree," but I still don't understand what that means. The "tree" is the parent vector. When you say "tree" do you mean the nested representation of a tree? I also don't understand why system crashes, memory, or the like has anything to do with things. When you want to rehydrate your system with the tree from disk, you just load the parent vector into memory, and you're done. I don't understand why you want to convert away from the parent vector to some other representation. – arcfide Feb 15 '23 at 23:39
  • ({⊂⍵}⌸⍠ 2) 1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4 (parent vector): if there were a variant of Key like this, then perhaps this tree would result: (1 3 4 (7 14) 11 12) (2 5 13) (6 (8 15) (9 (16 17)) 10) (,18). Or a dfn could do it. I don't think large trees can be filed "quickly" in native or component files. To get fast persistence in a file, it's easy to file the parent vector bit by bit, row by row, at say 5 microseconds per integer. As the game proceeds, the tree is created in RAM and, at the same time, the parent vector is filed. If memory crashes, and it will, restore the tree via the file V. – creatural Feb 17 '23 at 06:01
  • I'm sorry, but the process you're working with still isn't clear to me. I understand now that you're calling the nested representation the "tree" (but the nested representation and the parent vector are both the same "tree"), but I don't understand the operations you're trying to do, and why that is slow. Again, nothing in your statement above tells me why you're using a nested representation at all. The fastest way to get the tree is to use the parent vector directly and never use the nested representation. – arcfide Feb 18 '23 at 19:25
  • Here is an array, 1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4, which is decoded together with 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 (⍳18) to create the tree (1 3 4 (7 14) 11 12) (2 5 13) (6 (8 15) (9 (16 17)) 10) (,18). In this tree (nested array) the parents are 1 2 6 18. Is there a quick way to insert a large tree, y←1e6⍴(1 3 4 (7 14) 11 12) (2 5 13) (6 (8 15) (9 (16 17)) 10) (,18)? I need to file each item added in RAM to this very large tree. What is the usual time/method to do that? A 1e6 tree takes too much time to file. Is filing the array incrementally better? – creatural Feb 18 '23 at 21:01
  • The game is created in RAM by the tree (1 3 4 (7 14) 11 12) (2 5 13) (6 (8 15) (9 (16 17)) 10) (,18). As the game gets much larger (1e6), each new part of that tree needs to be filed. Each parent or child needs to be filed. Is there a better way to file this array representation of the tree? I could erase a large component file tree and then store it again, say every 10 minutes, but each child and parent in this tree needs to be filed quickly. No lengthy snapshots. What is the best way to file large trees (1e6) in less than, say, 40 microseconds? Is it best to file the simple array representation incrementally? – creatural Feb 18 '23 at 21:20
  • I think the confusion here is that you're using terms here which may or may not have technical meanings, without context, which makes it hard for me to guess at what you are actually doing. Specifically, I don't know what you mean by "insert" and what you mean by "file"? I can't tell whether you mean adding a new tree to an existing forest structure is slow in ram, or whether simply writing a new tree in a forest to disk is slow. Certainly, doing 1e6⍴(nested_rep_of_tree) is going to be *much* more memory intensive to work with than the parent vector representation of a tree in memory. – arcfide Feb 20 '23 at 02:27
  • The memory costs alone of using a nested representation are much higher than using the pointer vector representations. Are you just saying that adding a new tree to a forest is slow? What's wrong with using append-only strategies? Tail catenation is fast in Dyalog APL, for instance. See sec. 5.2.3 in the Co-dfns thesis for a generic discussion of node mutation (including addition) and sec. 3.5 for a canonical pass that demonstrates the use of catenation for fast node addition. All of these techniques scale to file system serialization as well as in memory operations. – arcfide Feb 20 '23 at 02:34
  • Finally, the whole point of this discussion was that you wanted to go *away* from a parent vector and *back* into a nested representation. My original question is why you would even start with the nested representation in the first place and why you need to convert back to the nested representation in the second. My point was that you are probably (though not guaranteed) better off by simply staying with the parent vector representation rather than *ever* using the nested representation. I really need clarification of the context and the reasons you're doing what you're doing. – arcfide Feb 20 '23 at 02:37
  • Restating things again, if you have a tree in parent vector form (1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4), then you just *keep* it that way all the time. If you want to write it to disk, just write it to disk. If you add more to it, then just add more to it and write that to disk. If you want to use the tree, then just use the tree as is in the parent vector form. There is zero conversion overhead in such a model, and chances are the memory requirements will be minuscule as well (cf. the thesis's experimental data). – arcfide Feb 20 '23 at 02:48
  • For this game, a 1e6 nested array/tree in RAM is very fast, since I just locate a parent node and all of its children are right there for access. The array, however, is extremely fast to insert into a native file, one integer at a time. The native file is already filled with, say, ¯1 up to, say, 1e6 entries. I just insert the next array integer at the next location in the file. The tree and the array are created at the same time in RAM. Each new component of the tree has to be filed. Since the array is also created, it's much easier to store that in the file, piece by piece. For storage it's 100% the same as the tree. – creatural Feb 21 '23 at 00:59
  • After a crash or new memory, all of the array in the file needs to be 'rehydrated' into the tree for use in RAM. This array (1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4) is much slower to use in RAM, since a child could be 1e6 positions away from its parent. In the tree in RAM, each child is in the same vector as its parent: (1 3 4 (7 14) 11 12) (2 5 13) (6 (8 15) (9 (16 17)) 10) (,18). This array and the tree/nested array are perfect. Key is extremely fast at restoring a simple array into the tree. But I also need to add one more level of depth. That's why I created this encoded array for the tree. Is there a better array? – creatural Feb 21 '23 at 01:12
  • Be precise in your language. The fact that this is for a game is irrelevant. What matters is the shape of your data, the degree of the tree, the depth of the tree, the access characteristics of the tree, what operations you're performing on the tree, and so forth. It's not even clear to me whether you're dealing with a forest or just a single tree of high cardinality. I've also already given code that gives you the nested representation from a parent vector, and you can use a combination of Each, Power Operator, and Key to do the same thing if you want. If you want more, be precise. – arcfide Feb 22 '23 at 03:14
  • Also, remember that if you *really* need to traverse top down, it's trivial to simply have a child vector containing a vector of child vectors for each node. There's no reason that cannot be stored incrementally with good speed either. And of course, there's the question of whether the branching factor is known for each node type. If that's the case, then the storage and restoration can be even easier. – arcfide Feb 22 '23 at 03:17
  • Do you have some actual benchmarks that demonstrate what you're doing to test the performance of various options? – arcfide Feb 22 '23 at 03:21
  • I added a code example for the child representation to the answer above. – arcfide Feb 22 '23 at 03:47
  • I will use Key to go from the array to the tree. y←1e6⍴ 1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4 - cmpx '{⊂⍵}⌸y' - 3.2E¯3 - AMD Ryzen 7 5700G. Later I will do more benchmarking and test the code needed for adding nesting depth. This current array/Key/tree is perfect for prototyping. Thanks for the information. – creatural Feb 23 '23 at 06:45