I'm populating a sparse array in Chapel with a loop that reads over a CSV file.

I'm wondering what the best pattern is.

var dnsDom = {1..n_dims, 1..n_dims};
var spsDom: sparse subdomain(dnsDom);
for line in file_reader.lines() {
   // grow the sparse domain by one index per CSV row
   var i = line[1]:int;
   var j = line[2]:int;
   spsDom += (i,j);
}

Is this an efficient way of doing it?
Should I create a temporary array of tuples and append it to spsDom every (say) 10,000 rows?

Thanks!

  • As has been asked in https://stackoverflow.com/q/45172614, would you mind providing a few details? If your quantitative measure of efficiency is a time-domain value, have you measured a baseline of your implementation with `Timer.start(); ...; Timer.stop()`, so there is a baseline to compare against? If the measure is memory footprint or some other criterion, would you kindly state it, so that others can share your expectations for such an efficiency metric? Thanks. (It would also be great if you could still post both `repr( I )` and `repr( V )` for your previous question. Thanks.) – user3666197 Jul 24 '17 at 15:58
  • Not ignoring this; I have been working directly with the Chapel team to get some of these things sorted out. I intend to update with better answers soon. – Brian Dolan Jul 24 '17 at 21:44
  • Great step, Brian. Situations in a combined [PTIME,PSPACE] or [EXPTIME,EXPSPACE] double-trouble corner, close to the physical boundaries of the complexity Zoo, are always challenging. Each tradeoff in one dimension is obviously very expensive, if not infeasible altogether, in the other of these two principal Turing SEQ-processing dimensions. Thanks for the note; I will wait for updates. – user3666197 Jul 25 '17 at 08:41
  • Any progress on simply posting a mechanical update of this post with respect to https://stackoverflow.com/q/45172614? It's as easy as typing `repr( I ); repr( V )` plus a copy/paste; it ought to take only a few seconds. Thanks for kindly reconsidering, Brian, so as to make the post's context clear and professional. – user3666197 Aug 05 '17 at 17:54
  • Would you kindly provide the requested details about the sparse matrices, as documented and reminded above? Thank you. – user3666197 Aug 09 '17 at 17:24

1 Answer

The way you show in the snippet will expand the internal arrays of the sparse domain on every += operation. As you suggested, buffering the read indices and then adding them in bulk will definitely perform better, thanks to several optimizations for adding an array of indices.

You can similarly do a += where the right-hand side is an array:

spsDom += arrayOfIndices;

This overload of the += operator on sparse domains actually calls the main bulk-addition method, bulkAdd. The method itself has several flags that may help you gain even more performance in some cases. Note that the += overload calls bulkAdd in the "safest" manner possible, i.e. the array of indices can be in random order, can include duplicates, etc. If your arrays (in your case, the indices you read from the file) satisfy some requirements (Are they sorted? Are there duplicates? Do you need to preserve the input array?), you can call bulkAdd directly and pass several optimization flags.

See http://chapel.cray.com/docs/latest/builtins/internal/ChapelArray.html#ChapelArray.bulkAdd for the documentation of bulkAdd.
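
For illustration, the safe += above corresponds roughly to the most conservative bulkAdd call. A sketch, assuming the default flag values documented there:

spsDom.bulkAdd(arrayOfIndices, dataSorted=false,    // input may be unsorted
                               isUnique=false,      // duplicates are tolerated
                               preserveInds=true);  // keep the input array intact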

Edit: A snippet building on the one in the question:

var dnsDom = {1..n_dims, 1..n_dims};
var spsDom: sparse subdomain(dnsDom);

// create an index buffer
config const indexBufferSize = 100;
var indexBufferDom = {0..#indexBufferSize};
var indexBuffer: [indexBufferDom] 2*int;

var count = 0;
for line in file_reader.lines() {

  indexBuffer[count] = (line[1]:int, line[2]:int);
  count += 1;

  // bulk-add the indices if the buffer is full
  if count == indexBufferSize {
    // these flags assume the buffer is sorted and duplicate-free
    spsDom.bulkAdd(indexBuffer, dataSorted=true,
                                preserveInds=false,
                                isUnique=true);
    count = 0;
  }
}

// dump the final buffer, which is (most likely) only partially filled
if count > 0 then
  spsDom.bulkAdd(indexBuffer[0..#count], dataSorted=true,
                                         preserveInds=false,
                                         isUnique=true);

I haven't tested it, but I think it should capture the basic idea. The flags passed to bulkAdd should result in the best performance, though of course this depends on the input buffer being sorted and free of duplicates. Also note that the initial bulkAdd will be much faster than the consecutive ones, which will likely get slower still as the method has to sift through the existing indices and shift them where necessary. So a larger buffer should deliver better performance.
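
If the rows in your CSV are not already sorted and duplicate-free, one option is to normalize each buffer before flushing it. A minimal sketch, where flushBuffer is a hypothetical helper (not part of the Chapel API), assuming the Sort module's default lexicographic ordering of integer tuples:

use Sort;

// hypothetical helper: sort a partially filled buffer and drop duplicates
// so that the dataSorted/isUnique flags passed to bulkAdd remain valid
// even for unordered CSV input
proc flushBuffer(ref spsDom, buf: [] 2*int, count: int) {
  var tmp: [0..#count] 2*int = buf[0..#count];
  sort(tmp);                  // (i,j) tuples sort lexicographically
  var unique = 0;
  for k in 0..#count {
    if unique == 0 || tmp[k] != tmp[unique-1] {
      tmp[unique] = tmp[k];
      unique += 1;
    }
  }
  spsDom.bulkAdd(tmp[0..#unique], dataSorted=true,
                                  preserveInds=false,
                                  isUnique=true);
}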