
input code:

code I have now:

for (int i4 = 0; i4 < ba_result_out.Length; i4 += c.i_key_size)
{
  k = BitConverter.ToInt64(spn_data.Slice(i4, c.i_key_size));
  if (Dictionary.md1.ContainsKey(k)) { 
    //some logics skipped
  }
}

code I'm trying to make: (based on: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/how-to-speed-up-small-loop-bodies )

ParallelOptions po = new ParallelOptions { MaxDegreeOfParallelism = 2 };
var rp = Partitioner.Create(0, ba_result_out.Length / c.i_key_size, po.MaxDegreeOfParallelism);
Parallel.ForEach(rp, po, (range, loopState) =>
{
  for (int i4 = range.Item1; i4 < range.Item2; i4++)
  {
    // i4 is a key index here, so convert it to a byte offset before slicing
    k = BitConverter.ToInt64(spn_data.Slice(i4 * c.i_key_size, c.i_key_size));
    if (Dictionary.md1.ContainsKey(k)) {
      //some logics skipped
    }
  }
});

task: convert the loop to Parallel.ForEach, which does not seem possible with a span.

problem: the compiler does not allow a span to be used inside a lambda

Is it possible to loop over multiple spans with Parallel.ForEach?

n.b. this is very hot code - billions of iterations - so allocation is not an option - need to stick to spans.

  • I suggest editing your question to show the code you've tried (with `Parallel.ForEach` and `Span`s), so that others can see how you've tried to combine them, rather than guessing. – Heretic Monkey Nov 29 '22 at 15:51
  • If you have specific indexes you should use `Parallel.For` not `ForEach` – Panagiotis Kanavos Nov 29 '22 at 15:51
  • "allocation is not an option", yet you use `.ToArray()`? What is the actual goal here? The given example does not do *anything*. If you want to convert all the values in the array you can just cast the span, it will be difficult to be much faster than that. – JonasH Nov 29 '22 at 15:52
  • The question is unclear. `BitConverter` works with byte arrays so there are no allocations. There are copies. That `ToArray()` *creates* a copy which defeats the use of spans. [ToInt64](https://learn.microsoft.com/en-us/dotnet/api/system.bitconverter.toint64?view=net-7.0#system-bitconverter-toint64(system-readonlyspan((system-byte)))) can work with `ReadOnlySpan` spans – Panagiotis Kanavos Nov 29 '22 at 15:54
  • @panagiotis-kanavos thank you for a valid suggestion! I removed .ToArray() and it works. Faster than before. – Yurii Palkovskii Nov 29 '22 at 16:09
  • @JonasH - thank you for a valid idea. Fixed that. I've updated the post. – Yurii Palkovskii Nov 29 '22 at 16:29
  • @HereticMonkey I've updated the post. I've tried to remove all unrelated code. – Yurii Palkovskii Nov 29 '22 at 16:30

1 Answer


Thank you to those who participated!

I made it work as intended by calling AsSpan() inside the lambda function:

  1. I switched to an artificial array of indexes, instead of spans, as the base for Parallel.ForEach
  2. I used 1 allocation of a Span inside the lambda (since the whole index range is divided by the number of cores in the partitioner, it makes only 4 allocations of the span)
  3. this is an implementation of the small-loop-body parallelization from MS (linked in my original post)
  4. it can be further improved by passing a pointer to the span, thus avoiding the allocation, as mentioned here

I ended up with this one:

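int i_max_search_threads = 4;
int[] ia_base_idxs = Enumerable.Range(0, ba_result_out.Length).ToArray();
// Note: range boundaries stay aligned to keys only if the range size is a multiple of c.i_key_size
var rp = Partitioner.Create(0, ia_base_idxs.Length, ia_base_idxs.Length / i_max_search_threads);
Parallel.ForEach(rp, po, (range, loopState) =>
{
  // The span is created inside the lambda, so it is never captured by it
  Span<byte> spn_data = ba_result_out.AsSpan();
  for (int i4 = range.Item1; i4 < range.Item2; i4 += c.i_key_size)
  {
    // Local variable, so parallel ranges do not share it
    long k = BitConverter.ToInt64(spn_data.Slice(i4, c.i_key_size));
    if (Dictionary.md1.ContainsKey(k)) {
      //some logics skipped...
    }
  }
});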
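
For anyone who wants to try the same idea in isolation, here is a minimal, self-contained sketch. It is not the exact code above: `ParallelKeyScan`, `CountHits`, and all parameter names are hypothetical, a `HashSet<long>` stands in for the dictionary, and the "skipped logic" is replaced by a simple hit counter. It partitions over key indexes rather than byte indexes, so every range starts on a key boundary, and it skips the index array. Since `Span<T>` is a ref struct, wrapping the array inside the lambda should only create a stack-only view rather than a heap allocation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelKeyScan
{
    // Hypothetical helper: counts how many fixed-size keys in `data` occur in `lookup`.
    // Assumes `lookup` is not modified while the scan is running.
    public static long CountHits(byte[] data, int keySize, HashSet<long> lookup, int maxThreads)
    {
        if (keySize < sizeof(long)) throw new ArgumentOutOfRangeException(nameof(keySize));
        if (maxThreads < 1) throw new ArgumentOutOfRangeException(nameof(maxThreads));

        int keyCount = data.Length / keySize;   // whole keys only, trailing bytes are ignored
        if (keyCount == 0) return 0;

        // One contiguous range of key indexes per thread (rounded up).
        int rangeSize = (keyCount + maxThreads - 1) / maxThreads;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        long hits = 0;

        Parallel.ForEach(Partitioner.Create(0, keyCount, rangeSize), options, (range, loopState) =>
        {
            // The span is created inside the lambda; only the byte[] is captured.
            ReadOnlySpan<byte> span = data;
            long localHits = 0;
            for (int i = range.Item1; i < range.Item2; i++)
            {
                // Key index -> byte offset; k is local to this iteration.
                long k = BitConverter.ToInt64(span.Slice(i * keySize, keySize));
                if (lookup.Contains(k)) localHits++;
            }
            // Merge the per-range result once, instead of synchronizing per key.
            Interlocked.Add(ref hits, localHits);
        });

        return hits;
    }
}
```

A call would look like `ParallelKeyScan.CountHits(ba_result_out, c.i_key_size, keys, 4)`, where `keys` is a hypothetical set built from the dictionary's keys. Keeping a per-range counter and merging it once per range is a design choice to keep shared writes out of the hot loop; whether it actually beats the sequential version depends on the data size and should be measured.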