Parallel version for DSP kernels

Question

I have developed an auto-parallelizer for compiler-generated serial code (see www.dalsoft.com) and am looking for ways to apply this technology (any suggestions?). One possibility is to create parallel code for DSP filters. As an example, I took the Normalized Lattice filter (latnrm):

for (i = 0; i < NPOINTS; i++)
{
    top = InpData[i];
    for (j = 1; j < ORDER; j++)
    {
        left = top;
        right = InternalState[j];
        InternalState[j] = bottom;
        top = Coefficients[j-1] * left - Coefficients[j] * right;
        bottom = Coefficients[j-1] * right + Coefficients[j] * left;
    }
    InternalState[ORDER] = bottom;
    InternalState[ORDER+1] = top;
    sum = 0.0;
    for (j = 0; j < ORDER; j++)
    {
        sum += InternalState[j] * Coefficients[j+ORDER];
    }
    OutData[i] = sum;
}
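The fragment above omits declarations and does not say how `bottom` is initialized or how large the arrays are. A self-contained C sketch of the same kernel wrapped as a function follows. The values of ORDER and NPOINTS, the float element type, the array sizes and the zero initialization of `bottom` are assumptions, not taken from the post.

```c
/* Hypothetical sizes; the post does not give them. */
#define ORDER   8
#define NPOINTS 64

/* latnrm kernel from the question, wrapped as a function.
 * Assumptions not stated in the post:
 *  - all data are float;
 *  - Coefficients holds 2 * ORDER values: [0, ORDER) feed the lattice
 *    stages, [ORDER, 2 * ORDER) are the output tap weights;
 *  - InternalState holds ORDER + 2 values, since the kernel writes
 *    InternalState[ORDER] and InternalState[ORDER + 1];
 *  - bottom starts at 0.0f: at i == 0, j == 1 it is stored into
 *    InternalState[1] before it is ever assigned. */
void latnrm(const float InpData[NPOINTS], float OutData[NPOINTS],
            const float Coefficients[2 * ORDER],
            float InternalState[ORDER + 2])
{
    float left, right, top, sum;
    float bottom = 0.0f;

    for (int i = 0; i < NPOINTS; i++) {
        top = InpData[i];
        /* Lattice stages: top and bottom carry a recurrence along j. */
        for (int j = 1; j < ORDER; j++) {
            left = top;
            right = InternalState[j];
            InternalState[j] = bottom;
            top = Coefficients[j - 1] * left - Coefficients[j] * right;
            bottom = Coefficients[j - 1] * right + Coefficients[j] * left;
        }
        InternalState[ORDER] = bottom;
        InternalState[ORDER + 1] = top;

        /* Output tap: a dot product over the updated state. */
        sum = 0.0f;
        for (int j = 0; j < ORDER; j++)
            sum += InternalState[j] * Coefficients[j + ORDER];
        OutData[i] = sum;
    }
}
```

For example, with all coefficients zero except Coefficients[ORDER] = 2 and InternalState[0] = 1, every OutData[i] is 2: the lattice loop starts at j = 1, so InternalState[0] is never written and acts as a constant tap.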

Is there a parallel version of this filter?

Is there a need for a parallel version of this filter?

After analyzing the code, I realized that it is a 2-point stencil, so parallelization may be attempted. It would help to better understand how filters of this kind are used.
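The 2-point stencil structure can be checked mechanically by replaying only the data flow of the lattice loop and recording, for each inner iteration (i, j), which iteration produced the `InternalState[j]` value it reads. This is a minimal C sketch; `Src`, `trace_right_sources` and the ORDER/NPOINTS values are hypothetical names and sizes chosen for illustration.

```c
/* Hypothetical sizes, small enough to inspect by hand. */
#define ORDER   6
#define NPOINTS 8

/* The inner iteration (i, j) that computed a value.
 * i == -1 marks a value that existed before the loop nest started. */
typedef struct { int i, j; } Src;

/* Replays only the data flow of the lattice loop (no arithmetic).
 * src[i][j] receives the producer of the InternalState[j] value that
 * iteration (i, j) reads into `right`. The `left` operand needs no
 * tracking: it is InpData[i] when j == 1 and the top from (i, j - 1)
 * otherwise. InternalState[ORDER] and InternalState[ORDER + 1] are
 * written but never read inside the loop nest, so they are omitted. */
void trace_right_sources(Src src[NPOINTS][ORDER])
{
    Src state[ORDER];             /* producer of each InternalState[j] */
    Src bottom = { -1, -1 };      /* bottom before the first iteration */

    for (int k = 0; k < ORDER; k++)
        state[k] = (Src){ -1, -1 };

    for (int i = 0; i < NPOINTS; i++) {
        src[i][0] = (Src){ -1, -1 };   /* j == 0 is not a lattice stage */
        for (int j = 1; j < ORDER; j++) {
            src[i][j] = state[j];      /* right = InternalState[j];  */
            state[j] = bottom;         /* InternalState[j] = bottom; */
            bottom = (Src){ i, j };    /* top and bottom computed at (i, j) */
        }
    }
}
```

For j >= 2, iteration (i, j) reads the `bottom` computed at (i-1, j-1), and its `left` is the `top` from (i, j-1); these are the two points of the stencil. At j = 1 the value read comes from (i-2, ORDER-1), the last stage of the sample two steps back, so a wavefront schedule also has to respect this boundary dependence.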

What are the usual values of NPOINTS and ORDER?
The code depends on the inputs InpData, InternalState and Coefficients. May it be assumed that the routine will be called with different data in InpData and InternalState but the same Coefficients?
What other DSP kernels need to be parallelized?
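If the answer to the second question is yes (same Coefficients, independent InpData and InternalState per call), there is a coarse-grained alternative that leaves each recurrence serial: run independent filter instances (channels) concurrently. This is a minimal C/OpenMP sketch under that assumption; NCHANNELS, `latnrm_channels` and the sizes are hypothetical, and the per-channel kernel assumes float data, an InternalState of ORDER + 2 elements and `bottom` starting at 0.

```c
/* Hypothetical sizes; the post does not give them. */
#define ORDER     8
#define NPOINTS   64
#define NCHANNELS 4

/* Per-channel kernel: the latnrm loop from the question, assuming float
 * data, Coefficients of 2 * ORDER elements, InternalState of ORDER + 2
 * elements and bottom starting at 0. */
static void latnrm(const float *InpData, float *OutData,
                   const float *Coefficients, float *InternalState)
{
    float left, right, top, sum;
    float bottom = 0.0f;

    for (int i = 0; i < NPOINTS; i++) {
        top = InpData[i];
        for (int j = 1; j < ORDER; j++) {
            left = top;
            right = InternalState[j];
            InternalState[j] = bottom;
            top = Coefficients[j - 1] * left - Coefficients[j] * right;
            bottom = Coefficients[j - 1] * right + Coefficients[j] * left;
        }
        InternalState[ORDER] = bottom;
        InternalState[ORDER + 1] = top;
        sum = 0.0f;
        for (int j = 0; j < ORDER; j++)
            sum += InternalState[j] * Coefficients[j + ORDER];
        OutData[i] = sum;
    }
}

/* Hypothetical driver: one filter instance per channel, each with its own
 * input, output and state, all sharing the read-only Coefficients.
 * Channels have no data dependence on each other, so the channel loop can
 * be split across threads. Compiled without OpenMP, the pragma is ignored
 * and the loop runs serially with the same results. */
void latnrm_channels(float in[NCHANNELS][NPOINTS],
                     float out[NCHANNELS][NPOINTS],
                     const float *Coefficients,
                     float state[NCHANNELS][ORDER + 2])
{
#pragma omp parallel for
    for (int c = 0; c < NCHANNELS; c++)
        latnrm(in[c], out[c], Coefficients, state[c]);
}
```

Whether this pays off depends on the per-call work (roughly NPOINTS × ORDER inner iterations per channel) relative to the thread setup cost, which is one reason the usual values of NPOINTS and ORDER matter.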

Thank you,

David Livshin

www.dalsoft.com

+1 for documenting the OpenMP setup costs (~ 3k5 [cpuCLKs]) in your benchmarks. Are your auto-parallelisations fine-grained, as in the SISAL compiler, or do your efforts mainly go into loop-unrolling strategies without deeper lexical analysis? For a critique of the naively formulated Amdahl's Law, i.e. one that ignores setup/termination add-on costs and the impact of indivisible (atomic) processing, enjoy this read: https://stackoverflow.com/revisions/18374629/3 — user3666197, Aug 13 '19 at 16:41
For some insight into the auto-parallelization process used, see the section "Utilized technology" in www.dalsoft.com/Autoparallelizer_for_a_multi_core_x86_architecture.pdf. Amdahl's law seems hardly relevant in the context of auto-parallelization. — David Livshin, Aug 14 '19 at 06:38
