1

Long story short, i have to solve 20..200 block-tridiagonal linear systems during an iterational process. Size of systems is 50..100 blocks, 50..100 x 50..100 each. I will write down here my thoughts on it, and i ask you to share your opinion on my thoughts, as it is possible that i am mistaken in one regard or another.

To solve those equations, i use a matrix version of Thomas algorithm. It's exactly like scalar one, except instead of scalar coefficients in equations i have matrices (i.e. instead of "a_i x_{i-1} + b_i x_i + c_i x_{i+1} = f_i" i have "A_i X_{i-1} + B_i X_i + C_i X_{i+1} = F_i", where A_i, B_i, C_i - matrices; F_i and X_i are vectors.

Asymptotic complexity of such algorithm is O(N*M^3), where N is the size of overall matrix in blocks, and M is the size of each block.

Right now my bottleneck is inversion operation. Deep inside nested loops i have to calculate /a lot/ of inversions that look like "(c_i - a_i * alpha_i)^-1", where alpha_i is a dense MxM matrix. I am doing it using Gauss-Jordan algorithm, using additional memory (which i will have to use anyway later in the program) and O(M^3) operations.

Trying to find info on how to optimize inversion operation, i've found only threads about solving AX=B systems 'canonically', i.e. X=A^-1 B, with suggestions to use LU factorization instead of it. Sadly, as my inversion is a part of Thomas algorithm, if i resort to LU factorization, i will have to do it for a M*NxM*N matrix, which wil rise the complexity of solving the linear system by extra N^2 to O(N^3*M^3). That's slowing down by a factor of 2500..10000, which is quite bad.

Approximate or iterative inversions are out of scope, too, as slightest residual with exact inversion will cumulate very fast and cause global iterational process to explode.

I do calculations in parallel with Parallel.For(), solving each of the 20..200 systems separately.

Right now, to solve 20 such systems with N,M=50 on average takes 872ms (i7-3630QM, 2.4Ghz, 4cores (8 with hyper threading)).

And finally, here come the questions.

  1. Am i correct on what i wrote here? Is there an algorithm to significantly speed up calculations over what they are now?

  2. Inside of number-grinder part of my program i use only For loops (most of them are with constant boundaries, the exception being one of the loops inside of inversion algorithm) double arithmetic (+,-,*,/) and standard arrays ([], [,], [,,]). Will there be any speed-up if i rewrite this part as unsafe? Or as a library in C?

  3. How much is C# overhead on such tasks (double arrays grinding)? Are C compilers better at optimization of such simple code than C# 'compiler'?

  4. What should i look at when optimizing numbergrinder in C#? Is it suited for such task at all?

rene
  • 41,474
  • 78
  • 114
  • 152
Dantragof
  • 155
  • 6
  • 1
    I have removed the tag from your title. You can read [here](http://meta.stackexchange.com/questions/19190/should-questions-include-tags-in-their-titles) why they are not needed in the title. – rene Mar 30 '14 at 13:45

0 Answers0