As always, I'm probably missing something obvious here. I can't post the entire source code because it's work-related, but I have a templated Matrix class that already has a mulMM function (multiply matrix by matrix) and a mulMT function (multiply matrix by transposed matrix). I am trying to add a mulMT specialization for single-precision floating-point values. I already have a similar specialization of mulMM, which shares the same first two function parameters. Here is some example code:

#ifdef TARGET_x86_SSE4

template <> 
Matrix<Float>&  Matrix<Float>::mulMM (const Matrix<Float>& mat1, 
                                      const Matrix<Float>& mat2);

template <> 
Matrix<Float>&  Matrix<Float>::mulMT (const Matrix<Float>& mat1, 
                                      const Matrix<Float>& mat2,
                                      const bool softmax);

#endif // no SSE4: use template generic

template <class BT> 
Matrix<BT>&  Matrix<BT>::mulMM (const Matrix<BT>& mat1, 
                                const Matrix<BT>& mat2) {
  // Function code
}

template <class BT> 
Matrix<BT>&  Matrix<BT>::mulMT (const Matrix<BT>& mat1, 
                                const Matrix<BT>& mat2,
                                const bool softmax) {
  // Function code
}

Then, in a separate file:

#ifdef TARGET_x86_SSE4

template <> 
Matrix<Float>&  Matrix<Float>::mulMM (const Matrix<Float>& mat1, 
                                      const Matrix<Float>& mat2) {
  // Function code
}

template <> 
Matrix<Float>&  Matrix<Float>::mulMT (const Matrix<Float>& mat1, 
                                      const Matrix<Float>& mat2, 
                                      const bool softmax) {
  // Function code
}

#endif // TARGET_x86_SSE4
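For readers who want to reproduce the declaration pattern in isolation, here is a minimal single-file sketch. Everything in it is an assumption for illustration: the stripped-down `Matrix`, the helpers `productWithTranspose` and `adopt`, the `usedSpecialization` flag, and the use of plain `float` in place of the `Float` typedef. The `softmax` parameter is left out for brevity. The point it shows is that the explicit specialization is declared before any code could implicitly instantiate the generic version; in a multi-file build, that declaration has to be visible in every translation unit that calls `mulMT` on the specialized type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, stripped-down stand-in for the real Matrix class.
template <class BT>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    BT& at(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    const BT& at(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    // Sets *this to mat1 * transpose(mat2) and returns *this.
    Matrix& mulMT(const Matrix& mat1, const Matrix& mat2);

    bool usedSpecialization = false;  // illustration only

private:
    // Takes over the new dimensions and values (arguments are copied by value
    // first, so this is safe even when mat1 or mat2 is *this).
    void adopt(std::size_t rows, std::size_t cols, std::vector<BT>& values) {
        rows_ = rows;
        cols_ = cols;
        data_.swap(values);
    }

    std::size_t rows_, cols_;
    std::vector<BT> data_;
};

// Plain triple loop shared by both versions in this sketch.
template <class BT>
std::vector<BT> productWithTranspose(const Matrix<BT>& a, const Matrix<BT>& b) {
    assert(a.cols() == b.cols());
    std::vector<BT> out(a.rows() * b.rows());  // value-initialized to zero
    for (std::size_t i = 0; i < a.rows(); ++i)
        for (std::size_t j = 0; j < b.rows(); ++j)
            for (std::size_t k = 0; k < a.cols(); ++k)
                out[i * b.rows() + j] += a.at(i, k) * b.at(j, k);
    return out;
}

// Declaration of the explicit specialization, placed before any use.
template <>
Matrix<float>& Matrix<float>::mulMT(const Matrix<float>& mat1,
                                    const Matrix<float>& mat2);

// Generic definition.
template <class BT>
Matrix<BT>& Matrix<BT>::mulMT(const Matrix<BT>& mat1, const Matrix<BT>& mat2) {
    std::vector<BT> out = productWithTranspose(mat1, mat2);
    adopt(mat1.rows(), mat2.rows(), out);
    usedSpecialization = false;
    return *this;
}

// Definition of the specialization (this is where an SSE kernel would go).
template <>
Matrix<float>& Matrix<float>::mulMT(const Matrix<float>& mat1,
                                    const Matrix<float>& mat2) {
    std::vector<float> out = productWithTranspose(mat1, mat2);
    adopt(mat1.rows(), mat2.rows(), out);
    usedSpecialization = true;
    return *this;
}
```

Note that a fully specialized member function is an ordinary (non-inline) function, so its definition belongs in exactly one source file, as in the question's layout.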

When I invoke the code, the mulMM correctly pulls in the values of the input matrices, but when I invoke mulMT, I wind up with random numbers for the dimensions of the matrix. Using the debugger, I can see that the matrix is properly defined when coming in, but as soon as I've stepped into the function, the values are all wrong. Here is some example code I'm using:

MatrixFloat mA(4, 5);
MatrixFloat mB(5, 4);
// Code to initialize mA and mB's values.

MatrixFloat mR = mR.mulMM(mA, mB);
// This works fine and correctly finds the values for the product.

MatrixFloat mC(4, 4);
// Code to initialize mC
mR = mR.mulMT(mC, mC);
// This fails with a memory access error, and stepping into the function reveals
// that this seems to be because mC has dimensions of 84376 x 1 or thereabouts

I have tried passing two different matrices to mulMT instead of the same one twice. I've verified that, when my mulMT specialization for Floats isn't present, the generic one works. I don't know why this is doing what it's doing.

Edit: In an effort to prove that the code inside the Float specialization was not the issue, I commented it out. Everything worked. I uncommented the code. Everything still works. I'm thinking that maybe Visual Studio just got confused and did not recompile something.

Further Edit: OK... this is very strange. When debugging, when I first step into mulMT, it drops me onto the line with const bool softmax) {. If I try F10, I get the exception. If I hit F11, it drops me into ChkStk.asm. If I exit out of that file, I'm able to access the inside of the function and everything is allocated correctly. In fact, as long as I don't recompile my test program, I can step through or run as many times as I like without error. However, if I make a change to my test program and recompile, the behavior comes back.
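A note on the `chkstk.asm` detour: MSVC inserts a call to its stack-probe routine (`__chkstk`) in the prologue of any function whose local variables exceed roughly one page of stack (4 KB on x86), so landing there with F11 is expected for such a function and is not a fault by itself. Whether `mulMT` actually has large locals is an assumption, since its body is not shown. The sketch below, with hypothetical function names, contrasts a large automatic buffer with a heap-backed one that keeps the stack frame small.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// 4096 floats = 16 KB of automatic storage. With MSVC, a frame this large
// gets a stack probe in the function prologue.
float sumWithStackScratch() {
    float scratch[4096];
    for (std::size_t i = 0; i < 4096; ++i) {
        scratch[i] = 1.0f;
    }
    return std::accumulate(scratch, scratch + 4096, 0.0f);
}

// Same work with the buffer on the heap; the stack frame stays small.
float sumWithHeapScratch() {
    std::vector<float> scratch(4096, 1.0f);
    return std::accumulate(scratch.begin(), scratch.end(), 0.0f);
}
```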

I am even more confused than I was at the start.

Edit: OK, after two days with no issues, it's come up again. I no longer think, as I did before, that it involves the values changing. Instead, something seems to be happening before the parameters are actually instantiated. I still don't know why, other than that this is the only function I have which drops me into chkstk.asm if I hit F11 after entering it.

The exception I get is: First-chance exception at 0x00789adc in UnitTest.exe: 0xC0000005: Access violation reading location 0xabababab. Earlier in the function, I make an almost identical call, varying only which variables are used with the operator. Oddly enough, a try-catch block around the call does nothing to prevent the program from crashing.

Sean Duggan
  • Is it returning a local object of type Matrix? I'm asking because you are returning a reference to something. I suggest changing this to a pointer return. Something like this: template<> Matrix* Matrix..... – Amadeus May 07 '13 at 17:23
  • There's still some code missing to get a glance what might be going on (e.g.: What is `MatrixFloat`? I'd guess a typedef for `Matrix`, but who knows ...). – πάντα ῥεῖ May 07 '13 at 17:23
  • Could you show the code that you skipped with the comments `//function code` ? There are chances that the error lies there... – JBL May 07 '13 at 17:27
  • @TomásBadan It does return a (*this) pointer at the end, pointing to the class that called the function. That is not something I can change because I'm working with existing code. – Sean Duggan May 07 '13 at 17:27
  • And no, I can't post the code inside because it's code used by my company. Yes, MatrixFloat is a typedef for Matrix. *wry grin* I'm trying to excerpt just what I need from the code. And... initially I was going to say that it was ridiculous for the code inside to mess it up before I even entered the function. Then I commented out all of the code in the Float specialization of mulMT and it worked. And I uncommented it and it still works. I'm even more confused now. – Sean Duggan May 07 '13 at 17:42
  • Ah. I found what made it work. When debugging, just after I've entered the function, execution is still on the `Matrix& Matrix::mulMT (const Matrix& mat1, const Matrix& mat2, const bool softmax) {` line. If I hit `F11` to step into the function, it drops into `chkstk.asm`. If I exit out of that bit of assembly code, I'm put inside the function and it works fine, even gives me the right answer. – Sean Duggan May 07 '13 at 17:54
  • @SeanDuggan I don't understand: has the memory access problem gone away now? – Muthu Ganapathy Nathan May 07 '13 at 18:12
  • `MatrixFloat mR = mR.mulMM(mA, mB)` this uses `mR` before fully constructing it, unless `mulMM()` is class-static. You are then initializing `mR` with the result of that call. I guess this is the culprit. – Ulrich Eckhardt May 07 '13 at 18:15
  • It also happens when I initialize mR explicitly first, or when I use our operator, which generates an empty Matrix class to handle the operation. As of now, the error has not returned, so I'm going to ascribe it to IDE weirdness for now. – Sean Duggan May 08 '13 at 13:51
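To illustrate the point raised in the comments about `MatrixFloat mR = mR.mulMM(mA, mB)`: calling a non-static member function on an object inside its own initializer runs before that object's constructor, which is undefined behavior for a class with a non-trivial constructor. The asker reports the crash persisted without that pattern, so it was apparently not the root cause here, but it is still worth avoiding. The sketch below uses a hypothetical `Grid` type and `assignSum` member (not the real Matrix API) to show the safer order: construct first, then call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix type whose operations return *this.
struct Grid {
    std::size_t rows, cols;
    std::vector<float> data;

    Grid(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}

    // Overwrites *this with the element-wise sum of a and b, then returns
    // *this, mirroring the "return *this" convention described above.
    Grid& assignSum(const Grid& a, const Grid& b) {
        std::vector<float> out(a.data.size());
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] = a.data[i] + b.data[i];
        }
        rows = a.rows;
        cols = a.cols;
        data.swap(out);
        return *this;
    }
};

// Risky form (deliberately left as a comment):
//   Grid r = r.assignSum(a, b);  // member call on r before r is constructed
//
// Safer form: fully construct the result, then call the member on it.
Grid sum(const Grid& a, const Grid& b) {
    Grid r(a.rows, a.cols);
    r.assignSum(a, b);
    return r;
}
```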

1 Answer

I figured out what was going wrong. The exception was indeed happening inside the function. Visual Studio was optimizing the function, so unless I had a breakpoint inside it, the debugger would just step over all of it. If the function ran properly, it ran properly. If there was any exception inside, it failed. I still have no idea why the try-catch block didn't catch it, but I'm not sweating it.
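On the open question about try-catch, one likely explanation (an assumption, since the build settings are not shown): 0xC0000005 is a Windows structured exception rather than a C++ exception, and under MSVC's default `/EHsc` exception model a C++ `catch` block does not intercept it. The address 0xabababab also resembles the fill pattern the Windows debug heap typically places just past an allocation, which would be consistent with reading beyond the end of a buffer. A hedged sketch of one way to turn that kind of bug into something a C++ `catch` can handle is to bounds-check accesses and throw; `CheckedBuffer` and `readFails` are hypothetical names, not part of the original code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical buffer wrapper that reports out-of-range access as a C++
// exception instead of touching memory it does not own.
class CheckedBuffer {
public:
    explicit CheckedBuffer(std::size_t n) : data_(n) {}

    float& at(std::size_t i) {
        if (i >= data_.size()) {
            throw std::out_of_range("CheckedBuffer index out of range");
        }
        return data_[i];
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<float> data_;
};

// Returns true if reading index i raised a catchable C++ exception.
bool readFails(CheckedBuffer& buf, std::size_t i) {
    try {
        (void)buf.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

An unchecked read past the end, by contrast, either silently returns garbage or raises an access violation that this kind of `catch` never sees.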

Thank you everyone for your attempts to help. And thank you, @JBL for being right in the end.

Sean Duggan