I am passing a struct from C# to C++.

C# code:

[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct Data
{
[MarshalAs(UnmanagedType.U4)]
public int number;

[MarshalAs(UnmanagedType.ByValArray, SizeConst = 5)]
public int[] array;

[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 512)]
public string buffer;
}

C++ code:

struct Data
{
public:
    int number;
    int array[5];
    char buffer[512];
    //char *buffer;
};

The above method works fine. But if I instead use pointers to handle the data in C++, I get this error:

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory

struct Data
{
public:
    int number;
    int *array;
    char *buffer;
};

Why can't I use pointers here? Is there any advantage to handling this case via pointers?

dragosht
Joga
  • You cannot change your C++ declaration without also changing the C# declaration. After which you very quickly will find out that the int[] is not going to fly. Structures with pointers are a pretty nasty memory management problem, it is never very clear who is responsible for releasing the memory again. You'll have to take responsibility yourself and use IntPtr. And fret over whether or not the C++ code is going to deep-copy the array and the string, if it doesn't then you have the next problem of keeping those pointers valid. – Hans Passant Dec 21 '15 at 13:23

2 Answers


The first struct works because the array and the text buffer are stored inside the struct itself. The second is problematic because the struct then holds only an int pointer and a char pointer (each of size sizeof(void*), which depends on your platform), not the data they are supposed to point to. If you insist on using pointers, you have to allocate and deallocate that memory yourself (e.g. with new[] and delete[]).

Mr. Anderson
  • So, should I leave my code as it is, or modify it to work with pointers? Which is safer? And are there any advantages to switching to pointers? – Joga Dec 22 '15 at 12:40

The problem is how your data is represented in memory.

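Let's assume you have an instance of the C# structure that is marshaled to unmanaged code, or even to a file, and that holds the values below (the initializers are written inline only for illustration).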

[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct Data
{
[MarshalAs(UnmanagedType.U4)]
public int number = 5;

[MarshalAs(UnmanagedType.ByValArray, SizeConst = 5)]
public int[] array = {0, 1, 2, 3, 4};

[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 512)]

public string buffer = "Happy new Year";
}

According to this, your memory layout will be like this (in hex-like view):

05 00 00 00 00 00 00 00
01 00 00 00 02 00 00 00
03 00 00 00 04 00 00 00
48 00 61 00 70 00 70 00
79 00 20 00 6E 00 65 00
77 00 20 00 59 00 65 00
61 00 72 00

Here the first four bytes, "05 00 00 00", hold the number 5 for your "number" variable. (Notice that the bytes are in reversed order because the Intel architecture is little-endian; see Endianness for details.)

Then we have the next five integers, "00 00 00 00" = 0, "01 00 00 00" = 1, "02 00 00 00" = 2, "03 00 00 00" = 3, "04 00 00 00" = 4, for the array named "array".

And the string "buffer" is represented like this (low byte first, for the same little-endian reason):

"48 00" = H
"61 00" = a
"70 00" = p
"70 00" = p
"79 00" = y
"20 00" = <space>
"6E 00" = n
"65 00" = e
"77 00" = w
"20 00" = <space>
"59 00" = Y
"65 00" = e
"61 00" = a
"72 00" = r

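There is a subtlety here: .NET stores its string variables internally as Unicode (UTF-16), where every character of this example has a two-byte representation. The dump above shows the string in that two-byte form, which is what the marshaler produces when the struct is declared with CharSet = CharSet.Unicode. With the C# default, CharSet.Ansi, a ByValTStr field is converted to single-byte characters instead ("48 61 70 70 79 ..."), which is why the char buffer[512] declaration in the question works.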

Now, for this C++ struct

struct Data
{
public:
    int number;
    int array[5];
    char buffer[512];
    //char *buffer;
};

sizeof(int) is 4, so the memory for the variable "number" contains "05 00 00 00", which is the number five. array[0], array[1], array[2], array[3] and array[4] map onto the memory blocks "00 00 00 00" = 0, "01 00 00 00" = 1, "02 00 00 00" = 2, "03 00 00 00" = 3, "04 00 00 00" = 4. Everything else belongs to the buffer[512] variable. But in C++, sizeof(char) == 1: the char data type is usually used for single-byte text encodings such as ASCII. That matches the marshaled data only when the string is marshaled as single-byte characters, which is what ByValTStr does under the C# default of CharSet.Ansi. If the struct is marshaled with CharSet = CharSet.Unicode, use wchar_t instead, which is two bytes wide on Windows and fits UTF-16.

Now let's take a look at

struct Data
{
public:
    int number;
    int *array;
    char *buffer;
};

This structure will be projected onto the same memory layout as described above. If you're running in a 32-bit environment (win32), the "array" pointer will contain "00 00 00 00" (4 bytes for a pointer) and the "buffer" pointer will contain "01 00 00 00".

If you're running in a 64-bit environment (win64), pointers are 8 bytes wide and 8-byte aligned, so the compiler inserts 4 bytes of padding after "number". The "array" pointer then covers "01 00 00 00 02 00 00 00" and the "buffer" pointer covers "03 00 00 00 04 00 00 00".

These are invalid pointers that point who knows where. That's why you get an access violation when you try to dereference them.

  • "Every Unicode character has its two-byte representation": That's not possible; there are too many. UTF-16 encodes Unicode codepoints in one or two 16-bit code units. – Tom Blodget Dec 22 '15 at 04:56
  • So, should I leave my code as it is, or modify it to work with pointers? Which is safer? And are there any advantages to switching to pointers? – Joga Dec 22 '15 at 12:39
  • I would leave it as it is. – Anton Vorobyev Jan 21 '16 at 14:22