I'm creating a COM DLL that can be used from PHP to read a memory-mapped file whose size I already know. I have no problem reading the file, but I can't return it correctly as a BSTR. When I use the DLL, it only returns the characters before the first null character (3 characters in this case). I know files can contain multiple null characters, which is why I specified the size in the MultiByteToWideChar function, yet it still doesn't work.

STDMETHODIMP CMemReaderImpl::ReadFile(BSTR* filepath, BSTR* Ofile)
{

    if (*filepath == nullptr) {
        *Ofile = _com_util::ConvertStringToBSTR("err");
    }

    std::wstring wpath(*filepath, SysStringLen(*filepath));

    LPCWSTR lpath = wpath.c_str();

    HANDLE hFileMap;
    PCHAR lpBuffer = NULL;

    hFileMap = OpenFileMapping(
        FILE_MAP_ALL_ACCESS,
        FALSE,
        lpath
    );

    if (hFileMap == NULL) {
        char* err = "ERROR";
        *Ofile = _com_util::ConvertStringToBSTR(err);
    }

    lpBuffer = (PCHAR)MapViewOfFile(
        hFileMap,
        FILE_MAP_ALL_ACCESS,
        0,
        0,
        BUFF_SIZE
    );

    if (lpBuffer == NULL) {
        char* err = "ERROR";
        *Ofile = _com_util::ConvertStringToBSTR(err);
    }

    //where the magic happens

    int wslen = MultiByteToWideChar(CP_ACP, 0, lpBuffer, 1000, 0, 0);
    BSTR bstr = SysAllocStringLen(0, wslen);
    MultiByteToWideChar(CP_ACP, 0, lpBuffer, 1000, bstr, wslen);

    *Ofile = bstr;
    UnmapViewOfFile(lpBuffer);

    CloseHandle(hFileMap);

    return S_OK;
}

I would really like to return the entire file as a BSTR* so it can be manipulated by another PHP program, but so far nothing seems to work.

The PHP code:

<?php
    $obj = new COM("MemReader.MemReader");
    $result = $obj->ReadFile("Local\\imagen3.file");
    echo $result; //reads first 3 characters fine
    echo $result[4]; //error nothing here
?>
Caesar
  • I'm more curious how the OP is determining only characters to the first null are returned (i.e. are they using a function that prints terminated strings). Here's a crazy thought. How about debugging the code to check what wslen actually is. – WhozCraig May 24 '19 at 15:45
  • So, what's the value of wslen? Is it 512? Have you checked the content of bstr (with a memory debugger so you can see the 0 chars)? – Simon Mourier May 24 '19 at 15:58
  • @SimonMourier Thanks for the quick responses. Debugging the code (in another file), I can see that wslen is 1000, but bstr is just "II*", the first three characters. I know the other characters are there, since if I print the characters individually I can see their contents. Maybe I missed an option. – Caesar May 24 '19 at 16:21
  • 1000 is expected. How do you look at bstr then? When you say "other characters are here", what's the problem? I suspect everything is ok on the method side (bstr is buffer of 1000*2 bytes with nulls inside). Are you sure it's not on the php side or during transition (whatever that be) that you "lose" chars after null chars? – Simon Mourier May 24 '19 at 16:25
  • @SimonMourier Maybe you are right, the problem could lie on the PHP side. Anyway, I'm going to edit my question to include the PHP code (about 4 lines). – Caesar May 24 '19 at 18:22
  • I don't think it's on your php code side, but I'm not sure the php interop layer supports BSTR with null bytes inside... – Simon Mourier May 24 '19 at 18:40
  • All of this code assumes the file is textual in nature, but an ACP-encoded text file should never have ANY nulls in it. So what is the actual encoding of the file? If it does have nulls, but is not something like UTF-16, which does use null bytes (and should NOT be passed to `MultiByteToWideChar()`), then the file is not really a text file, in which case why not return the file contents to PHP as a byte array instead? – Remy Lebeau May 24 '19 at 21:11
  • On a side note, why is `filepath` being passed in as `BSTR*` and not simply as `BSTR`? `BSTR` is already a pointer, and `ReadFile()` does not modify `filepath`, so the extra indirection is unnecessary (unless PHP is forcing it internally, which goes against COM memory management rules). Also, there is no need to convert `filepath` to `std::wstring`, a `BSTR` can be used as-is wherever a `LPCWSTR` is expected. And `ReadFile()` needs to `return` immediately whenever it assigns an error message to `*Ofile` (don't forget to cleanup successfully obtained Win32 resources first!). – Remy Lebeau May 24 '19 at 21:16
  • @RemyLebeau Thank you, you were right, it looks like I was just doing extra conversions. Like you said, I'm not handling text files, but images. I tried replacing PCHAR with BYTE*, and now my returned variable is also BYTE*, but it seems I can only return one byte (73, or I, in this particular case). – Caesar May 25 '19 at 21:03

1 Answer

I can't speak for PHP, but in COM, a BSTR is not the correct type for passing around binary data. Use a SAFEARRAY(VT_UI1) instead:

STDMETHODIMP CMemReaderImpl::ReadFile(BSTR filepath, SAFEARRAY** Ofile)
{
    if (!Ofile)
        return E_POINTER;
    *Ofile = nullptr;

    if (!filepath)
        return E_INVALIDARG;

    HANDLE hFileMap = OpenFileMapping(FILE_MAP_READ, FALSE, filepath);
    if (!hFileMap) {
        DWORD err = GetLastError();
        return HRESULT_FROM_WIN32(err);
    }

    LPBYTE lpBuffer = (LPBYTE) MapViewOfFile(hFileMap, FILE_MAP_READ, 0, 0, BUFF_SIZE);
    if (!lpBuffer) {
        DWORD err = GetLastError();
        CloseHandle(hFileMap);
        return HRESULT_FROM_WIN32(err);
    }

    SAFEARRAYBOUND bounds;
    bounds.lLbound = 0;
    bounds.cElements = BUFF_SIZE;

    SAFEARRAY *sa = SafeArrayCreate(VT_UI1, 1, &bounds);
    if (!sa) {
        UnmapViewOfFile(lpBuffer);
        CloseHandle(hFileMap);
        return E_OUTOFMEMORY;
    }

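    void *data = nullptr;
    HRESULT hr = SafeArrayAccessData(sa, &data);
    if (FAILED(hr)) {
        // locking the array failed: release everything acquired so far
        SafeArrayDestroy(sa);
        UnmapViewOfFile(lpBuffer);
        CloseHandle(hFileMap);
        return hr;
    }
    memcpy(data, lpBuffer, BUFF_SIZE);
    SafeArrayUnaccessData(sa);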

    UnmapViewOfFile(lpBuffer);
    CloseHandle(hFileMap);

    *Ofile = sa;
    return S_OK;
}

I don't know if that is compatible with PHP, though.

If you must use BSTR, try SysAllocStringByteLen() to store the bytes as-is without any conversion to Unicode:

STDMETHODIMP CMemReaderImpl::ReadFile(BSTR filepath, BSTR* Ofile)
{
    if (!Ofile)
        return E_POINTER;
    *Ofile = nullptr;

    if (!filepath)
        return E_INVALIDARG;

    HANDLE hFileMap = OpenFileMapping(FILE_MAP_READ, FALSE, filepath);
    if (!hFileMap) {
        DWORD err = GetLastError();
        return HRESULT_FROM_WIN32(err);
    }

    LPSTR lpBuffer = (LPSTR) MapViewOfFile(hFileMap, FILE_MAP_READ, 0, 0, BUFF_SIZE);
    if (!lpBuffer) {
        DWORD err = GetLastError();
        CloseHandle(hFileMap);
        return HRESULT_FROM_WIN32(err);
    }

    BSTR bstr = SysAllocStringByteLen(lpBuffer, BUFF_SIZE);
    if (!bstr) {
        UnmapViewOfFile(lpBuffer);
        CloseHandle(hFileMap);
        return E_OUTOFMEMORY;
    }

    UnmapViewOfFile(lpBuffer);
    CloseHandle(hFileMap);

    *Ofile = bstr;
    return S_OK;
}

If that does not work for PHP, DO NOT use MultiByteToWideChar(CP_ACP) on binary data, as CP_ACP will corrupt the data! Codepage 28591 (ISO-8859-1) is a better choice to avoid corruption, as bytes encoded in ISO-8859-1 have the same numeric values as the Unicode codepoints they represent:

STDMETHODIMP CMemReaderImpl::ReadFile(BSTR filepath, BSTR* Ofile)
{
    if (!Ofile)
        return E_POINTER;
    *Ofile = nullptr;

    if (!filepath)
        return E_INVALIDARG;

    HANDLE hFileMap = OpenFileMapping(FILE_MAP_READ, FALSE, filepath);
    if (!hFileMap) {
        DWORD err = GetLastError();
        return HRESULT_FROM_WIN32(err);
    }

    LPSTR lpBuffer = (LPSTR) MapViewOfFile(hFileMap, FILE_MAP_READ, 0, 0, BUFF_SIZE);
    if (!lpBuffer) {
        DWORD err = GetLastError();
        CloseHandle(hFileMap);
        return HRESULT_FROM_WIN32(err);
    }

    int wslen = MultiByteToWideChar(28591, 0, lpBuffer, BUFF_SIZE, nullptr, 0);
    if (wslen == 0) {
        DWORD err = GetLastError();
        UnmapViewOfFile(lpBuffer);
        CloseHandle(hFileMap);
        return HRESULT_FROM_WIN32(err);
    }

    BSTR bstr = SysAllocStringLen(nullptr, wslen);
    if (!bstr) {
        UnmapViewOfFile(lpBuffer);
        CloseHandle(hFileMap);
        return E_OUTOFMEMORY;
    }

    MultiByteToWideChar(28591, 0, lpBuffer, BUFF_SIZE, bstr, wslen);

    UnmapViewOfFile(lpBuffer);
    CloseHandle(hFileMap);

    *Ofile = bstr;
    return S_OK;
}
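
To make the difference between the two code pages concrete, here is a small portable sketch that does not call the Win32 API at all. The function names are made up for illustration, and the Windows-1252 table is deliberately partial (three well-known entries only). It assumes CP_ACP resolves to Windows-1252, which is common on Western-locale systems but not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// ISO-8859-1: every byte 0x00..0xFF maps to the code point with the same value.
char16_t decode_latin1(std::uint8_t b) {
    return static_cast<char16_t>(b);
}

// Windows-1252, PARTIAL table for illustration only. Three entries from the
// 0x80..0x9F range are listed; every other byte falls through to the
// identity mapping, which is not accurate for the rest of that range.
char16_t decode_cp1252_partial(std::uint8_t b) {
    switch (b) {
        case 0x80: return 0x20AC; // EURO SIGN
        case 0x85: return 0x2026; // HORIZONTAL ELLIPSIS
        case 0x99: return 0x2122; // TRADE MARK SIGN
        default:   return static_cast<char16_t>(b);
    }
}

// True if truncating the decoded unit to 8 bits recovers the original byte.
bool survives_truncation(char16_t unit, std::uint8_t original) {
    return static_cast<std::uint8_t>(unit) == original;
}

// Walks all 256 byte values and checks the identity property for ISO-8859-1.
void check_latin1_identity() {
    for (int b = 0; b < 256; ++b) {
        auto byte = static_cast<std::uint8_t>(b);
        assert(survives_truncation(decode_latin1(byte), byte));
    }
}
```

With the ISO-8859-1 mapping, the receiver can recover every byte by keeping the low 8 bits of each character. With a Windows-1252 style mapping, a byte such as 0x80 comes back as 0x20AC, so the original value is no longer recoverable that way.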

Otherwise, you can simply promote each 8-bit byte as-is to a 16-bit character manually:

STDMETHODIMP CMemReaderImpl::ReadFile(BSTR filepath, BSTR* Ofile)
{
    if (!Ofile)
        return E_POINTER;
    *Ofile = nullptr;

    if (!filepath)
        return E_INVALIDARG;

    HANDLE hFileMap = OpenFileMapping(FILE_MAP_READ, FALSE, filepath);
    if (!hFileMap) {
        DWORD err = GetLastError();
        return HRESULT_FROM_WIN32(err);
    }

    LPBYTE lpBuffer = (LPBYTE) MapViewOfFile(hFileMap, FILE_MAP_READ, 0, 0, BUFF_SIZE);
    if (!lpBuffer) {
        DWORD err = GetLastError();
        CloseHandle(hFileMap);
        return HRESULT_FROM_WIN32(err);
    }

    BSTR bstr = SysAllocStringLen(nullptr, BUFF_SIZE);
    if (!bstr) {
        UnmapViewOfFile(lpBuffer);
        CloseHandle(hFileMap);
        return E_OUTOFMEMORY;
    }

    for (int i = 0; i < BUFF_SIZE; ++i)
        bstr[i] = (OLECHAR) lpBuffer[i];

    UnmapViewOfFile(lpBuffer);
    CloseHandle(hFileMap);

    *Ofile = bstr;
    return S_OK;
}
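
The byte-for-byte promotion can be tried outside of COM with a portable sketch like the one below. It uses std::u16string as a stand-in for a length-prefixed BSTR, and the helper names are hypothetical. The sample bytes in the usage note start with 'I', 'I', '*' followed by a zero byte, matching what the question reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Widen each byte to one 16-bit unit with the value unchanged,
// mirroring the OLECHAR loop above.
std::u16string widen_bytes(const std::vector<std::uint8_t>& bytes) {
    std::u16string out(bytes.size(), u'\0');
    for (std::size_t i = 0; i < bytes.size(); ++i)
        out[i] = static_cast<char16_t>(bytes[i]);
    return out;
}

// Reverse step a consumer could run to get the original bytes back.
std::vector<std::uint8_t> narrow_units(const std::u16string& units) {
    std::vector<std::uint8_t> out(units.size());
    for (std::size_t i = 0; i < units.size(); ++i) {
        assert(units[i] <= 0xFF); // holds only for data produced by widen_bytes
        out[i] = static_cast<std::uint8_t>(units[i]);
    }
    return out;
}

// Length as seen by code that stops at the first zero unit.
std::size_t terminated_length(const std::u16string& units) {
    return std::char_traits<char16_t>::length(units.c_str());
}
```

For input bytes `{'I', 'I', '*', 0, 8, 0, 0xFF, 0x80}`, `widen_bytes` keeps all 8 units and `narrow_units` restores the input exactly, while `terminated_length` reports 3. Any consumer that treats the string as null-terminated instead of honoring its stored length would show only "II*", even though the rest of the data is present.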

That being said, if none of the above works for PHP, you might need to wrap the returned SAFEARRAY/BSTR inside of a VARIANT, which is how many scripting languages generally handle COM data:

STDMETHODIMP CMemReaderImpl::ReadFile(BSTR filepath, VARIANT* Ofile)
{
    ...
    // VariantInit() and the V_xxx() macros take a VARIANT*, not a VARIANT
    VariantInit(Ofile);
    V_VT(Ofile) = VT_UI1 | VT_ARRAY;
    V_ARRAY(Ofile) = sa;
    ...
}

STDMETHODIMP CMemReaderImpl::ReadFile(BSTR filepath, VARIANT* Ofile)
{
    ...
    VariantInit(Ofile);
    V_VT(Ofile) = VT_BSTR;
    V_BSTR(Ofile) = bstr;
    ...
}
Remy Lebeau
  • Hey, wrapping the SAFEARRAY inside a VARIANT worked. I can finally pass the data to PHP, but as an array of bytes with unsigned representation. I guess now I have to convert the array to a string, but that could take a lot of time. I wanted a fast way to pass files from C++ to PHP. I thought memory-mapped files could be the answer, but it seems like I need to go back to passing data through TCP sockets. Thanks a lot for your help – Caesar May 26 '19 at 04:15