Extracting pathname components from UNICODE_STRING within a WDK Driver using win32 and C

Question

I am trying to separate the components of a UNICODE_STRING path name in order to create the directory tree from the device root to the file leaf. This needs to be done in a WDK driver.

I need to build up the directory structure a piece at a time using ZwCreateFile() since it can only create the final directory or leaf rather than the entire path in a single call.

Apologies for such a simple question for you C engineers but I am having issues getting my head around it and utilising it in a driver.

My current approach is to convert a UNICODE_STRING to char and use the strtok_s() function to break the path name into its component directories and file.

I am looking to use

char string1[] =
    "\\Device\\HarddiskVolume";

char seps[] = "\\";
char *token1 = NULL;

char *next_token1 = NULL;

token1 = strtok_s(string1, seps, &next_token1);

But I need to convert a UNICODE_STRING to char string.

You mean the input isn't actually `string1` but some unicode string? — kabanus, Apr 10 '18 at 13:35
Apologies just updated to make sense that i am trying convert a UNICODE_STRING to char — user1403598, Apr 10 '18 at 13:42
You mean `char*`. But looking at https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ntifs/nf-ntifs-ntcreatefile, I see no `char*` args. Can't you use a higher level func that deals with `wchar_t*`? — CristiFati, Apr 10 '18 at 14:06
No i cant as its a kernel driver for filesys mini filter. I am finding this a little out of my depth when i am not use to C and even C++. Although i am happy learn as i go along even though its taking me a long time. — user1403598, Apr 10 '18 at 14:22
Why not just use [wcstok_s](https://msdn.microsoft.com/en-us/library/ftsafwz3.aspx) instead of strtok_s so you don't have to do the conversion at all. Just stay in the Unicode space. — selbie, Apr 12 '18 at 07:25
Tried to use wcstok_s and i get unresolved external symbol, i go to the definition and can see it in string.h. wchar_t *wStr = path.Buffer; DbgPrint("Test Wide %ws", wStr); const wchar_t seps[] = L"\\"; wchar_t *token1 = NULL; wchar_t *next_token1 = NULL; token1 = wcstok_s(wStr, seps, &next_token1); — user1403598, Apr 12 '18 at 10:53
Is converting from Windows UTF-16 to char really appropriate since that means that most language texts of the world other than English and a few European will not be supported? And the Windows API is pretty much all UTF-16 except for a few odd spots here and there. — Richard Chambers, Apr 12 '18 at 15:06
What is the actual problem you are trying to solve with this work around? — Richard Chambers, Apr 12 '18 at 15:10
The actual problem that i am trying to solve is to use ZwCreateFile to create directories including all sub directories; rather than just the leaf directory. This is only for Windows platform. — user1403598, Apr 12 '18 at 15:12
When writing code for Windows, don't use char, use wchar_t. That goes double in a device driver. — Ben, Apr 16 '18 at 13:36

Richard Chambers · Accepted Answer · 2018-04-16T13:16:27.427

Here is an example that you can start with. The function PathBuf() walks through a string copying the parts of a pathname into a destination buffer. The function does this by being called multiple times until it reaches the end of the string.

You will need to check that this satisfies your needs and to do any additional tweaks you may need to get what you want.

I also used wchar_t in order to do my testing. You will probably need to change to UNICODE_STRING or something similar.

Notice that there are a few edge cases such as two path separators without any intervening text. Spaces should be just another character in the pathname piece.

In Windows pathnames there is a network device type of syntax such as "\\device\file", so you may need to add something to detect whether the first piece is a device introduced with two slashes.

I also made this so that it will handle either Windows pathname separators (backslash) or Linux pathname separators (forward slash), which seems to be a fairly standard approach.

#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>

// Copies the next path component from pSrc into pDest and returns the place
// in the source string where the next call should continue.
// The caller must make sure pDest can hold the longest component plus terminator.
const wchar_t *PathBuf(wchar_t *pDest, const wchar_t *pSrc)
{
    // if not NULL source string and not the end of the source string, process it.
    if (pSrc && *pSrc) {
        short iState = 0;  // start state off as no characters found.
        do {
            // determine whether this is a path separator or a path
            // component text character. set the current state based
            // on the current character in the source text string.
            switch (*pSrc) {
                case L'\\':    // backslash path separator found
                case L'/':     // forward slash path separator found
                    iState = (iState == 0) ? 1 : 2;  // first instance or not?
                    break;
                default:
                    *pDest++ = *pSrc;  // copy the character from source to destination buffer
                    iState = 1;  // indicate at least one character found
                    break;
            }
            // continue the loop until either ending path separator found
            // or we have reached end of the source string.
            // we will continue on the next call after the path separator.
        } while (*pSrc && *pSrc++ && iState < 2);
    }
    *pDest = 0;   // end of string terminator for destination buffer

    return pSrc;  // return our current place in the source string
}

int testfunc(void)
{
    const wchar_t *list[] = {
        L"\\state",
        L"state2",
        L"\\\\state3\\",
        L"\\statex\\state4",
        L"xx"
    };
    size_t i;

    for (i = 0; i < sizeof(list) / sizeof(list[0]); i++) {
        const wchar_t *p1;   // pointer to source string which is updated
        wchar_t buff[128];   // destination buffer for each component
        p1 = list[i];        // start the source string with the next test item
        printf("Doing %S\n", p1);   // print out the entire test string
        while (*p1) {
            p1 = PathBuf(buff, p1);    // copy first path component into buff, update source string pointer
            printf ("  \"%S\"", buff);  // print out the path component found within double quotes
            // at this point you could use ZwCreateFile() to create the path component.
            // a sanity check on the text such as empty string may be in order.
        }
        printf("\n");
    }
    return 0;
}

This source will output the following:

Doing \state
  "state"
Doing state2
  "state2"
Doing \\state3\
  ""  "state3"
Doing \statex\state4
  "statex"  "state4"
Doing xx
  "xx"

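Since ZwCreateFile() needs the path from the root down to each directory rather than the bare component, a variant that reports cumulative prefixes may be closer to what the driver needs. The following is a minimal sketch under stated assumptions and has not been tested in a driver: COUNTED_WSTR is a hypothetical stand-in for UNICODE_STRING (Length in bytes, Buffer not necessarily zero terminated) so that it compiles outside the WDK, and ForEachPathPrefix(), PREFIX_FN, PREFIX_LOG and LogPrefix() are illustrative names. It uses no C runtime string functions, which sidesteps the unresolved wcstok_s symbol mentioned in the comments on the question.

```c
#include <stddef.h>

/* Hypothetical stand-in for UNICODE_STRING so the sketch builds outside the
   WDK. Length counts bytes and Buffer need not be zero terminated. */
typedef struct {
    unsigned short Length;         /* bytes in use */
    unsigned short MaximumLength;  /* bytes allocated */
    wchar_t       *Buffer;
} COUNTED_WSTR;

/* Called once per prefix; isLeaf is nonzero when the prefix is the full path.
   Return 0 to stop the walk early. */
typedef int (*PREFIX_FN)(const COUNTED_WSTR *prefix, int isLeaf, void *ctx);

static int IsSep(wchar_t c) { return c == L'\\' || c == L'/'; }

/* Reports every prefix of path that ends at a component boundary located
   after the first skipChars characters (the part assumed to exist already,
   such as the device root). Empty components are skipped.
   Returns the number of callbacks made. */
size_t ForEachPathPrefix(const COUNTED_WSTR *path, size_t skipChars,
                         PREFIX_FN fn, void *ctx)
{
    size_t count = path->Length / sizeof(wchar_t);
    size_t calls = 0;
    size_t i;

    for (i = skipChars; i <= count; i++) {
        int atEnd = (i == count);
        COUNTED_WSTR prefix;

        if (!atEnd && !IsSep(path->Buffer[i]))
            continue;                       /* still inside a component */
        if (i == skipChars || IsSep(path->Buffer[i - 1]))
            continue;                       /* empty component */

        prefix.Length = (unsigned short)(i * sizeof(wchar_t));
        prefix.MaximumLength = prefix.Length;
        prefix.Buffer = path->Buffer;       /* shares the caller's buffer */
        calls++;
        /* In a driver, fn is where ZwCreateFile() would be called, e.g. with
           FILE_OPEN_IF plus FILE_DIRECTORY_FILE for non-leaf prefixes
           (an assumption, not tested here). */
        if (!fn(&prefix, atEnd, ctx))
            break;
    }
    return calls;
}

/* Example callback for illustration: records prefix lengths in characters. */
typedef struct {
    size_t lens[16];
    int    leaf[16];
    size_t n;
} PREFIX_LOG;

static int LogPrefix(const COUNTED_WSTR *prefix, int isLeaf, void *ctx)
{
    PREFIX_LOG *log = (PREFIX_LOG *)ctx;
    if (log->n >= 16)
        return 0;                           /* log full: stop the walk */
    log->lens[log->n] = prefix->Length / sizeof(wchar_t);
    log->leaf[log->n] = isLeaf;
    log->n++;
    return 1;
}
```

For \Device\HarddiskVolume1\a\b\c.txt with skipChars set to 23 (the length of \Device\HarddiskVolume1), the callback sees the prefixes ending in \a, \a\b and \a\b\c.txt, and only the last one is flagged as the leaf. Because each prefix is a slice of the original buffer, a doubled separator in the middle of the path still appears in later prefixes and may need normalizing first.
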
See also

Directory relative ZwCreateFile

The Definitive Guide on Win32 to NT Path Conversion

Nt vs. Zw - Clearing Confusion On The Native API

Just tested and found that the printf (" \"%S\"", buff), enumerates the splitting of the path fine, but i get one more entry on the loop which comes back with ?????????. For example on L"\\statex\\state4" I get statex then state4 then ???????? - I am assuming that the buffer is empty. — user1403598, Apr 16 '18 at 12:33
@user1403598 not sure where that last entry is coming from. Is the source string containing the path zero terminated? — Richard Chambers, Apr 16 '18 at 12:46
@user1403598 see the change I made in the `while` condition to check for a zero string terminator first then increment if non-zero. Reviewing the code it looks `pSrc` would be incremented past end of string under some conditions with the previous version. Sorry about that. — Richard Chambers, Apr 16 '18 at 13:18
yes i do see and it does make sense, this is what i was trying to do but i just find the syntax difficult to get my head around, coming from C# background. — user1403598, Apr 16 '18 at 13:20
@user1403598 understandable concerning the difficulty. I am having the same sort of trouble with C++/CX with UWP as well as C++/CLI with .NET so I sympathize. I believe that change to the `while` condition will correct the problem you are seeing. add a comment if you run into anything else. It could also be that the last path component added will have two end of string terminators. That won't matter so far as string processing is concerned but leave room in your destination buffer for that eventuality. — Richard Chambers, Apr 16 '18 at 13:56

CristiFati · Answer 2 · 2019-03-22T23:09:08.153

Here's a piece of code.

code.c:

#include <Windows.h>
#include <SubAuth.h>
#include <stdlib.h>   // calloc, free, wcstombs_s
#include <string.h>   // wcsncpy_s


char* unicodeStringToPChar(UNICODE_STRING *pUS) {
    size_t wcLen = 0, cLen = 0;
    wchar_t *pWBuf = NULL;
    char *pBuf = NULL;
    errno_t res = 0;
    if (!pUS || !pUS->Length) {
        return NULL;
    }
    wcLen = pUS->Length / sizeof(wchar_t) + 1;
    pWBuf = calloc(1, wcLen * sizeof(wchar_t));
    if (!pWBuf) {
        return NULL;
    }
    if (wcsncpy_s(pWBuf, wcLen, pUS->Buffer, wcLen - 1)) {
        free(pWBuf);
        return NULL;
    }
    wcstombs_s(&cLen, NULL, 0, pWBuf, 0);
    if (!cLen) {
        free(pWBuf);
        return NULL;
    }
    pBuf = calloc(1, cLen);
    if (!pBuf) {
        free(pWBuf);
        return NULL;
    }
    res = wcstombs_s(NULL, pBuf, cLen, pWBuf, cLen - 1);
    free(pWBuf);
    if (res) {
        free(pBuf);
        return NULL;
    }
    return pBuf;
}

Notes:

Function receives a pointer to a UNICODE_STRING (don't forget to pass its address with & if you have a plain structure)
Returns a char*, NULL if it can't convert the string (whether it's empty, or some error occurred)
- You can add some output messages (e.g. printf) before return NULL; statements
- Don't forget to free the returned value once you're done with it to avoid memory leaks (in the caller)
There are 2 steps:
- "Save" the UNICODE_STRING contents in a wchar_t*:
  - Allocate the wchar_t* (calloc)
  - Copy the contents (wcsncpy_s)
  - This step might not be necessary (one could operate on the UNICODE_STRING.Buffer directly - and thus consider this step an overkill), but I wanted to be rigorous and only allocate the required number of bytes for the return value (check next item)
- Convert the wchar_t* to a char*:
  - Again, allocate the char* (calloc)
  - Perform the conversion (wcstombs_s)
    - The 1st call to wcstombs_s is used to determine how much space is needed for the char* buffer. I used the (intermediary) wchar_t* because according to:
      - [MS.Docs]: UNICODE_STRING structure ((some) emphasis is mine):

        Buffer
        Pointer to a wide-character string. Note that the strings returned by the various LSA functions might not be null-terminated
      - [MS.Docs]: wcstombs_s, _wcstombs_s_l:

        If wcstombs_s successfully converts the source string, it puts the size in bytes of the converted string, including the null terminator, into *pReturnValue (provided pReturnValue is not NULL). This occurs even if the mbstr argument is NULL and provides a way to determine the required buffer size. Note that if mbstr is NULL, count is ignored.
      there are cases when that's not possible from UNICODE_STRING.Buffer (when it doesn't end with a NULL char, and contains special (wide) chars (that take 2 bytes))
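
The same two-pass idea (a sizing call, then the converting call) can be sketched with only standard C. This is a hedged illustration rather than the function above: it swaps wcstombs_s for the standard wcsrtombs, the name CountedWideToMultiByte is hypothetical, and the result depends on the current C locale (in the default "C" locale only ASCII is guaranteed to convert). Like the code above, it is meant for user mode only.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts the first count wide characters of src
   (which need not be zero terminated) into a newly allocated multibyte
   string. Returns NULL on empty input or failure; the caller frees it. */
char *CountedWideToMultiByte(const wchar_t *src, size_t count)
{
    wchar_t *tmp;
    const wchar_t *p;
    mbstate_t st;
    size_t need, got;
    char *out;

    if (!src || count == 0)
        return NULL;

    /* Step 1: make a zero terminated copy of the counted buffer. */
    tmp = calloc(count + 1, sizeof *tmp);
    if (!tmp)
        return NULL;
    memcpy(tmp, src, count * sizeof *tmp);

    /* Step 2a: sizing pass. With a NULL destination, wcsrtombs returns the
       number of bytes needed, not counting the terminator. */
    memset(&st, 0, sizeof st);
    p = tmp;
    need = wcsrtombs(NULL, &p, 0, &st);
    if (need == (size_t)-1) {
        free(tmp);
        return NULL;
    }

    /* Step 2b: converting pass into a buffer of exactly the required size. */
    out = malloc(need + 1);
    if (!out) {
        free(tmp);
        return NULL;
    }
    memset(&st, 0, sizeof st);
    p = tmp;
    got = wcsrtombs(out, &p, need + 1, &st);
    free(tmp);
    if (got == (size_t)-1) {
        free(out);
        return NULL;
    }
    out[got] = '\0';
    return out;
}
```

With a UNICODE_STRING, count would be Length / sizeof(wchar_t), since Length is expressed in bytes.
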

I didn't test the function thoroughly, let me know how it works out for you.

After more clarification, I understood that none of the above is usable in a driver. However, on [CodeGuru]: ZwCreatefile/ ZwReadfile there's an example of using ZwCreateFile and UNICODE_STRING (via RtlInitUnicodeString and InitializeObjectAttributes), which I'm pasting below (didn't test anything):

#include <ntddk.h>

HANDLE handle;
NTSTATUS ntstatus;
UNICODE_STRING uniName;
OBJECT_ATTRIBUTES  objAttr;
IO_STATUS_BLOCK ioStatusBlock;  // receives the final completion status of the request
RtlInitUnicodeString(&uniName, L"\\SystemRoot\\native.txt");
InitializeObjectAttributes(&objAttr, &uniName,
                           OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE,
                           NULL, NULL);
ntstatus = ZwCreateFile(&handle,
                        GENERIC_READ,
                        &objAttr, &ioStatusBlock,
                        NULL,
                        FILE_ATTRIBUTE_NORMAL,
                        0,
                        FILE_OPEN, 
                        FILE_SYNCHRONOUS_IO_NONALERT,
                        NULL, 0);

Unfortunately i cannot use this is in a Kernel driver as the headers are not available for use #include #include — user1403598, Apr 12 '18 at 14:27
I don't understand. `UNICODE_STRING` is defined in *SubAuth.h* (*line 32* for *VStudio 2015 Community*). How is that not available? Then where do you get it from? What are you using to build your code? (didn't write a kernel driver - I assume it's a *.dll* file - extension might be also *.sys*, *.drv*). — CristiFati, Apr 12 '18 at 14:34
The UNICODE_STRING is defined in ntdef.h line 1528. (Using Visual Studio 2017 Profession) - Apologies i should have defined its a kernel driver. I think you can only access ZwCreateFile via a Kernel driver as well. All that i really trying to do is build up a list or each of the directories to call ZwCreateFile which requires a UNICODE_STRING. So there might well be an easier way rather converting to char * — user1403598, Apr 12 '18 at 14:40
Hmm, you specified it's a driver, my lack of *XP* in the area made me believe that it would be the same as for user space apps. But I asked in one of the comments whether the `wchar_t*` usage wouldn't be better, to avoid conversion — CristiFati, Apr 12 '18 at 14:49
Yeah i tried utilising wchar_t* and using wcstok_s to manipulate the path. This is also not available as i get Link error 1120 even though the header is defined in string.h along with strtok_s. Frustrating, i am a backend developer with good frontend experience but this is blowing my mind. I cant find anything to manipulate UNICODE_STRING and hence why i went down this route. — user1403598, Apr 12 '18 at 14:53
What about the last part? Most likely you'll have to modify some flags. — CristiFati, Apr 12 '18 at 18:50
