CamelCase to snake_case in C without tolower

Question

I want to write a function that converts CamelCase to snake_case without using tolower.

Example: helloWorld -> hello_world

This is what I have so far, but the output is wrong because I overwrite a character in the string here: string[i-1] = '_';. I get hell_world. I don't know how to get it to work.

void snake_case(char *string)
{
    int i = strlen(string);
    while (i != 0)
    {
        if (string[i] >= 65 && string[i] <= 90)
        {
            string[i] = string[i] + 32;
            string[i-1] = '_';
        }
        i--;
    }
}

**Assuming the underlying array has enough space**, you need to move letters forwards to make space for the `'_'`. Use `memmove()` because both `memcpy()` and `strcpy()` invoke UB when called with addresses within the same array. **Otherwise** you need to `malloc()` (and/or `realloc()`) — pmg, Sep 10 '21 at 11:49
I would probably do this in two passes: one for calculating the number of upper-case letters == underscores to insert, and then one where the characters are processed/copied to a new buffer allocated using the size calculated in the first step. — 500 - Internal Server Error, Sep 10 '21 at 11:58

score 0 · Answer 1 · answered Sep 10 '21 at 12:33

This conversion means, aside from converting a character from uppercase to lowercase, inserting a character into the string. This is one way to do it:

iterate from left to right,
if an uppercase character is found, use memmove to shift all characters from this position to the end of the string one position to the right, then assign the to-be-inserted value to the current position,
stop when the null-terminator (\0) has been reached, indicating the end of the string.

Iterating from right to left is also possible, but since the choice is arbitrary, going from left to right is more idiomatic.

A basic implementation may look like this:

#include <stdio.h>
#include <string.h>

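void snake_case(char *string)
{
    for ( ; *string != '\0'; ++string)
    {
        /* Uppercase test without isupper; assumes ASCII ('A' == 65, 'Z' == 90). */
        if (*string >= 'A' && *string <= 'Z')
        {
            /* Lowercase the letter without tolower ('a' - 'A' == 32). */
            *string += 'a' - 'A';
            /* Shift the rest of the string, including '\0', one position right. */
            memmove(string + 1U, string, strlen(string) + 1U);
            /* Fill the gap; the loop's ++string then lands on the lowercased letter. */
            *string = '_';
        }
    }
}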

int main(void)
{
    char string[64] = "helloWorldAbcDEFgHIj";
    snake_case(string);
    printf("%s\n", string);
}

Output: hello_world_abc_d_e_fg_h_ij

Note that:

The size of the string to move is the length of the string plus one, to also move the null-terminator (\0).
I am assuming the function isupper is off-limits as well.
The array needs to be large enough to hold the new string, otherwise memmove will perform invalid writes!
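
One way to guard against the invalid writes mentioned in the last note is to pass the array's capacity and refuse to convert when the result would not fit. The following is a minimal sketch, not a definitive implementation: the name snake_case_checked and the capacity parameter are hypothetical, and it assumes an ASCII-compatible character set, like the code above. It fills the string from right to left, so each character is moved only once and memmove is not needed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts in place, but only if the result
 * (including the null terminator) fits into `capacity` bytes.
 * Returns false and leaves the string untouched otherwise. */
bool snake_case_checked(char *string, size_t capacity)
{
    size_t length = strlen(string);
    size_t extra = 0;

    /* First pass: every uppercase letter needs one extra byte for '_'. */
    for (size_t i = 0; i < length; ++i)
        if (string[i] >= 'A' && string[i] <= 'Z')
            ++extra;

    if (length + extra + 1 > capacity)
        return false;

    /* Second pass, right to left: each character is written at or after
     * the position it was read from, so nothing unread is overwritten. */
    size_t dst = length + extra;
    string[dst] = '\0';
    for (size_t src = length; src-- > 0; )
    {
        char c = string[src];
        if (c >= 'A' && c <= 'Z')
        {
            string[--dst] = (char)(c + ('a' - 'A'));
            string[--dst] = '_';
        }
        else
        {
            string[--dst] = c;
        }
    }
    return true;
}
```

With char string[64] = "helloWorld";, the call snake_case_checked(string, sizeof string) returns true and leaves hello_world in the array. With an array of only 11 bytes it returns false, because the result needs 12 bytes including the null-terminator.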

The buffer-size requirement is an issue that needs to be dealt with in a serious implementation. The general problem of "writing a result of unknown length" has several solutions. For this case, they may look like this:

1. First determine how long the resulting string will be, reallocate the array, and only then modify the string. Requires two passes.
2. Every time an uppercase character is found, reallocate the string to its current size + 1. Requires only one pass, but frequent reallocations.
3. Same as 2, but whenever the array is too small, reallocate it to twice its current size. Requires a single pass, and less frequent (but larger) reallocations. Finally, reallocate the array to the length of the string it actually contains.
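
The third option can be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the name snake_case_grow is hypothetical, the starting capacity of 16 is arbitrary, and the sketch writes into a newly allocated buffer instead of reallocating the input array, so the input may be any string.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated snake_case copy of
 * `input`, or NULL if an allocation fails. The caller must free() it. */
char *snake_case_grow(const char *input)
{
    size_t capacity = 16;   /* arbitrary starting size (assumption) */
    size_t length = 0;
    char *result = malloc(capacity);
    if (result == NULL)
        return NULL;

    for ( ; *input != '\0'; ++input)
    {
        /* Worst case per step: '_' plus a letter, plus room for '\0'. */
        if (length + 3 > capacity)
        {
            capacity *= 2;
            char *bigger = realloc(result, capacity);
            if (bigger == NULL)
            {
                free(result);
                return NULL;
            }
            result = bigger;
        }
        if (*input >= 'A' && *input <= 'Z')
        {
            result[length++] = '_';
            result[length++] = (char)(*input + ('a' - 'A'));
        }
        else
        {
            result[length++] = *input;
        }
    }
    result[length] = '\0';

    /* Shrink to the size actually used; keep the larger block if that fails. */
    char *exact = realloc(result, length + 1);
    return exact != NULL ? exact : result;
}
```

The caller owns the returned pointer and releases it with free().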

In this case, I consider option 1 to be the best. Doing two passes is an option if the string length is known and the algorithm can be split into two distinct parts: finding the new length, and modifying the string. I can add it to the answer on request.
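
A minimal sketch of the two-pass approach might look like this. It is an illustration, not a definitive implementation: the name snake_case_alloc is hypothetical, the code assumes ASCII, and it allocates a new buffer for the result (as suggested in the comments on the question) instead of reallocating the original array.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated snake_case copy of
 * `input`, or NULL if the allocation fails. The caller must free() it. */
char *snake_case_alloc(const char *input)
{
    /* Pass 1: one byte per character, one more per uppercase letter
     * for the inserted '_', and one for the null terminator. */
    size_t size = 1;
    for (const char *p = input; *p != '\0'; ++p)
        size += (*p >= 'A' && *p <= 'Z') ? 2 : 1;

    char *result = malloc(size);
    if (result == NULL)
        return NULL;

    /* Pass 2: copy, inserting '_' and lowercasing uppercase letters. */
    char *out = result;
    for (const char *p = input; *p != '\0'; ++p)
    {
        if (*p >= 'A' && *p <= 'Z')
        {
            *out++ = '_';
            *out++ = (char)(*p + ('a' - 'A'));
        }
        else
        {
            *out++ = *p;
        }
    }
    *out = '\0';
    return result;
}
```

Because the exact size is known before anything is written, a single allocation is enough and no write can go out of bounds.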
