Here is one possible algorithm. It is not necessarily well optimized as presented; it exists to demonstrate one possible implementation and is intentionally left partially abstract.
The following is a robust O(n)-time algorithm you can use to trim whitespace (and other things, if you generalize and extend it).
This implementation has not been verified to work as-is, however.
Track the previous character and the relevant spaces, together with a value representing the current path of execution, so that when you see { ',', ' ' } or { CHAR_IN_ALPHABET, ' ' } you begin a chain of whitespace. The chain breaks at the next non-whitespace character, and what happens to the skipped whitespace depends on which of the two sequences started the chain. We'll be defining a function:
    #include <stdbool.h> // bool
    #include <stdint.h>  // uint64_t

    // const char *const in: indicates intent to read from in only
    void trim_whitespace(const char *const in, char *out, uint64_t const out_length);
We are defining a definite algorithm in which all execution paths are known. For each unique possible state of execution, assign a numeric value, increasing linearly from zero, using an enum defined within the function for readability, and dispatch on it with switch statements (unless goto and labels better model the behavior of the algorithm):
    void trim_whitespace(const char *const in, char *out, uint64_t const out_length) {
        // better to use ifdefs first or avoid altogether with auto const variable,
        // but you get the point here without all that boilerplate
        #define CHAR_NULL 0
        enum {
            DEFAULT = 0,
            WHITESPACE_CHAIN
        } execution_state = DEFAULT;
        // track if loop is executing; makes the logic more readable;
        // can also detect environment instability
        // volatile: don't want this to be optimized out of existence
        volatile bool executing = true;
        while(executing) {
            switch(execution_state) {
                case DEFAULT:
                    ...
                case WHITESPACE_CHAIN:
                    ...
                default:
                    ...
            }
        }
    function_exit:
        return;
        // don't forget to undefine once finished so another function can use
        // the same macro name!
        #undef CHAR_NULL
    }
The number of possible execution states is equal to 2**ceil(log_2(n)), where n is the number of actual execution states relevant to the operation of the current algorithm. You should explicitly name them and make cases for them in the switch statement.
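As a quick check of the formula, the sketch below computes 2**ceil(log_2(n)) as the smallest power of two that is at least n. The function name possible_states is hypothetical, and the reading that a state variable occupying ceil(log_2(n)) bits can hold that many distinct values is an assumption rather than something stated above.

```c
#include <stdint.h>

// Hypothetical helper: smallest power of two >= n, i.e. 2**ceil(log_2(n)).
// Assumes 1 <= n <= 2^63 so the shift cannot overflow.
static uint64_t possible_states(uint64_t n) {
    uint64_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

With the two named states used here (DEFAULT and WHITESPACE_CHAIN), possible_states(2) is 2, so nothing is left over. With three named states, possible_states(3) is 4, and the one unnamed value is what a default case would catch.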
In the DEFAULT case, we're only checking for commas and "legal" characters. If the previous character was a comma or legal character, and the current character is a space, then we want to set the state to WHITESPACE_CHAIN.
In the WHITESPACE_CHAIN case, we test if the current chain can be trimmed based on whether the character we began with was a comma or legal character. If the current character can be trimmed, it is simply skipped and we go to the next iteration until we hit another comma or legal character depending on what we're looking for, then set the execution state to DEFAULT. If we determine this chain to not be trimmable, then we add all the characters we skipped and set the execution state back to DEFAULT.
The loop should look something like this:
    ...
    // black boxing subjectives for portability, maintenance, and readability
    bool is_whitespace(char);
    bool is_comma(char);
    // true if the character is allowed in the current context
    bool is_legal_char(char);
    // copies the skipped whitespace from in to out and updates j;
    // signature inferred from the call in the loop below
    void write_skipped(const char *in, char *out, uint64_t *i, uint64_t *j);
    ...
    volatile bool executing = true;
    // previous character (only updated at loop start, line #LL)
    char previous = CHAR_NULL;
    // current character (only updated at loop start, line #LL)
    char current = CHAR_NULL;
    // writes to out if true at end of current iteration; doesn't write otherwise;
    // starts true so that characters seen in the DEFAULT state are copied
    bool write = true;
    // COMMA: the start was a comma/delimiter
    // CHAR_IN_ALPHABET: the start was a character in the current context's input alphabet
    enum { COMMA=0, CHAR_IN_ALPHABET } comma_or_char = COMMA;
    // i: current read index into in; j: current write index into out
    // (only updated at loop end, line #LL)
    uint64_t i = 0, j = 0;
    while(executing) {
        previous = current;
        current = in[i];
        if (!current) {
            executing = false;
            break;
        }
        switch(execution_state) {
            case DEFAULT:
                if (is_comma(previous) && is_whitespace(current)) {
                    execution_state = WHITESPACE_CHAIN;
                    write = false;
                    comma_or_char = COMMA;
                } else if (is_whitespace(current) && is_legal_char(previous)) { // whitespace check first for short circuiting
                    execution_state = WHITESPACE_CHAIN;
                    write = false;
                    comma_or_char = CHAR_IN_ALPHABET;
                }
                break;
            case WHITESPACE_CHAIN:
                switch(comma_or_char) {
                    case COMMA:
                        if (is_whitespace(current)) {
                            // still inside the chain: keep skipping
                        } else if (is_whitespace(previous) && is_legal_char(current)) {
                            execution_state = DEFAULT;
                            write = true;
                        } else if (is_whitespace(previous) && is_comma(current)) {
                            execution_state = DEFAULT;
                            write = true;
                        } else {
                            // illegal condition: logic error, unstable environment, or SEU
                            executing = true;
                            out = NULL;
                            goto function_exit;
                        }
                        break;
                    case CHAR_IN_ALPHABET:
                        if (is_whitespace(current)) {
                            // still inside the chain: keep skipping
                        } else if (is_whitespace(previous) && is_comma(current)) {
                            execution_state = DEFAULT;
                            write = true;
                        } else if (is_whitespace(previous) && is_legal_char(current)) {
                            // abort: within valid input string/token
                            execution_state = DEFAULT;
                            write = true;
                            // make sure to write all the elements we skipped;
                            // function should update the value of j when finished
                            write_skipped(in, out, &i, &j);
                        } else {
                            // illegal condition: logic error, unstable environment, or SEU
                            executing = true;
                            out = NULL;
                            goto function_exit;
                        }
                        break;
                    default:
                        // impossible condition: unstable environment or SEU
                        executing = true;
                        out = NULL;
                        goto function_exit;
                }
                break;
            default:
                // impossible condition: unstable environment or SEU
                executing = true;
                out = NULL;
                goto function_exit;
        }
        if (write) {
            if (j + 1 >= out_length) {
                // out is too small to hold this character plus the terminator
                executing = true;
                out = NULL;
                goto function_exit;
            }
            out[j] = current;
            ++j;
        }
        ++i;
    }
    if (executing) {
        // memory error: unstable environment or SEU
        out = NULL;
    } else {
        // execution successful: terminate the output string
        if (j < out_length) {
            out[j] = CHAR_NULL;
        }
        goto function_exit;
    }
    // end of function
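For anyone who wants something that compiles on its own, here is a compact sketch of the same two-state idea. It is not the code above verbatim. The name trim_whitespace_sketch, the bool return value, the size_t length, the chain_begin index (used in place of write_skipped), and the three helper definitions (space or tab as whitespace, ',' as the comma, alphanumerics as legal characters) are all assumptions made for illustration. Unlike the outline above, any non-whitespace character ends a chain instead of being treated as an illegal condition.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

// Assumed helper definitions; swap in whatever fits your context.
static bool is_whitespace(char c) { return c == ' ' || c == '\t'; }
static bool is_comma(char c) { return c == ','; }
static bool is_legal_char(char c) { return isalnum((unsigned char)c) != 0; }

// Copies in to out, dropping whitespace that touches a comma and keeping
// whitespace that sits between two legal characters.
// Returns false if out is too small; out is then left unterminated.
bool trim_whitespace_sketch(const char *in, char *out, size_t out_length) {
    enum { DEFAULT = 0, WHITESPACE_CHAIN } state = DEFAULT;
    enum { COMMA = 0, CHAR_IN_ALPHABET } chain_start = COMMA;
    size_t chain_begin = 0; // index in `in` of the first whitespace of the chain
    size_t j = 0;           // write index into out
    char previous = '\0';

    if (out_length == 0) {
        return false;
    }
    for (size_t i = 0; in[i] != '\0'; ++i) {
        char current = in[i];
        if (state == DEFAULT) {
            if (is_whitespace(current) &&
                (is_comma(previous) || is_legal_char(previous))) {
                // begin a chain and remember what started it
                state = WHITESPACE_CHAIN;
                chain_start = is_comma(previous) ? COMMA : CHAR_IN_ALPHABET;
                chain_begin = i;
                previous = current;
                continue;
            }
        } else {
            if (is_whitespace(current)) {
                // still inside the chain: keep skipping
                previous = current;
                continue;
            }
            if (chain_start == CHAR_IN_ALPHABET && is_legal_char(current)) {
                // whitespace inside a token: write back what was skipped
                for (size_t k = chain_begin; k < i; ++k) {
                    if (j + 1 >= out_length) {
                        return false;
                    }
                    out[j++] = in[k];
                }
            }
            state = DEFAULT;
        }
        if (j + 1 >= out_length) {
            return false; // no room for this character plus the terminator
        }
        out[j++] = current;
        previous = current;
    }
    out[j] = '\0';
    return true;
}
```

With these helper definitions, the input "a , b" comes out as "a,b", and "foo bar,  baz" comes out as "foo bar,baz": the space inside "foo bar" is restored because the chain began and ended on legal characters, while the whitespace next to the comma is dropped.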
Please kindly also use the word whitespace to describe these characters as that is what they are commonly known as, not "white chars".