
I'm trying to build my own text user interface for a server (to manage FTP, SSH connections, a task manager, etc.). My problem is with the task manager.

To save my tasks, I've decided to write them all to a file. I want each line (one per task) to look like this:

Year Month Day Week-Day Hour Min Second ; Command

To keep things simple, I follow the same convention as cron, where * in a field means any value for that field:

* * * * 00 00 00 ; reboot // runs reboot every day at midnight

To do so, I've decided to use POSIX regex. I want the fields to have this format:

YEAR [0-9]{1,9}
MONTH [0-9]{2}
DAY [0-9]{2}
WEEK-DAY [A-Za-z]{3}
HOUR [0-9]{2}
MINUTE [0-9]{2}
SECOND [0-9]{2}

COMMAND can be any sequence of printable characters

This is where I run into a problem. I came up with this regex:

char *regexString = "^(\\*|([[:digit:]]){1,9})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:alpha:]]){3})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]];[[:blank:]]([[:print:]])*";

It seemed to work, but when I tried some example code I found in order to see how to extract each component, I got this:

Output :
Match 0, Group 0: [ 0-25]: * * * * 00 00 00 ; reboot
Match 0, Group 1: [ 0- 1]: *

Can you help me understand? Thanks (:

PS: Here are some examples:

* * * * * * * ; command //Match
0 00 00 Mon 00 00 00 ; command //Match
123456789 00 00 Mon 00 00 00 ; command //Match

01234556789 00 00 Mon 00 00 00 ; command //Don't Match
0 00 00 0 00 00 00 ; command //Don't Match
0 0 0 Mon 0 0 0 ; command //Don't Match

EDIT : Here is the code I use

#include <stdio.h>
#include <string.h>
#include <regex.h>

int main ()
{
    char * source = "* * * * 00 00 00 ; reboot";
    char *regexString = "^(\\*|([[:digit:]]){1,9})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:alpha:]]){3})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]];[[:blank:]]([[:print:]])*";
    size_t maxMatches = 3; // I've tried several values (2, 3, ...): same output
    size_t maxGroups = 3; // I've tried several values (2, 3, ...): same output

    regex_t regexCompiled;
    regmatch_t groupArray[maxGroups];
    unsigned int m;
    char * cursor;

    if (regcomp(&regexCompiled, regexString, REG_EXTENDED))
    {
        printf("Could not compile regular expression.\n");
        return 1;
    };

    m = 0;
    cursor = source;
    for (m = 0; m < maxMatches; m ++)
    {
        if (regexec(&regexCompiled, cursor, maxGroups, groupArray, 0))
            break;  // No more matches

        unsigned int g = 0;
        unsigned int offset = 0;
        for (g = 0; g < maxGroups; g++)
        {
            if (groupArray[g].rm_so == -1)
                break;  // No more groups

            if (g == 0)
                offset = groupArray[g].rm_eo;

            char cursorCopy[strlen(cursor) + 1];
            strcpy(cursorCopy, cursor);
            cursorCopy[groupArray[g].rm_eo] = 0;
            printf("Match %u, Group %u: [%2d-%2d]: %s\n",
                   m, g, (int)groupArray[g].rm_so, (int)groupArray[g].rm_eo,
                   cursorCopy + groupArray[g].rm_so);
        }
        cursor += offset;
    }

    regfree(&regexCompiled);

    return 0;
}

Examples of the output I am trying to get:

//Case of a match :
Output :
Match 0, Group 0: [ 0-25]: * * * * 00 00 00 ; reboot
Match 0, Group 1: [ 0- 1]: * // YEAR
Match 0, Group 2: [ 2- 3]: * // MONTH
Match 0, Group 3: [ 4- 5]: * // DAY
Match 0, Group 4: [ 6- 7]: * // WEEK-DAY
Match 0, Group 5: [ 8- 10]: 00 //HOUR
Match 0, Group 6: [ 11- 13]: 00 //MINUTE
Match 0, Group 7: [ 14- 16]: 00 // SECOND
Match 0, Group 8: [ 19- 25]: reboot //COMMAND
$> echo $?
0

//Case of a match :
Output :
Match 0, Group 0: [ 0-38]: 123456789 00 00 Mon 00 00 00 ; Command
Match 0, Group 1: [ 0- 9]: 123456789 //YEAR
Match 0, Group 2: [ 10- 12]: 00 //MONTH
Match 0, Group 3: [ 13- 15]: 00 //DAY 
Match 0, Group 4: [ 16- 19]: Mon //WEEK-DAY
Match 0, Group 5: [ 20- 22]: 00 //HOUR
Match 0, Group 6: [ 23- 25]: 00 //MINUTE
Match 0, Group 7: [ 26- 28]: 00 //SECOND
Match 0, Group 8: [ 31- 38]: Command //COMMAND
$> echo $?
0

//Case of Not Match
$> echo $?
0
Dzious
  • what regex library are you using? – virolino Oct 17 '19 at 10:41
  • What is it exactly you don't understand? The output seems reasonable. The first group is the whole match, followed by each parenthesized group. Note that the code example sets `maxMatches = 2` which you should change to see all matches. – nwellnhof Oct 17 '19 at 10:42
  • @virolino `regex.h`, the basic C regex library – Dzious Oct 17 '19 at 10:49
  • You are abusing capturing groups, remove the repeated ones. Add `$` at the end. `^(\*|[[:digit:]]{1,9})[[:blank:]](\*|[[:digit:]]{2})[[:blank:]](\*|[[:digit:]]{2})[[:blank:]](\*|[[:alpha:]]){3}[[:blank:]](\*|[[:digit:]]{2})[[:blank:]](\*|[[:digit:]]{2})[[:blank:]](\*|[[:digit:]]{2})[[:blank:]];[[:blank:]][[:print:]]*$`. See https://regex101.com/r/77fXWl/1. Now, 1) post the code you are using, 2) provide exact output for each sample input. It is not clear what you are doing because `get each component` and `Match/Don't match` imply different types of output. – Wiktor Stribiżew Oct 17 '19 at 10:51
  • You need to show the code compiling the regex and the code using it. Please create a minimal reproducible example. – Jonathan Leffler Oct 17 '19 at 10:51
  • @nwellnhof Sorry, I forgot to mention it: I've changed `maxMatches`, and whatever the value is, I get the same output. What I don't understand is how to pick out each of my parenthesized groups. – Dzious Oct 17 '19 at 10:53
  • @WiktorStribiżew I've added the code I used, along with some of the output I'm trying to get. By `get each component` I mean extracting Year, Month, and so on. When I say Match, I mean with respect to the regex: the string should be accepted by it. I can't put the `$` at the end, because I'll probably have a `\n` at the end of my string (before the `\0`). – Dzious Oct 17 '19 at 11:34
  • Is adding the code I used enough, @JonathanLeffler? – Dzious Oct 17 '19 at 11:37
  • That's good. Details like `REG_EXTENDED` are important and were not stated in the question previously. – Jonathan Leffler Oct 17 '19 at 18:24

1 Answer


Be careful when setting the maxGroups variable. Its value should be the number of capturing groups in the pattern plus 1 (for the whole match, which is the first item).
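As a rough sketch of how that count could be checked at run time instead of by hand: after a successful regcomp(), POSIX stores the number of parenthesized subexpressions in the re_nsub member of regex_t. The helper name groups_needed below is made up for illustration. The pattern in the question has 15 groups (two per time field plus one for the command), so with maxGroups = 3 only groups 0 to 2 are reported. For a field that is `*`, group 2 (the inner digit group) takes no part in the match, its rm_so is -1, and the printing loop stops there, which appears to be why only two lines are printed.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns the regmatch_t array size needed for a
 * pattern (capturing groups + 1 for the whole match), or 0 if the
 * pattern fails to compile. */
size_t groups_needed(const char *pattern)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    /* regcomp() fills in re_nsub with the number of capturing groups. */
    size_t n = re.re_nsub + 1;

    regfree(&re);
    return n;
}
```

For the first field of the question's pattern, `groups_needed("^(\\*|([[:digit:]]){1,9})")` gives 3, while dropping the inner parentheses, `groups_needed("^(\\*|[[:digit:]]{1,9})")`, gives 2.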

You should get rid of all redundant capturing groups and use

char *regexString = "^(\\*|[[:digit:]]{1,9})[[:blank:]](\\*|[[:digit:]]{2})[[:blank:]](\\*|[[:digit:]]{2})[[:blank:]](\\*|[[:alpha:]]{3})[[:blank:]](\\*|[[:digit:]]{2})[[:blank:]](\\*|[[:digit:]]{2})[[:blank:]](\\*|[[:digit:]]{2})[[:blank:]];[[:blank:]]([[:print:]]*)";

The regex (see its demo) now has 8 capturing groups, so set maxGroups value to 9:

 size_t maxGroups = 9; // 8 groups + 1 for whole match

And your code should work, see the online demo.
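If the goal is only to pull the eight fields out of a single line, the match loop is not needed. Here is a minimal sketch using the pattern above; parse_task, FIELD_COUNT, FIELD_MAX and TASK_PATTERN are names invented for this example, and the 256-byte field limit is an arbitrary assumption.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define FIELD_COUNT 8      /* 7 time fields + the command */
#define FIELD_MAX   256    /* assumed upper bound on one field's length */

static const char *TASK_PATTERN =
    "^(\\*|[[:digit:]]{1,9})[[:blank:]]"   /* year     */
    "(\\*|[[:digit:]]{2})[[:blank:]]"      /* month    */
    "(\\*|[[:digit:]]{2})[[:blank:]]"      /* day      */
    "(\\*|[[:alpha:]]{3})[[:blank:]]"      /* week-day */
    "(\\*|[[:digit:]]{2})[[:blank:]]"      /* hour     */
    "(\\*|[[:digit:]]{2})[[:blank:]]"      /* minute   */
    "(\\*|[[:digit:]]{2})[[:blank:]]"      /* second   */
    ";[[:blank:]]([[:print:]]*)";          /* command  */

/* Hypothetical helper: copies the 8 fields of one task line into fields[].
 * Returns 0 on success, -1 if the line does not match, the pattern does
 * not compile, or a field is too long. */
int parse_task(const char *line, char fields[FIELD_COUNT][FIELD_MAX])
{
    regex_t re;
    regmatch_t m[FIELD_COUNT + 1];   /* slot 0 holds the whole match */
    int rc = -1;

    if (regcomp(&re, TASK_PATTERN, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, FIELD_COUNT + 1, m, 0) == 0) {
        rc = 0;
        for (int g = 1; g <= FIELD_COUNT; g++) {
            if (m[g].rm_so < 0) {            /* group took no part in the match */
                rc = -1;
                break;
            }
            size_t len = (size_t)(m[g].rm_eo - m[g].rm_so);
            if (len >= FIELD_MAX) {          /* would overflow the buffer */
                rc = -1;
                break;
            }
            memcpy(fields[g - 1], line + m[g].rm_so, len);
            fields[g - 1][len] = '\0';
        }
    }

    regfree(&re);
    return rc;
}
```

After a successful call on `"* * * * 00 00 00 ; reboot"`, `fields[0]` to `fields[3]` hold `*`, `fields[4]` to `fields[6]` hold `00`, and `fields[7]` holds `reboot`. In real code the regex would probably be compiled once and reused rather than recompiled per line.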

It may also be useful to increase maxMatches to a value close to, or slightly above, the number of matches you expect.
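On the `$` anchor discussed in the comments: if the line may end with a `\n`, one option is to cut the string at the newline before matching, so the pattern can still be anchored at the end and a command containing non-printable characters is rejected. This is only a sketch; is_valid_task and the 512-byte buffer are assumptions made for the example. It uses REG_NOSUB because only a yes/no answer is needed here.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if line (with an optional trailing
 * newline) is a well-formed task line, 0 otherwise. */
int is_valid_task(const char *line)
{
    static const char *pattern =
        "^(\\*|[[:digit:]]{1,9})[[:blank:]]"
        "(\\*|[[:digit:]]{2})[[:blank:]]"
        "(\\*|[[:digit:]]{2})[[:blank:]]"
        "(\\*|[[:alpha:]]{3})[[:blank:]]"
        "(\\*|[[:digit:]]{2})[[:blank:]]"
        "(\\*|[[:digit:]]{2})[[:blank:]]"
        "(\\*|[[:digit:]]{2})[[:blank:]]"
        ";[[:blank:]][[:print:]]*$";
    char buf[512];                       /* assumed maximum line length */
    regex_t re;

    /* Copy everything up to (not including) the first newline. */
    size_t len = strcspn(line, "\n");
    if (len >= sizeof buf)
        return 0;
    memcpy(buf, line, len);
    buf[len] = '\0';

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;

    int ok = (regexec(&re, buf, 0, NULL, 0) == 0);

    regfree(&re);
    return ok;
}
```

With this, the three "Match" examples from the question are accepted with or without a trailing newline, and the three "Don't Match" examples are rejected.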

Wiktor Stribiżew
  • Thanks! In fact, I'll pass my function only one line at a time, so I won't have multiple matches. But I used (and kept) this code in order to understand. Once again, thanks for your help and explanations :) – Dzious Oct 17 '19 at 11:51