Is scanf's "regex" support a standard? I can't find the answer anywhere.

This code works in gcc but not in Visual Studio:

scanf("%[^\n]",a);

Is it a Visual Studio fault or a gcc extension?

EDIT: It looks like VS works, but I have to account for the difference in line endings between Linux and Windows (\r\n).

bratao
  • Thanks for pointing out a feature I never knew existed; do note though that `scanf(3)` and friends are _library functions_, and not provided by `gcc` but rather by the C library (typically `glibc` on Linux systems, but there are several C libraries available). – sarnold May 14 '11 at 01:16
  • What do you mean `scanf("%[^\n]",a);` doesn't work in Visual Studio? It works for me. Post more code (specifically what's `a`), input, output, and expected output. – Michael Burr May 14 '11 at 01:25
  • Michael, sure, the code compiles, but it does not behave like the Linux version. My code keeps jumping with a \r\n left in the buffer. I will change the filter to \r\n and check later. Thanks! – bratao May 14 '11 at 01:42
  • This is a great quiz question for deep C library knowledge. – doug65536 Feb 02 '13 at 02:06
  • The problem seems to be that you're reading a text file in **binary mode**. In C strings, only `\n` should appear as a line terminator. – Antti Haapala -- Слава Україні Aug 04 '19 at 16:43

2 Answers

That particular format string should work fine in a conforming implementation. The [ character introduces a scanset for matching a non-empty set of characters (with the ^ meaning that the scanset is an inversion of the characters supplied). In other words, the format specifier %[^\n] should match a run of one or more characters up to, but not including, the next newline.

From C99 7.19.6.2, slightly paraphrased:

The [ format specifier matches a nonempty sequence of characters from a set of expected characters (the scanset). If no l length modifier is present, the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically.

If an l length modifier is present, the input shall be a sequence of multibyte characters that begins in the initial shift state. Each multibyte character is converted to a wide character as if by a call to the mbrtowc function, with the conversion state described by an mbstate_t object initialized to zero before the first multibyte character is converted. The corresponding argument shall be a pointer to the initial element of an array of wchar_t large enough to accept the sequence and the terminating null wide character, which will be added automatically.

The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket ]. The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex ^, in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket. If the conversion specifier begins with [] or [^], the right bracket character is in the scanlist and the next following right bracket character is the matching right bracket that ends the specification; otherwise the first following right bracket character is the one that ends the specification. If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.

It's possible, if MSVC isn't working correctly, that this is just one of the many examples where Microsoft either don't conform to the latest standard, or think they know better :-)

paxdiablo
  • I believe the non-conforming part may be the way `a` got its content. If you open a text file in binary mode and try to process the lines, **you** have to translate the line endings yourself. – Bo Persson May 14 '11 at 09:23
  • The scanset support `"%[...]"` was in `scanf()` in 7th Edition UNIX™ in 1979. It is at least a decade older than any C standard. – Jonathan Leffler Feb 17 '13 at 23:56

The "%[" format spec for scanf() is standard and has been since C90.

MSVC does support it.

You can also provide a field width in the format spec to provide safety against buffer overruns:

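#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[9];

    /* At most 8 characters are stored, leaving room for the terminating null. */
    if (scanf("%8[^\n]", buf) != 1)
        return 1;   /* empty line or end of input: buf was not written */

    printf("%s\n", buf);
    printf("strlen(buf) == %u\n", (unsigned)strlen(buf));

    return 0;
}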

Also note that the "%[" format spec doesn't mean that scanf() supports regular expressions. That particular format spec is similar to a capability of regexes (and no doubt was influenced by them), but it's far more limited than regular expressions.

Michael Burr