A recently discovered explanation for the lengthy load times of GTA Online (1) showed that many implementations of sscanf() call strlen() on their input string to set up a context object for an internal routine shared with the other scanning functions (scanf(), fscanf(), ...). This can become a performance bottleneck when the input string is very long. Parsing a 10MB JSON file loaded as a string with repeated calls to sscanf() with an offset and a %n conversion proved to be a dominant cause of the load time.
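To make the pattern concrete, here is a minimal sketch of the kind of loop described above. It is not the actual GTA code: the helper name sum_ints and the whitespace-separated integer input are assumptions made purely for illustration. If sscanf() calls strlen() on every invocation, each iteration rescans the remainder of the string, so the loop as a whole becomes quadratic in the input length.

```c
#include <stdio.h>

/* Hypothetical helper (illustration only): sums the whitespace-separated
   integers in a null-terminated string by calling sscanf() repeatedly
   at an increasing offset. */
long sum_ints(const char *text)
{
    long total = 0;
    size_t pos = 0;
    int value, consumed;

    /* %n stores the number of characters consumed by this call, which is
       used to advance the offset. The loop stops at the end of the string
       (EOF) or at the first token that is not an integer (return 0). */
    while (sscanf(text + pos, "%d%n", &value, &consumed) == 1) {
        total += value;
        pos += (size_t)consumed;
    }
    return total;
}
```

Calling sum_ints("1 2 3") makes three successful sscanf() calls; on an implementation that computes the string length up front, each of those calls may first walk the whole remaining input.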
My question is: should sscanf() even read the input string beyond the bytes necessary for the conversions to complete? For example, does the following code invoke undefined behavior?
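#include <stdio.h>

int test(void) {
    char buf[1] = { '1' };   /* exactly one byte, no null terminator */
    int v;
    sscanf(buf, "%1d", &v);  /* field width 1: only one byte is needed */
    return v;
}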
The function should return 1 and does not need to read more than one byte from buf, but is sscanf() allowed to read from buf beyond the first byte?
(1) references provided by JdeBP:
https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70/
https://news.ycombinator.com/item?id=26297612
https://github.com/biojppm/rapidyaml/issues/40