As Heto points out in the comments, the main bottleneck here is probably going to be reading the file from disk, not whichever scanf function variant you decide to use.
If you really want to speed up your application, you should try to build a pipeline. As you're describing the application now, you'd basically be working in 2 phases: reading the file into a buffer, and parsing words from the buffer.
Here's what the activity might look like if you decide to read the whole file into a string, and then use sscanf on the string:
reading: ████████████████
parsing:                 ████████████████
You get something a little different if you use fscanf directly on the file, since you're constantly switching between reading and parsing:
reading: █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █
parsing:  █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █
In both cases, you end up taking about the same amount of time.
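For reference, here is a minimal sketch of the second (fscanf) variant, assuming you just want to count the words. The function name count_words_fscanf and the 63-character word limit are my own choices for illustration, and the A-Z range syntax inside a scanset is technically implementation-defined, although common C libraries such as glibc support it.

```c
#include <stdio.h>

/* Hypothetical helper: counts runs of letters in fp using fscanf.
   A run longer than 63 letters is counted as more than one word. */
size_t count_words_fscanf(FILE *fp)
{
    char word[64];
    size_t words = 0;

    for (;;) {
        int r = fscanf(fp, "%63[A-Za-z]", word);
        if (r == 1) {            /* matched a run of letters */
            words++;
            continue;
        }
        if (r == EOF)            /* nothing left to read */
            break;
        /* r == 0: the next character is not a letter, so skip
           the whole run of non-letters and try again. */
        if (fscanf(fp, "%*[^A-Za-z]") == EOF)
            break;
    }
    return words;
}
```

The sscanf variant would look much the same, except that you'd first read the whole file into a buffer and then have to track your position in it yourself (for example with %n).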
However, if you can do your file i/o asynchronously, then you can overlap the time waiting for data from the disk with the time used to compute. Ideally, you'd end up with something like this:
reading: ████████████████
parsing:  ████████████████
My diagrams might not be that accurate (we already pointed out that parsing should take much less time than the i/o, so the two bars really shouldn't be the same length), but you should get the general idea. If you can set up a pipeline where data is read in asynchronously from the processing, then you can get a speedup by overlapping the communication (reads from disk) and computation (parsing). At best, the total time approaches that of the slower of the two stages.
You could achieve an asynchronous pipeline like this using POSIX asynchronous I/O (aio), or just doing a simple producer/consumer setup with two threads (where one reads from the file and the other does the parsing).
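Here's a rough sketch of what the two-thread version might look like with pthreads, assuming the job is just to count words. Everything named here (struct pipeline, count_words_pipelined, NSLOTS, CHUNK) is made up for illustration, and the slot count and chunk size are arbitrary guesses rather than tuned values. The reader thread fills a small ring of buffers while the calling thread parses the ones that are already full.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NSLOTS 4              /* arbitrary number of buffers in the ring */
#define CHUNK  (64 * 1024)    /* arbitrary chunk size */

struct pipeline {
    FILE *fp;
    char data[NSLOTS][CHUNK];
    size_t len[NSLOTS];       /* a length of 0 marks end of input */
    int head, tail, count;    /* head: next to parse, tail: next to fill */
    pthread_mutex_t mu;
    pthread_cond_t not_full, not_empty;
};

/* Producer: reads chunks from the file into free slots. */
static void *reader(void *arg)
{
    struct pipeline *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->mu);
        while (p->count == NSLOTS)
            pthread_cond_wait(&p->not_full, &p->mu);
        pthread_mutex_unlock(&p->mu);

        /* The tail slot is free, so it is safe to fill it unlocked. */
        size_t n = fread(p->data[p->tail], 1, CHUNK, p->fp);

        pthread_mutex_lock(&p->mu);
        p->len[p->tail] = n;
        p->tail = (p->tail + 1) % NSLOTS;
        p->count++;
        pthread_cond_signal(&p->not_empty);
        pthread_mutex_unlock(&p->mu);
        if (n == 0)
            return NULL;      /* the empty chunk tells the parser to stop */
    }
}

/* Consumer: counts runs of [A-Za-z] while the reader keeps reading. */
size_t count_words_pipelined(FILE *fp)
{
    struct pipeline *p = calloc(1, sizeof *p);
    pthread_t tid;
    size_t words = 0, n;
    int in_word = 0;

    if (!p)
        return 0;
    p->fp = fp;
    pthread_mutex_init(&p->mu, NULL);
    pthread_cond_init(&p->not_full, NULL);
    pthread_cond_init(&p->not_empty, NULL);
    if (pthread_create(&tid, NULL, reader, p) != 0) {
        free(p);
        return 0;
    }
    do {
        pthread_mutex_lock(&p->mu);
        while (p->count == 0)
            pthread_cond_wait(&p->not_empty, &p->mu);
        pthread_mutex_unlock(&p->mu);

        /* The head slot stays reserved until count is decremented below. */
        n = p->len[p->head];
        for (size_t i = 0; i < n; i++) {
            char c = p->data[p->head][i];
            int letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (letter && !in_word)
                words++;
            in_word = letter;  /* carried across chunk boundaries */
        }

        pthread_mutex_lock(&p->mu);
        p->head = (p->head + 1) % NSLOTS;
        p->count--;
        pthread_cond_signal(&p->not_full);
        pthread_mutex_unlock(&p->mu);
    } while (n != 0);

    pthread_join(tid, NULL);
    pthread_mutex_destroy(&p->mu);
    pthread_cond_destroy(&p->not_full);
    pthread_cond_destroy(&p->not_empty);
    free(p);
    return words;
}
```

You'd call it with an open FILE pointer, something like count_words_pipelined(fopen(path, "rb")) plus error checking, and build with -pthread. Keep in mind that the C library and the OS already buffer and read ahead for you, so measure before assuming this beats the plain versions.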
Honestly though, unless you're processing massive text files, you're probably barely even going to be able to measure the difference in speed among any of the possible approaches you might choose...
This pipelining approach is more applicable when you're doing something more compute intensive (not just scanning characters), and your communication delay is higher (like when the data is coming over the network instead of from a local disk). However, it would still be a good exercise to explore the different options. After all, the assignment is contrived anyway—the point is to learn something useful that you might be able to use in a real project sometime later, right?
On a separate note, using any of the scanf functions will probably be slower than just looping over your buffers to extract strings of characters [A-Za-z]. This is because, with any of the scanf functions, the code first needs to parse your format string to figure out what you're looking for, and then actually parse the input. Sometimes compilers can do smart things (like how gcc usually changes a printf call whose string has no format specifiers and ends in a newline into a puts instead), but I don't think there are optimizations like that for scanf and friends, especially if you're using something special like %[A-Za-z] instead of a standard format specifier like %d.
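To make the hand-rolled loop concrete, here's a minimal sketch of pulling words out of a buffer without any scanf call. The name next_word and its signature are hypothetical, and it assumes an ASCII-compatible character set and an output buffer of at least 2 bytes. Words that don't fit are truncated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies the next run of [A-Za-z] found at or
   after buf[*pos] into word (NUL-terminated, truncated to cap - 1
   characters) and advances *pos past that run.
   Returns the stored length, or 0 when no words are left. */
size_t next_word(const char *buf, size_t len, size_t *pos,
                 char *word, size_t cap)
{
    size_t i = *pos, n = 0;

    assert(cap >= 2);  /* need room for one letter plus the terminator */

    /* Skip everything that is not a letter. */
    while (i < len && !((buf[i] >= 'A' && buf[i] <= 'Z') ||
                        (buf[i] >= 'a' && buf[i] <= 'z')))
        i++;

    /* Copy the run of letters, dropping any that do not fit. */
    while (i < len && ((buf[i] >= 'A' && buf[i] <= 'Z') ||
                       (buf[i] >= 'a' && buf[i] <= 'z'))) {
        if (n + 1 < cap)
            word[n++] = buf[i];
        i++;
    }

    word[n] = '\0';
    *pos = i;
    return n;
}
```

You'd start with a position of 0 and call it in a loop until it returns 0. There's no format string to interpret at run time, which is where I'd expect the savings over sscanf to come from, but I haven't benchmarked it.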