Read specific section of a line in a formatted file with low level functions

Question

I am trying to build an authentication system using the C programming language. I have already written the functions that take user input (username and password) and insert it into the database (a .txt file) in the following format:

ID    USERNAME    PASSWORD
...   ...         ...
...   ...         ...
...   ...         ...
EOF (just showing the end of the file to make the question easier to follow)

The fields on each line are separated by a \t character.

To make sure that the ID (which is pseudo-randomly generated), the username and the password have no duplicates in the database, I want to write three functions that read just the ID, just the username, or just the password, compare each result with the user's input, and return a value according to the result. However, I don't know the correct way to do this using low-level functions (read(), lseek()).

To be sure we are on the same page: I don't want anyone to write the code for me; that would be unethical and would take the fun out of writing the whole thing myself. I would just like some hints that help me understand in which direction the algorithm should go.

Unrelated: allow different people to have the same password. — pmg, Mar 22 '22 at 19:50
Can you please be more specific? At the most basic level it's just `read` into a buffer and parse the data. There are many ways to parse the data. Could check one character at a time and use `\t` to find each of the parts of a line and `\n` to find the end of a line. Or make the data into a string and use functions such as `strtok` to parse it. Without an actual code attempt it is difficult to know what specifically you need help with. — kaylum, Mar 22 '22 at 19:51
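As a rough sketch of the first suggestion in that comment (read() into a buffer, then walk it one character at a time, using \t to end a field and \n to end a line), something like the following could work. The function name column_contains, the 128-byte field limit and the column numbering (0 = ID, 1 = USERNAME, 2 = PASSWORD) are assumptions for illustration. It also assumes the file holds only data lines: a header line, if present, would be compared like any other.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open(), used in the usage example */
#include <stdbool.h>
#include <string.h>
#include <unistd.h>    /* read(), lseek(), ssize_t */

#define FIELD_MAX 128  /* assumed maximum length of one field */

/*
 * Hypothetical helper: returns true if column `col` (0 = ID, 1 = USERNAME,
 * 2 = PASSWORD) of any line in the file open on `fd` equals `value`.
 * Fields are separated by '\t', lines end with '\n'.
 * A read error is reported as "not found" (a simplification).
 */
bool column_contains(int fd, int col, const char *value)
{
    char buf[4096];          /* raw bytes from read() */
    char field[FIELD_MAX];   /* the field currently being assembled */
    size_t flen = 0;         /* characters stored in `field` so far */
    int cur = 0;             /* index of the column we are currently in */
    ssize_t n;

    if (lseek(fd, 0, SEEK_SET) == -1)  /* always start from the top */
        return false;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            char c = buf[i];
            if (c == '\t' || c == '\n') {
                /* end of a field: compare it if it is the wanted column */
                field[flen] = '\0';
                if (cur == col && strcmp(field, value) == 0)
                    return true;
                flen = 0;
                cur = (c == '\n') ? 0 : cur + 1;  /* new line resets column */
            } else if (flen < FIELD_MAX - 1) {
                field[flen++] = c;  /* overlong fields are silently truncated */
            }
        }
    }
    /* the last line may have no trailing '\n' */
    field[flen] = '\0';
    return n == 0 && cur == col && flen > 0 && strcmp(field, value) == 0;
}
```

Usage, assuming a hypothetical users.txt: open it with `open("users.txt", O_RDONLY)` and call `column_contains(fd, 1, username)` to check for a duplicate username. Because the file is consumed as a stream, the only lseek() needed is the rewind to the start.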
`lseek` won't be useful unless you guarantee that all the lines are exactly the same length, and each of the three fields starts at a fixed offset from the beginning of the line. — user3386109, Mar 22 '22 at 19:55
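To illustrate that condition: if every field were space-padded to a fixed width, every record would have the same length, and lseek() could jump straight to any field. The layout below (8-character ID, 16-character username, 16-character password, then \n) and the helpers write_record and read_field are hypothetical; the question's actual file uses variable-length, tab-separated fields, for which this approach does not work.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open(), used in the usage example */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-width layout: each field is space-padded. */
#define ID_LEN    8
#define USER_LEN 16
#define PASS_LEN 16
#define ID_OFF    0
#define USER_OFF (ID_OFF + ID_LEN)
#define PASS_OFF (USER_OFF + USER_LEN)
#define REC_LEN  (PASS_OFF + PASS_LEN + 1)   /* +1 for the '\n' */

/* Writes record number `rec`, padding each field with spaces
   (values longer than their field are cut). Returns 0 or -1. */
int write_record(int fd, long rec, const char *id, const char *user,
                 const char *pass)
{
    char line[REC_LEN];
    memset(line, ' ', sizeof line);
    memcpy(line + ID_OFF,   id,   strnlen(id,   ID_LEN));
    memcpy(line + USER_OFF, user, strnlen(user, USER_LEN));
    memcpy(line + PASS_OFF, pass, strnlen(pass, PASS_LEN));
    line[REC_LEN - 1] = '\n';
    if (lseek(fd, (off_t)rec * REC_LEN, SEEK_SET) == -1)
        return -1;
    return write(fd, line, sizeof line) == (ssize_t)sizeof line ? 0 : -1;
}

/* Reads the field at offset `off` (width `len`) of record `rec` into `out`,
   which must hold len + 1 bytes, and strips the padding spaces.
   Returns 0, or -1 on error or if the record does not exist. */
int read_field(int fd, long rec, off_t off, size_t len, char *out)
{
    /* fixed record length: the field's byte offset is a simple product */
    if (lseek(fd, (off_t)rec * REC_LEN + off, SEEK_SET) == -1)
        return -1;
    if (read(fd, out, len) != (ssize_t)len)  /* short read: past the end */
        return -1;
    out[len] = '\0';
    while (len > 0 && out[len - 1] == ' ')   /* trim the padding */
        out[--len] = '\0';
    return 0;
}
```

Here record i's username starts at byte i * REC_LEN + USER_OFF. Checking for duplicates still means visiting every record; fixed widths only make each individual access direct.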
If you disallow a duplicate password then you have effectively revealed someone else's password. You can also avoid collisions in the ID: if it is partly sequential (so you can locate a record without searching) and partly pseudo-random, then it cannot have been used before, and you only need to track the number of users. — Weather Vane, Mar 22 '22 at 19:56
@WeatherVane could you explain in more detail what you mean by "part sequential and part pseudo-random"? I have tried to work out a solution myself, but I just want to be sure. [My idea: supposing the ID is made of four digits (`nnnn`), then the first two `nn` could represent the user's ordinal number and the last two `n`s are generated. But with this and similar schemes, only a very small number of users' credentials could be saved in the database.] — , Mar 23 '22 at 16:22
For example SSSSSRRRR, where SSSSS is a sequential sequence and RRRR is a pseudo-random sequence. Each will be unique without cross-checking. You can make it less obvious with, say, SRSRSRSRS. This example allows 100000 users. An ID need not be a "number", it could be a string, so there isn't really any limit to the number of users apart from what the string length gives, and, if it's a text file, that isn't a limit either. — Weather Vane, Mar 23 '22 at 16:26
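A minimal sketch of the SSSSSRRRR scheme, assuming a 5-digit sequence number (the number of users registered so far, which the program would have to keep track of) followed by 4 digits from rand(). make_id and make_id_interleaved are hypothetical names. Uniqueness comes entirely from the sequential part, so the random digits never need to be checked against the file; rand() only obscures the sequence and is not suitable for anything security-sensitive.

```c
#include <stdio.h>
#include <stdlib.h>

#define ID_DIGITS 9  /* 5 sequential + 4 pseudo-random digits */

/* Builds an ID of the form SSSSSRRRR from sequence number `seq`.
   Returns 0 on success, -1 if `seq` no longer fits in 5 digits. */
int make_id(unsigned seq, char out[ID_DIGITS + 1])
{
    if (seq > 99999u)
        return -1;
    /* %05u pads the sequence to 5 digits, %04u the random part to 4 */
    snprintf(out, ID_DIGITS + 1, "%05u%04u", seq, (unsigned)(rand() % 10000));
    return 0;
}

/* Same digits interleaved as SRSRSRSRS, so the sequence is less obvious. */
int make_id_interleaved(unsigned seq, char out[ID_DIGITS + 1])
{
    char plain[ID_DIGITS + 1];
    if (make_id(seq, plain) != 0)
        return -1;
    for (int k = 0; k < 5; k++)       /* S digits go to even positions */
        out[2 * k] = plain[k];
    for (int k = 0; k < 4; k++)       /* R digits go to odd positions */
        out[2 * k + 1] = plain[5 + k];
    out[ID_DIGITS] = '\0';
    return 0;
}
```

For example, make_id(42, id) yields an ID beginning with 00042 followed by four pseudo-random digits; the next user gets sequence number 43.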

Marcus Müller · Answer 1 · 2022-03-22T20:01:45.507

… and the password do not have duplicate …

I hope you mean IDs, not passwords. You must never tell a user that their password is already in the database! That means they now only have to try all the other user names (which might be easy to guess, and are certainly easier to guess than a password) with the password they tried to set for themselves.

By the way, I'm assuming this is a learning experience, not a production system. Anything that actually handles user logins never stores passwords, only salted hashes of passwords. That way, someone who gets hold of your database file still can't authenticate with it, because your system doesn't accept hashes: it accepts passwords, calculates their hashes and checks those against the database.
(If this were a production system, you'd also gladly use a well-tested library to manage your data, because then you don't have to worry about your own bugs, or about making sure two concurrent processes don't write to the file at the same time and corrupt it. It might sound like overkill, but SQLite is one such library: with it you can trivially build a compact, safe-to-use credential store and keep the password hashes in it. It's really ubiquitous!)

but I don't know the correct way to do it using low level functions (read(), lseek());

You can't solve this using seek/lseek, because your text file has variable line length – before reading a line completely, you can't know where the next line starts.

So, use higher-level functions to read tab-separated strings.

The way forward here is to read each line with fscanf (or fgets plus sscanf), extract ID, USERNAME and PASSWORD, ignore the password, and compare the rest with what the user entered.
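A sketch of that approach, using fgets() to read one line and sscanf() to split it (a common variant of calling fscanf() directly, which keeps one malformed line from throwing later lines out of step). The function find_duplicate, its return codes and the buffer sizes are assumptions. It also assumes no field contains whitespace, because %s stops at any whitespace, which is what lets it split on \t.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scans a tab-separated "ID\tUSERNAME\tPASSWORD" file and reports whether
 * `id` or `user` is already present. The password column is skipped (%*s).
 * Returns 1 if the ID is taken, 2 if the username is taken (whichever is
 * found first), 0 if neither is, -1 if `f` is NULL.
 */
int find_duplicate(FILE *f, const char *id, const char *user)
{
    char line[256];              /* assumed upper bound on one line */
    char f_id[32], f_user[64];

    if (f == NULL)
        return -1;
    rewind(f);                   /* start from the first line on every call */
    while (fgets(line, sizeof line, f) != NULL) {
        /* %s stops at any whitespace, including '\t' and '\n';
           %*s reads the password and throws it away */
        if (sscanf(line, "%31s %63s %*s", f_id, f_user) != 2)
            continue;            /* skip blank or malformed lines */
        if (strcmp(f_id, id) == 0)
            return 1;
        if (strcmp(f_user, user) == 0)
            return 2;
    }
    return 0;
}
```

With a hypothetical `FILE *db = fopen("users.txt", "r");`, a call such as `find_duplicate(db, new_id, new_username)` returning 0 means both are free. As the answer suggests, the password column is read but never compared.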

Additionally, common practice is to get the user ID/login and the password. Then _always_ compute the password hash. Then reject the login if _either_ the hash is bad or the username is invalid. This prevents timing attacks that determine whether the username is valid. That is, if we _only_ computed the hash _if_ the username is valid, the shorter response time for an invalid username would "leak" which usernames are valid. — Craig Estey, Mar 22 '22 at 20:25
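A sketch of the control flow that comment describes, with loudly labeled placeholders: placeholder_hash is 64-bit FNV-1a, used only so the example compiles on its own, and is NOT a password hash; struct user and login_ok are hypothetical. The point is that the hash is computed whether or not the username exists, and the login is rejected if either check fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PLACEHOLDER hash (64-bit FNV-1a) so the sketch is self-contained.
   NOT a password hash: real code must use a salted, deliberately slow
   function from a vetted crypto library. */
static uint64_t placeholder_hash(const char *salt, const char *password)
{
    uint64_t h = 14695981039346656037ULL;              /* FNV offset basis */
    for (const char *s = salt; *s; s++)
        h = (h ^ (unsigned char)*s) * 1099511628211ULL; /* FNV prime */
    for (const char *s = password; *s; s++)
        h = (h ^ (unsigned char)*s) * 1099511628211ULL;
    return h;
}

/* Hypothetical in-memory form of one record of the credentials file. */
struct user {
    const char *name;
    const char *salt;
    uint64_t hash;
};

bool login_ok(const struct user *users, size_t n,
              const char *name, const char *password)
{
    const struct user *u = NULL;
    /* scan the whole table rather than stopping at the first match */
    for (size_t i = 0; i < n; i++)
        if (strcmp(users[i].name, name) == 0)
            u = &users[i];

    /* Always hash the supplied password, using a dummy salt for an
       unknown user, so both paths do the same work. */
    uint64_t h = placeholder_hash(u != NULL ? u->salt : "dummy-salt", password);

    /* Reject if EITHER the user is unknown OR the hash does not match. */
    return u != NULL && h == u->hash;
}
```

A real implementation would also compare the hashes in constant time; the sketch only shows where the hashing has to sit relative to the username lookup.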
