
I am working in C on UNIX and have an input file containing a number of records. I map each record to a structure, fill in any information missing from the record using a database, and write the structure to an output file.

My issue is with writing the structure (which consists of character arrays) back to the file. I am using:

    fwrite(&record, sizeof(record), 1, out);
    fwrite("\n", 1, 1, out);

This writes the data to the output file with a terminating NUL ('\0') after each member. How can I write this structure to the file without the terminating '\0' after each member?

Sachin
  • Please provide the structure definition so we know what you are working with. Also, what are you trying to do with the data you just wrote out? If it is being read into another C program, this might be a feature. – Seth Robertson Jun 21 '11 at 17:56
  • Be aware that doing it that way means that your format won't be portable. – jamesdlin Jun 21 '11 at 18:06

2 Answers


I would imagine that those zeros are part of the character arrays -- they're at the end of every C string. If you need to write the strings to a file without the zeros, you could write the individual char arrays, writing only the characters and not the trailing zero (you might use strlen() to find this length), i.e.:

    fwrite(theCharArray, 1, strlen(theCharArray), out);

But then you may need to write some information about the length of each string to the file.
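
As a minimal sketch of the per-member approach, assuming a hypothetical two-member `struct record` (the question does not show the real definition) and hypothetical helper names `write_field` and `write_record`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout; the real structure is not shown in the question. */
struct record {
    char name[16];
    char city[16];
};

/* Write only the characters of a NUL-terminated field, not the '\0'.
   Returns the number of bytes written. */
static size_t write_field(const char *field, FILE *out)
{
    assert(field != NULL && out != NULL);
    return fwrite(field, 1, strlen(field), out);
}

/* Write every field of one record, followed by a newline.
   Returns the total number of bytes written. */
static size_t write_record(const struct record *rec, FILE *out)
{
    size_t n = 0;
    n += write_field(rec->name, out);
    n += write_field(rec->city, out);
    n += fwrite("\n", 1, 1, out);
    return n;
}
```

With `name` set to "Ann" and `city` set to "Oslo", the line in the file is `AnnOslo`: the fields run together, which is why some length or separator information is usually needed as well.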

Ernest Friedman-Hill
  • Thanks Ernest, yes, that's true: those trailing '\0' bytes are part of the character arrays. Is writing individual strings the only solution? The structure is very big and has 100 members. Please suggest. – Sachin Jun 21 '11 at 17:49
  • @Jerry Coffin suggests using an array of values computed with `offsetof` and a loop; that would actually lead to *more*, albeit slightly simpler, lines of code. One crazy thing I can think of: if the `struct` consists solely of char arrays, you could have a for loop that iterates `sizeof(struct)` times, advancing a pointer at each step and writing the non-zero characters to the file, one at a time. The whole thing would be only a few lines of code, and it would be robust to modifications of the struct. The downside is that the performance would probably not be as good. – Ernest Friedman-Hill Jun 21 '11 at 19:42
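
The byte-by-byte loop described in the comment above could look roughly like this. It is a sketch only: `write_nonzero_bytes` is a hypothetical name, and it assumes the structure contains nothing but char arrays whose unused bytes are all zero. If a buffer was not cleared before a shorter string was copied into it, stale characters after the first '\0' would be written too.

```c
#include <assert.h>
#include <stdio.h>

/* Copy every non-zero byte of an object to `out`, one byte at a time.
   Assumes the object holds only char arrays with zeroed unused bytes.
   Returns the number of bytes written. */
static size_t write_nonzero_bytes(const void *obj, size_t size, FILE *out)
{
    const unsigned char *p = obj;
    size_t written = 0;

    assert(obj != NULL && out != NULL);
    for (size_t i = 0; i < size; i++) {
        /* Skip terminators and zero fill; write everything else. */
        if (p[i] != 0 && fputc(p[i], out) != EOF)
            written++;
    }
    return written;
}
```

It would be called as `write_nonzero_bytes(&record, sizeof record, out)`, so adding or removing members needs no change to the writing code.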

This will write a record out exactly as it's stored in memory -- but the compiler is free to insert padding between the members, and if it does, this will write out whatever values happen to be in those padding bytes.

Many (most?) compilers have non-portable ways of preventing them from inserting that padding -- MSVC uses #pragma pack(1), gcc uses __attribute__((packed)) (and at least some versions support the #pragma pack syntax as well).
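
To make the padding point concrete, here is a small sketch using two hypothetical structures with the same members, one of them wrapped in `#pragma pack(push, 1)`. It assumes a compiler that accepts that pragma (MSVC, and the gcc and clang versions that support the syntax); `padding_in` is an illustrative helper, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: mixed member types invite padding after `tag`. */
struct padded {
    char tag;
    int  value;
};

/* Same members, but the compiler is asked not to insert padding. */
#pragma pack(push, 1)
struct packed {
    char tag;
    int  value;
};
#pragma pack(pop)

/* Number of bytes in a struct of the given size beyond its two members. */
static size_t padding_in(size_t struct_size)
{
    size_t members = sizeof(char) + sizeof(int);
    assert(struct_size >= members);
    return struct_size - members;
}
```

Comparing `padding_in(sizeof(struct padded))` with `padding_in(sizeof(struct packed))` shows how many extra bytes an `fwrite` of the whole structure would put in the file. The exact number depends on the platform.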

It's also possible that you've defined record to include some zero bytes as part of the data (e.g., arrays of char with zero terminators to make them strings). Since you haven't shown the definition of record, it's hard to guess whether this applies or not though.

Edit: based on your comment, it appears that the latter is the case. The first point I'd make is that removing these may not be a good idea. If you remove them, you'll have to do something to let a program reading the data know where one field ends and the next one begins (unless the fields are fixed width, which can be handled implicitly).

The most obvious possibility is to precede each field with its length. This has the advantage that if/when you want to seek through the file, you can get from one field to the next without reading through the data to find the terminating byte. Usually, however, I'd use an index instead -- a file containing the file offsets to successive records in the data (and possibly some key data for each record, so you can quickly search based on the contents of the records), so you can quickly seek to the location of a record and read its data. Unless you have extraordinarily large fields, seeking to individual fields rarely accomplishes much though.
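
A length-prefixed layout might be sketched as follows. The one-byte length (which limits fields to 255 characters) and the function names `write_prefixed`, `read_prefixed` and `skip_prefixed` are assumptions made for illustration, not something the original post specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one field as a single length byte followed by its characters.
   Assumes fields are at most 255 bytes long. Returns 0 on success. */
static int write_prefixed(const char *field, FILE *out)
{
    size_t len = strlen(field);
    assert(len <= 255);
    if (fputc((int)len, out) == EOF)
        return -1;
    return fwrite(field, 1, len, out) == len ? 0 : -1;
}

/* Read one length-prefixed field into buf (capacity cap) and
   NUL-terminate it. Returns 0 on success. */
static int read_prefixed(char *buf, size_t cap, FILE *in)
{
    int len = fgetc(in);
    if (len == EOF || (size_t)len >= cap)
        return -1;
    if (fread(buf, 1, (size_t)len, in) != (size_t)len)
        return -1;
    buf[len] = '\0';
    return 0;
}

/* Move past one field without reading its contents. */
static int skip_prefixed(FILE *in)
{
    int len = fgetc(in);
    if (len == EOF)
        return -1;
    return fseek(in, (long)len, SEEK_CUR);
}
```

`skip_prefixed` is the seeking advantage mentioned above: the reader jumps to the next field using only the length byte, without scanning for a terminator.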

Jerry Coffin
  • Is writing individual members the only solution to get rid of this? – Sachin Jun 21 '11 at 17:51
  • @Sachin Chourasiya: pretty much, yes. You can keep that from being quite so onerous, however, by using `offsetof` to create an array of offsets to the beginnings of the fields, so you can write the fields in a loop instead of 100 separately written calls to `fwrite` (or puts, etc.). See: http://stackoverflow.com/questions/5088358/breaking-a-single-string-into-multiple-strings-c/5088496#5088496 for an example. – Jerry Coffin Jun 21 '11 at 18:00
  • @Sachin: Imagine you can take the `'\0'` (and the following bytes) out. How are you going to identify the beginning of the next member? – pmg Jun 21 '11 at 18:22
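
The `offsetof` table suggested in the comments could be sketched like this. The three-member `struct record`, the `field_offsets` table and `write_record_fields` are hypothetical, and the separator character is an added assumption so that a reader can still tell where each member begins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record; the real one reportedly has about 100 members. */
struct record {
    char name[16];
    char city[16];
    char phone[12];
};

/* One entry per member: where each char array starts inside the struct. */
static const size_t field_offsets[] = {
    offsetof(struct record, name),
    offsetof(struct record, city),
    offsetof(struct record, phone),
};

#define FIELD_COUNT (sizeof field_offsets / sizeof field_offsets[0])

/* Write all fields without their '\0', separated by `sep`, with a
   newline after the last one. Returns the number of bytes written. */
static size_t write_record_fields(const struct record *rec, char sep, FILE *out)
{
    const char *base = (const char *)rec;
    size_t written = 0;

    assert(rec != NULL && out != NULL);
    for (size_t i = 0; i < FIELD_COUNT; i++) {
        const char *field = base + field_offsets[i];
        written += fwrite(field, 1, strlen(field), out);
        written += fwrite(i + 1 < FIELD_COUNT ? &sep : "\n", 1, 1, out);
    }
    return written;
}
```

The table still needs one line per member, but the writing logic lives in a single loop. The separator only works if it can never appear inside a field.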