I recently started taking R lectures and I'm currently working on reading files. One of the questions on a worksheet reads:

Read the file Table6.txt, check out the file first. Notice that the information is repeated, we only want the first non-repeated ones. Make sure to create only characters not factors this time around. Lastly, we don’t want the comments.

The file is called Table6.Txt

I managed to write code that reads the table properly, but the answer sheet passes an extra argument, flush = TRUE, to read.table.

My code was like:

df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
                 nrows = 7, comment.char = "@", stringsAsFactors = FALSE)

And the answer sheet shows

df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
                 nrows = 7, flush = TRUE, comment.char = "@", stringsAsFactors = FALSE)

What does the flush argument do here? Both versions give the same data frame.

df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
                 nrows = 7, flush = TRUE, comment.char = "@", stringsAsFactors = FALSE)
df
         Age Height Weight Sex
Alex      25    177     57   F
Lilly     31    163     69   F
Mark      23    190     83   M
Oliver    52    179     75   M
Martha    76    163     70   F
Lucas     49    183     83   M
Caroline  26    164     53   F
df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
                 nrows = 7, comment.char = "@", stringsAsFactors = FALSE)
df
         Age Height Weight Sex
Alex      25    177     57   F
Lilly     31    163     69   F
Mark      23    190     83   M
Oliver    52    179     75   M
Martha    76    163     70   F
Lucas     49    183     83   M
Caroline  26    164     53   F
Levy
  • This is interesting. `read.table` uses the `scan` function to do the actual scanning. The documentation for either of them says that flush `will flush to the end of the line after reading the last of the fields requested. This allows putting comments after the last field.` But here the comments are ignored anyway, and setting flush to FALSE has no effect either. The R source code for scan is not much help, because the main scan functionality is implemented in C. – R.S. Dec 01 '19 at 15:48
  • So, what I was thinking is that setting comment.char to "@" already makes R treat the comments in the table as unnecessary, so I won't need the flush argument here. As you said, setting it to FALSE has no effect, and to me it just looks like a double check to make sure the code runs smoothly. Honestly, I did not understand much from the help page for read.table either. I guess I will find the person who wrote the solutions and ask him directly why he wrote that. Thanks for the answer. – Levy Dec 01 '19 at 16:39
  • Great if that is possible. Once you have found the reason, you can post it here as an answer to your own question. It will be helpful. – R.S. Dec 01 '19 at 16:57
  • I will for sure if I manage to do so. Thanks again. – Levy Dec 01 '19 at 16:59
  • When you use `debug(read.table)`, you can see that it depends on `scan`. Have a look at `?scan` and the example there. But I don't get the point either. – Christoph Dec 02 '19 at 15:40

1 Answer

I read the documentation for read.table and scan, and this is what I understood, in simple words: with flush = TRUE, anything left on a line after the last expected field has been read is discarded, so every row still fits the data frame.

For example, let's take the same data that you have shared

read.table(text = 'Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
          Lucas     49    183     83   M
          Caroline  26    164     53   F', header = TRUE)

this works as expected and returns

#         Age Height Weight Sex
#Alex      25    177     57   F
#Lilly     31    163     69   F
#Mark      23    190     83   M
#Oliver    52    179     75   M
#Martha    76    163     70   F
#Lucas     49    183     83   M
#Caroline  26    164     53   F

Now let's add an extra character at the end.

read.table(text = "Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
          Lucas     49    183     83   M
          Caroline  26    164     53   F A", header = TRUE)
#                                        ^ note the extra A

It gives an error

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 7 did not have 5 elements

This makes sense, since the last row has one more field than the other rows.

We can add fill = TRUE

read.table(text = "Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
          Lucas     49    183     83   M
          Caroline  26    164     53   F A", header = TRUE, fill = TRUE)

#         Age Height Weight Sex
#Alex      25    177     57   F
#Lilly     31    163     69   F
#Mark      23    190     83   M
#Oliver    52    179     75   M
#Martha    76    163     70   F
#Lucas     49    183     83   M
#Caroline  26    164     53   F
#A         NA     NA     NA    

The extra "A" spills over into an additional row at the end, whose missing fields are filled with NA or an empty string, depending on the type of the column.

Now if we add flush = TRUE

read.table(text = "Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
          Lucas     49    183     83   M
          Caroline  26    164     53   F A", header = TRUE, flush = TRUE)

#         Age Height Weight Sex
#Alex      25    177     57   F
#Lilly     31    163     69   F
#Mark      23    190     83   M
#Oliver    52    179     75   M
#Martha    76    163     70   F
#Lucas     49    183     83   M
#Caroline  26    164     53   F

It ignores the additional "A" at the end, treating it as if it were a comment, and returns a complete data frame with all seven rows.
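
The same behaviour can be seen by calling scan directly, which is the function read.table relies on. The following is only a minimal sketch with made-up input (the two text lines and the names lines, flushed and not_flushed are assumptions for illustration, not part of your file):

```r
# Two records of three numeric fields each; the first line has extra trailing text.
lines <- c("1 2 3 trailing note", "4 5 6")

# what = list(0, 0, 0) asks for three numeric fields per record.
# flush = TRUE skips to the end of the line once the third field has been read,
# so "trailing note" is never parsed.
flushed <- scan(text = lines, what = list(0, 0, 0), flush = TRUE, quiet = TRUE)
str(flushed)

# Without flush, scan carries on and tries to read "trailing" as a number,
# which is an error; tryCatch captures it instead of stopping the script.
not_flushed <- tryCatch(
  scan(text = lines, what = list(0, 0, 0), quiet = TRUE),
  error = function(e) e
)
inherits(not_flushed, "error")
```

Here flush works purely by position: whatever follows the last requested field is dropped, whether or not it looks like a comment.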


In your case, it made no difference to the final output because, with the comments already removed by comment.char, none of the rows that were read had extra fields. You can treat flush = TRUE as a defensive habit when reading data whose structure you are not sure of, keeping in mind that it silently drops anything after the last expected field.
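
If you are not sure whether a file has rows with extra fields, one way to check before reaching for fill or flush is count.fields from the utils package, which reports the number of fields on each line. This is a sketch on the example data from above rather than on your Table6.txt; the names txt, con and n_fields are just illustrative:

```r
txt <- "Age Height Weight Sex
Alex      25    177     57   F
Lilly     31    163     69   F
Mark      23    190     83   M
Oliver    52    179     75   M
Martha    76    163     70   F
Lucas     49    183     83   M
Caroline  26    164     53   F A"

# count.fields needs a file name or a connection, so wrap the text in one.
con <- textConnection(txt)
n_fields <- count.fields(con)
close(con)

# One count per line: the header, then the seven data rows.
n_fields

# Lines whose field count differs from the usual 5 (row name plus 4 columns).
which(n_fields != 5)
```

The header has one field fewer than the data rows, which is what makes read.table use the first column as row names, so only the last line is a real anomaly here.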

Hope this clarified a bit.


As @Christoph asked in the comments, here is an example to demonstrate the difference between comment.char and flush

read.table(text = 'Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
        @ Lucas     49    183     83   M 
          Caroline  26    164     53   F @', header = TRUE,flush = TRUE)

#           Age Height Weight Sex
#Alex        25    177     57   F
#Lilly       31    163     69   F
#Mark        23    190     83   M
#Oliver      52    179     75   M
#Martha      76    163     70   F
#@        Lucas     49    183  83
#Caroline    26    164     53   F


read.table(text = 'Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
        @ Lucas     49    183     83   M 
          Caroline  26    164     53   F @', header = TRUE,comment.char = '@')

#         Age Height Weight Sex
#Alex      25    177     57   F
#Lilly     31    163     69   F
#Mark      23    190     83   M
#Oliver    52    179     75   M
#Martha    76    163     70   F
#Caroline  26    164     53   F

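With flush = TRUE, the @ at the beginning of the second-to-last line is not ignored: it is read as the row name, the remaining fields shift one column to the right, and the last field (M) is discarded. The trailing @ on the last line is dropped as well, but only because it comes after the last expected field. With comment.char = '@', everything from the @ to the end of the line is ignored wherever the @ appears, so the whole Lucas line disappears and the trailing @ on the last line is removed.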
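
The two arguments can also be combined, as in the answer sheet from the question. The sketch below uses invented data (the trailing texts "marked comment" and "unmarked note" and the names only_comment and both are assumptions for illustration) to show a case where comment.char alone is not enough:

```r
txt <- "Age Height Weight Sex
Alex      25    177     57   F
Lilly     31    163     69   F
Mark      23    190     83   M
Oliver    52    179     75   M
Martha    76    163     70   F @ marked comment
Lucas     49    183     83   M unmarked note"

# comment.char alone removes the text after @ on the Martha line, but the
# unmarked note on the Lucas line leaves extra fields, so read.table fails.
only_comment <- tryCatch(
  read.table(text = txt, header = TRUE, comment.char = "@"),
  error = function(e) e
)
inherits(only_comment, "error")

# Adding flush = TRUE also drops whatever follows the last expected field.
both <- read.table(text = txt, header = TRUE, comment.char = "@", flush = TRUE)
both
```

In this sketch comment.char handles comments that are marked with @, while flush handles trailing text that carries no marker at all.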

Ronak Shah
  • But if you change the last line to `...Caroline 26 164 53 F @A", header = TRUE, comment.char = "@")`, it also works without flush. So why do you need flush at all? – Christoph Dec 10 '19 at 19:46
  • @Christoph I added an example in the answer to explain the difference between `flush` and `comment.char`. – Ronak Shah Dec 13 '19 at 06:05