
I have a question about file formats.

I am asking because I use functions, for example in R, to import external data.

I have a file, character_student.txt, with the following data:

Name, Age, Gender, Test1
john, 11, M, 90
betty, 25, F, 33

I am confused. Is the file above considered a CSV (comma-separated) file, or is it a text file? When importing it into R, is it appropriate to use read.csv(file="character_student.txt")?

Then the other question that I have is, if I have a file like this:

Name Age Gender Test1
john 11 M 90
betty 25 F 33

Here there is only a single space between fields. If I save it as a .csv file, the filename becomes something like character_text.csv. Is this file now space-delimited or comma-delimited?

I guess my question is: how do I know whether a file is comma-separated, space-delimited, or tab-delimited?

Is it based purely on the name of the file? For example, if the name ends with csv, is it a comma-separated file, and if it ends with something else, is it a "something else"-delimited file? Does it matter what the inside of the file actually looks like? Do we have to open the file and check that commas separate the fields to be sure it is comma-separated? Could a csv file have fields separated by something else?

Or, if it is called csv, is every field inside guaranteed to be separated by a comma, so that I don't have to open it to check?

jww
john_w
  • By looking at it? Extensions mean nothing in Linux, and CSV and TAB-delimited files are also text files, just with a special convention followed by the user. – tink Jul 17 '19 at 18:30
  • These are delimited file formats; the `.csv` extension is just a naming convention (mostly from the Windows world) to map an executable to the file. It doesn't mandate the internal delimiter used. – karakfa Jul 17 '19 at 18:30
  • CSV files often use different delimiters, like tab (`\t`) and pipe (`|`). Also see [Reading Tab Delimited Data in to R](https://stackoverflow.com/q/11675917/608639), [Delimiters while writing csv files in R](https://stackoverflow.com/q/50269872/608639) and friends. – jww Jul 17 '19 at 18:30
  • And no, not every comma in a CSV has to be a field delimiter, and there are varied dialects of CSV. Typically you can wrap fields with embedded commas in double quotes: `this,is,"a csv snippet, and has an embedded comma",sigh`. – tink Jul 17 '19 at 18:32
  • Your first file is what everybody would call a CSV file, even if the extension is something else. Note that, as @tink says, there are dialects of CSV. In continental Europe, or in countries where decimals are marked with a comma, the most frequent field separator is the semicolon, `";"`. In R this would be read in with `read.csv2`, not `read.csv`. Read the help page `help("read.table")` carefully. – Rui Barradas Jul 17 '19 at 18:43

1 Answer

Extensions don't define files. They help utilities and tools decide how to process them.

Suppose you write a Python script and save it as hello.c.

You then pass it to gcc with gcc hello.c.

Nothing stops you from doing that. gcc will accept the file and try to compile it as C, but it will report lots of syntax errors.

Similarly, by naming a file .csv, you are telling the tool, utility, or function that you are passing a comma-separated file; the name does not change what is inside it.
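
As a rough illustration in R, the sketch below writes the two sample files from the question to a temporary directory and reads each one back. The variable names and the temporary location are assumptions made for the example. It only shows that the parsing function, not the file name, decides how the content is split.

```r
# Hypothetical paths: the question's two files, written to a temp directory.
comma_path <- file.path(tempdir(), "character_student.txt")
space_path <- file.path(tempdir(), "character_text.csv")

writeLines(c("Name, Age, Gender, Test1",
             "john, 11, M, 90",
             "betty, 25, F, 33"), comma_path)
writeLines(c("Name Age Gender Test1",
             "john 11 M 90",
             "betty 25 F 33"), space_path)

# read.csv() splits on commas regardless of the ".txt" extension.
# strip.white = TRUE drops the blank that follows each comma.
students_comma <- read.csv(comma_path, strip.white = TRUE)

# The ".csv" file contains no commas, so the separator is stated explicitly.
students_space <- read.table(space_path, header = TRUE, sep = " ")

# Reading the space-delimited file with read.csv() yields a single column.
students_wrong <- read.csv(space_path)

print(dim(students_comma))
print(dim(students_space))
print(dim(students_wrong))
```

The first two data frames should have four columns each, while the third has only one, because read.csv() finds no commas to split on.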

If you have a file like:

abc def, ghi jkl,

One user wants to extract data from it in the form:

the fields "abc", "def,", "ghi", and "jkl,". That user should treat it as a space-separated file. Another user, who wants "abc def" and "ghi jkl", should treat it as a comma-separated file.
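
The two readings of that sample line can be sketched in R with strsplit(). This is only an illustration; the variable names are made up for the example.

```r
line <- "abc def, ghi jkl,"

# Treating the line as space-separated keeps the commas inside the fields.
space_fields <- strsplit(line, " ", fixed = TRUE)[[1]]

# Treating it as comma-separated keeps the spaces inside the fields;
# trimws() removes the blank left in front of the second field.
comma_fields <- trimws(strsplit(line, ",", fixed = TRUE)[[1]])

print(space_fields)  # "abc"  "def," "ghi"  "jkl,"
print(comma_fields)  # "abc def" "ghi jkl"
```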

For a particular case, you need to study the function or tool in question and find out what format it expects. So yes, if a tool needs a file in a particular format, you need to check the file before passing it to that tool.
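
One way to make such a check without opening the file by hand is to count candidate delimiters in the first few lines. The helper below, guess_delimiter(), is a hypothetical name and a naive heuristic rather than an established API: it assumes a real delimiter appears the same number of times on every line, and it ignores quoted fields.

```r
# Hypothetical helper: guess the delimiter by counting candidate characters
# in the first few lines and keeping those whose count is equal on every line.
guess_delimiter <- function(path, candidates = c(",", ";", "\t", " "), n = 5) {
  lines <- readLines(path, n = n)
  counts <- sapply(candidates, function(delim) {
    per_line <- lengths(regmatches(lines, gregexpr(delim, lines, fixed = TRUE)))
    # A plausible delimiter occurs equally often (and at least once) per line.
    if (length(unique(per_line)) == 1 && per_line[1] > 0) per_line[1] else 0
  })
  if (max(counts) == 0) return(NA_character_)
  # Ties go to the earlier candidate, so "," wins over " " for "a, b, c".
  candidates[which.max(counts)]
}

# Usage with the question's space-delimited data saved under a .csv name.
space_path <- tempfile(fileext = ".csv")
writeLines(c("Name Age Gender Test1", "john 11 M 90", "betty 25 F 33"), space_path)
print(guess_delimiter(space_path))  # " "
```

The result can then be passed on as the sep argument of read.table(). For anything beyond simple files, looking at the first lines yourself remains the more reliable check.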

It's just about how you want the data to be interpreted.

Mihir Luthra