3

I apologize if this has been asked previously, but I haven't been able to find an example online or elsewhere.

I have a very dirty data file stored as plain text (it may be JSON). I want to analyze the data in R, and since I am still new to the language, I want to read in the raw data and manipulate it as needed from there.

How would I go about reading in JSON from a text file on my machine? Additionally, if it isn't JSON, how can I read in the raw data as is (not parsed into columns, etc.) so I can go ahead and figure out how to parse it as needed?

Thanks in advance!

csgillespie
Btibert3
    It might be a good idea to include a sample if possible. I see a whole range of possibilities, from using rjson to combining scan() or readLines() with regular expressions, depending on whether it's JSON or not. On a side note, how to read in JSON files has been answered numerous times already on this site. If that's your question, this should be closed. – Joris Meys Sep 27 '10 at 13:58

3 Answers

2

If your file is in JSON format, you can try the jsonlite, RJSONIO, or rjson packages. All three provide a fromJSON function.

To install a package you use the install.packages function. For example:

install.packages("jsonlite")

Once the package is installed, load it with the library function:

library(jsonlite) 

Line-delimited JSON generally has one object per line, so you need to read the file line by line and collect the objects. For example:

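# Open the file for reading; each line is assumed to hold one JSON object
con <- file("myBigJsonFile.json", open = "r")
objects <- list()
index <- 1
# readLines() returns a zero-length vector at end of file, which ends the loop
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
    objects[[index]] <- fromJSON(line)
    index <- index + 1
}
close(con)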

After that, you have all the data in the objects variable. With that variable you may extract the information you want.
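As an alternative to the manual loop, jsonlite also provides stream_in(), which reads line-delimited JSON from a connection. The sketch below is only an illustration: the temporary file and its two records are made up so that the example runs on its own, and it assumes every line holds a complete JSON object.

```r
library(jsonlite)

# Hypothetical sample file with one JSON object per line (made up for this sketch)
path <- tempfile(fileext = ".json")
writeLines(c('{"id": 1, "name": "a"}',
             '{"id": 2, "name": "b"}'), path)

# stream_in() parses each line and combines the records into a data frame
records <- stream_in(file(path), verbose = FALSE)
```

Here records is a data frame with one row per line of the file, which may be easier to work with than a list when the objects share the same fields.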

Genaro Costa
2

Use the rjson package. In particular, look at the fromJSON function in the documentation.

If you want further pointers, then search for rjson at the R Bloggers website.
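A minimal sketch of what that might look like, assuming the file holds a single valid JSON document. The temporary file and its contents are hypothetical, created only so the example is self-contained:

```r
library(rjson)

# Hypothetical sample file containing one JSON document (made up for this sketch)
path <- tempfile(fileext = ".json")
writeLines('{"name": "example", "values": [1, 2, 3]}', path)

# fromJSON() takes either a JSON string or, via the file argument, a path
parsed <- fromJSON(file = path)

# The result is a named list mirroring the structure of the JSON
parsed$name
parsed$values
```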

csgillespie
2

If you want to use the JSON-related packages in R, a number of other posts on SO already cover this. I presume you have already searched this site for JSON [r]; there is plenty of info there.

If you just want to read in the text file line by line and process later on, then you can use either scan() or readLines(). They appear to do the same thing, but there's an important difference between them.

scan() lets you define what kind of objects you want to find, how many, and so on. Read the help file for more info. You can use scan() to read in every word/number/sign as an element of a vector using e.g. scan(filename,""). You can also use specific delimiters to separate the data. See also the examples in the help files.
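To make this concrete, here is a small sketch. The temporary file and its contents are invented for illustration, and quiet = TRUE only suppresses the "Read N items" message:

```r
# Hypothetical messy input (made up for this sketch)
path <- tempfile()
writeLines(c("alpha 1 beta", "2,3 gamma"), path)

# what = "" reads every whitespace-separated token as a character string
tokens <- scan(path, what = "", quiet = TRUE)

# With an explicit delimiter, fields are split on commas (and line ends) instead
fields <- scan(path, what = "", sep = ",", quiet = TRUE)
```

In the first call "2,3" stays together as one token because only whitespace separates items; in the second call the spaces are kept inside the fields and the comma does the splitting.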

To read line by line, use readLines(filename) or scan(filename,"",sep="\n"). Either gives you a vector with the lines of the file as elements, which again allows you to do custom processing of the text. Then again, if you really have to do this often, you might want to consider doing this in Perl.
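A short sketch of both calls on the same input, with a made-up temporary file. One practical difference it illustrates is that scan() skips blank lines by default (blank.lines.skip = TRUE), while readLines() keeps them:

```r
# Hypothetical input with an empty line in the middle (made up for this sketch)
path <- tempfile()
writeLines(c("first line", "", "key=value; other=thing"), path)

# readLines() keeps every line, including the empty one
lines_rl <- readLines(path)

# scan() with sep = "\n" also returns whole lines, but drops the blank one
lines_sc <- scan(path, what = "", sep = "\n", quiet = TRUE)

# Custom processing afterwards, e.g. flag lines that look like key=value pairs
has_pairs <- grepl("=", lines_rl)
```

If line positions matter for later parsing, readLines() is probably the safer choice, since the element index then matches the line number in the file.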

Joris Meys