6

I have a data with a column as json string:

reservation  reasons
1592         [{"name"=>"jorge", "value"=>"MX"}, {"name"=>"Billing phone number", "value"=>"1123"}, {"name"=>"BillingCountry", "value"=>"USA"}]
1597         [{"name"=>"BillingAddress_Country", "value"=>"IN"}, {"name"=>"Billing phone number country code", "value"=>"IN"}, {"name"=>"Latest amount", "value"=>"583000000"}]

I want to parse the column as follows:

reservation   name                                value
1592          jorge                                mx
1592          Billing phone number                 1123
1592          BillingCountry                       USA
1597          BillingAddress_Country               IN
1597          Billing phone number country code    IN
1597          Latest amount                        583000000

I am us ing jsonlite in R. I am getting following error in my code:

data<-read.csv("data.csv")
json<-data$reasons
mydf <- fromJSON(json)
Error: Argument 'txt' must be a JSON string, URL or file.

Can anyone tell me where am I making mistakes? What modifications do I need to do? Many thanks in advance!

user3642360
  • 762
  • 10
  • 23
  • 1
    can you `dput(head(dat))`? – Onyambu Jul 02 '18 at 20:57
  • 1
    Are you sure this is JSON? It doesn't look that much like JSON. What's this `=>`? I think you need to reformat it to JSON first. I can show you how in an answer. – Hack-R Jul 02 '18 at 21:05
  • Sorry but I am not getting it. How dput(head(dat)) can solve the problem? – user3642360 Jul 02 '18 at 21:05
  • 2
    That is not a solution to the problem, that's how to reproducibly share your data. Please see the `r` tag description for more info on that. – Hack-R Jul 02 '18 at 21:06
  • This doesn't appear to be valid JSON. Where did it come from? It looks like maybe a Ruby hash – camille Jul 02 '18 at 21:50
  • I just search in the internet and saw its not in json. I am trying to change the title so that we don't get confused – user3642360 Jul 02 '18 at 21:53

2 Answers2

4

This doesn't look like normal JSON to me (or to fromJSON, which makes me feel a little better). Maybe it's some special case of it or something (?). Update: @camille identified it as Ruby Hash.

In any event, we can fix it:

reasons <-  '{"name"=>"jorge", "value"=>"MX"}, {"name"=>"Billing phone number", "value"=>"1123"}, {"name"=>"BillingCountry", "value"=>"USA"}'

reasons <- gsub("=>", ":", reasons)
reasons <- gsub("[{}]", "", reasons)
reasons <- paste0("{",reasons,"}")

fromJSON(reasons)
$`name`
[1] "jorge"

$value
[1] "MX"

$name
[1] "Billing phone number"

$value
[1] "1123"

$name
[1] "BillingCountry"

$value
[1] "USA"
Hack-R
  • 22,422
  • 14
  • 75
  • 131
  • Thanks. I wrote the following code as you have mentioned: library(jsonlite) data<-read.csv("/Users/susmitaghosh/Downloads/sample_sift_reason.csv") dim(data) reasons<-data$sift_reasons reasons <- gsub("=>", ":", reasons) reasons <- gsub("[{}]", "", reasons) reasons <- paste0("{",reasons,"}") fromJSON(reasons). But I am still getting following error: Error: parse error: invalid object key (must be a string) {["name":"BillingAddress_Country (right here) ------^. Sorry for bad formatting. I don't know how to fix it in comment. – user3642360 Jul 02 '18 at 21:23
  • @user3642360 if it doesn't fit neatly in a comment, you can edit your original post and put additional code formatted in the question – camille Jul 02 '18 at 21:45
  • @user3642360 I agree with camille's suggestion but also it sounds like my solution works for the data you provided, but not some other part of the data that we don't have. So we need the that data too. Further, since what you thought was JSON was JSON-like but not real JSON, we should look into where you got this data from. A better solution is to find out what format it's in and convert it directly from that instead of making into JSON. – Hack-R Jul 02 '18 at 21:46
  • 1
    @Hack-R. Thank you so much. Your code is working. I did the modification like this: reasons<-as.character(data$reasons) – user3642360 Jul 02 '18 at 21:51
  • @user3642360 Perfect! If this works for you could you please click the green checkmark to mark it as the answer? – Hack-R Jul 02 '18 at 21:52
1
  dat%>%
  mutate(reasons=str_split(gsub("[^=A-Za-z,0-9{} ]+","",reasons),"(?<=\\}),\\s*"))%>%
  unnest()%>%
  mutate(names=str_extract(reasons,"(?<=name=)[^,}]+"),
         values=str_extract(reasons,"(?<=value=)[^,}]+"),
         reasons=NULL)

 reservation                             names    values
1        1592                             jorge        MX
2        1592              Billing phone number      1123
3        1592                    BillingCountry       USA
4        1597             BillingAddressCountry        IN
5        1597 Billing phone number country code        IN
6        1597                     Latest amount 583000000

with this code, if you need the email, just add email=str_extract.. etc etc

Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • Thanks. Your code works for this subset of data. But I am getting error while trying to run the code on the whole data: Error: Duplicate identifiers for rows (333, 334) In addition: Warning messages: 1: Expected 2 pieces. Additional pieces discarded in 49 rows [121, 124, 129, 136, 141, 146, 155, 158, 163, 166, 169, 172, 177, 207, 215, 224, 236, 239, 251, 265, ...]. 2: Expected 2 pieces. Missing pieces filled with `NA` in 57 rows [206, 208, 213, 214, 216, 217, 225, 226, 227, 240, 252, 253, 254, 268, 270, 286, 287, 290, 292, 305, ...]. Can you please help me to identify the error? – user3642360 Jul 02 '18 at 22:30
  • data<-read.csv("sample_reason1.csv") data%>% mutate(reasons=strsplit(gsub("[^=A-Za-z,0-9 ]+","",reasons),"\\s*,\\s*"))%>% unnest()%>% separate(reasons,c("name","value"),sep="\\s*=\\s*")%>% mutate(id=rep(1:(n()/2),each=2))%>% spread(name,value) This is the code. – user3642360 Jul 02 '18 at 22:36
  • Thanks. But I am not getting following error: Error in mutate_impl(.data, dots) : Column `id` must be length 516 (the number of rows) or one, not 56 In addition: Warning messages: 1: Expected 2 pieces. Additional pieces discarded in 49 rows [121, 124, 129, 136, 141, 146, 155, 158, 163, 166, 169, 172, 177, 207, 215, 224, 236, 239, 251, 265, ...]. 2: Expected 2 pieces. Missing pieces filled with `NA` in 57 rows [206, 208, 213, 214, 216, 217, 225, 226, 227, 240, 252, 253, 254, 268, 270, 286, 287, 290, 292, 305, ...]. – user3642360 Jul 02 '18 at 22:45
  • @user3642360 That means that not all the values are present in the rows. give me a second – Onyambu Jul 02 '18 at 22:52
  • I just checked the data carefully. the data contains some other information like "email", "details", etc apart from "value" and "names" – user3642360 Jul 02 '18 at 23:03
  • okay so that might also influence. Because I was doing a repeat of 2, but if there is more data, then you will have to include it: – Onyambu Jul 02 '18 at 23:09