
I'm using an API to get data from the Census Bureau. The good news is that I'm able to retrieve the data. The bad news is that I can't get it into a format that is usable for analysis and mapping.

My question: Is there a way to modify the API call or a standard way of dealing with missing values when the data is in a list?

Here's what I'm doing with the actual data. A toy example is below because the census data requires a personal API token.

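# Assumes the rjson and plyr packages (for fromJSON(file = ...) and ldply())
library(rjson)
library(plyr)
# Pull data from Census Bureau
mydata <- fromJSON(file = url(paste("http://api.census.gov/data/2010/acs5?key=", token, "&get=B25077_001E&for=block+group:*&in=state:47+county:037", sep = "")))
# create a data frame
mydata.df <- ldply(mydata)
# rename columns using the first (header) row
names(mydata.df) <- ldply(mydata)[1, ]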

Here's some of my data. I've tried mydata[mydata == NULL] = 9999 but it didn't help.

   list(c("94400", "47", "037", "019200", "4"), c("350000", "47", "037", "019300", "1"), list(NULL, "47", "037", "019300", "2"), list(NULL, "47", "037", "019300", "3"), c("198200", "47", "037", "019400", "1"), c("176900", "47", "037", "019400", "2"), c("250000", "47", "037", "019400", "3"), c("166200", "47", "037", "019500", "1"), c("227200", "47", "037", "019500", "2"), c("210500", "47", "037", "019500", "3"), c("187500", "47", "037", "019500", "4"), c("140000", "47", "037", "019600", "1"), c("131300", "47", "037", "019600", "2"), list(NULL, "47", "037", "980100", "1"), list(NULL, "47", "037", "980200", "1"))

This is how I know that there are missing values; some have 5 values, some have 4.

unlist(lapply(mydata, function(x) length(unlist(x))))
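
As a small illustration of that check, here is a sketch on three rows copied from the sample data above (the names `sample_rows`, `row_lengths`, `null_mask` and `has_null` are made up for this example). `unlist()` drops `NULL` entries, which is why the affected rows report 4 instead of 5. It also suggests why the `mydata == NULL` attempt has no effect: the comparison should come back as a zero-length logical vector, so nothing gets selected.

```r
# Three rows copied from the sample data above; `sample_rows` is a hypothetical name.
sample_rows <- list(
  c("94400", "47", "037", "019200", "4"),
  list(NULL, "47", "037", "019300", "2"),
  c("176900", "47", "037", "019400", "2")
)

# unlist() drops NULL entries, so rows with a missing value come out shorter.
row_lengths <- unlist(lapply(sample_rows, function(x) length(unlist(x))))
row_lengths

# Comparing against NULL gives a zero-length result, so indexing with it selects nothing.
null_mask <- sample_rows == NULL
length(null_mask)

# is.null() applied element by element does locate the rows with a missing entry.
has_null <- sapply(sample_rows, function(row) any(sapply(row, is.null)))
has_null
```

Testing each element with `is.null()` is what actually finds the gaps, and it keeps the position of the missing value, which the length check alone does not.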

In the event that this isn't an issue with fromJSON(), here's an example of what I'd like the data to do once it's in R.

mylist = list(a = c(1:4), b = c(1:3), c = c(1:4))

Gives this:

$a
[1] 1 2 3 4
$b
[1] 1 2 3
$c 
[1] 1 2 3 4

But I would like this:

$a
[1] 1 2 3 4
$b
[1] 1 2 3 NA
$c 
[1] 1 2 3 4

Or something similar where an NA acts as a placeholder for missing values. If a 2 were missing, for example, the entry in the list would look like 1 NA 3 4.

Nancy
  • Could you show a representative example? Thanks. Also, how do we know which index is missing if we only have the values and the lengths of the list elements are not the same? – akrun Aug 05 '14 at 16:15
  • For some reason, the `mydata <- fromJSON(....)` gives an error message. Can you `dput()` a subset of `mydata` if it is a list? – akrun Aug 05 '14 at 16:26
  • The error is because you have to request an API token from the Census Bureau. I can't share mine. I've included example data above. Thank you so much for helping! – Nancy Aug 05 '14 at 16:34
  • if mylist is the dput data, `lapply(mylist, function(x) do.call(c,lapply(x, function(y) {y[is.null(y)] <- NA;y})))` convert the NULL to NA. Some of the list elements were also a list because of this. How do you want to rearrange from this output? – akrun Aug 05 '14 at 16:40

1 Answer

mylist = list(a = 1:4, b = 1:3, c = c(1,3,4))
Un <- unique(unlist(mylist))
lapply(mylist, function(x) x[match(Un,x)])
# $a
# [1] 1 2 3 4

# $b
# [1]  1  2  3 NA

# $c
#[1]  1 NA  3  4

Update

Using the dput() data

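# Here mylist holds the dput() data from the question, not the toy list above.
# Each NULL entry is replaced with NA so every row keeps five positions.
lst1 <- lapply(mylist, function(x) do.call(c, lapply(x,
                  function(y) {y[is.null(y)] <- NA; y})))

head(lst1, 3)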
  #[[1]]
  #[1] "94400"  "47"     "037"    "019200" "4"     

  #[[2]]
  #[1] "350000" "47"     "037"    "019300" "1"     

  #[[3]]
  #[1] NA       "47"     "037"    "019300" "2"     
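
If the goal is a data frame for analysis, one possible next step is to bind the cleaned rows together and name the columns. This is only a sketch on three rows of the sample data. The names `rows`, `clean` and `col_names` are hypothetical, and the column labels are assumptions read off the API query (`B25077_001E`, then state, county, tract and block group); the real response carries its own header row, as used in the question.

```r
# Three rows from the dput() sample in the question; `rows` is a hypothetical name.
rows <- list(
  c("94400", "47", "037", "019200", "4"),
  list(NULL, "47", "037", "019300", "2"),
  c("198200", "47", "037", "019400", "1")
)

# Replace each NULL with NA so every row keeps all five positions.
clean <- lapply(rows, function(x) {
  unlist(lapply(x, function(y) if (is.null(y)) NA_character_ else y))
})

# Bind the rows into a character matrix, then convert to a data frame.
df <- as.data.frame(do.call(rbind, clean), stringsAsFactors = FALSE)

# Assumed column names; the real API response supplies its own header row.
col_names <- c("B25077_001E", "state", "county", "tract", "block_group")
names(df) <- col_names

# Convert the value column to numeric; the NA placeholder is preserved.
df$B25077_001E <- as.numeric(df$B25077_001E)
df
```

The identifier columns are left as character on purpose so that leading zeros such as `"037"` are not lost.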
akrun
  • Is there any way to write this into the API call? – Nancy Aug 05 '14 at 15:54
  • @Nancy, did you mean `fromJSON`? I am not that familiar with `RJSONIO`. – akrun Aug 05 '14 at 16:05
  • Also, now that I look at it maybe my example wasn't representative of the actual data I'm processing. The real data isn't the same row line repeated over and over; the problem is the missing INDEX not the missing VALUE. Ie, the third value in the row is missing, but it shouldn't be the same as the third value on the row above it. – Nancy Aug 05 '14 at 16:09
  • Yes @akrun I meant fromJSON. If you know another way that's fine, too. I'm not very far into the project. – Nancy Aug 05 '14 at 16:11
  • @Nancy, could you show a representative example? Thanks. Also, how do we know which index is missing if we only have the values and the lengths of the list elements are not the same? – akrun Aug 05 '14 at 16:12
  • okay @akrun I added some of my data in the original post. I'm used to cleaning data frame data but not list data. When I used dput() to get the example data I learned that there actually are NULL values already present. How can I make those NA or some other dummy value? – Nancy Aug 05 '14 at 16:32