0

I am trying to pull JSON lunar cycle data from the USNO API. The problem is that there are two arrays of JSON data in what I get back. I don't see a way to specify what I get back from the Observatory, so I think I need to clean it up in R. Here is my code:

library(sqldf);
library(jsonlite);

curr_date <- Sys.Date();
Q_date <- format.Date(curr_date, "%m/%d/%Y");
moon_call <- paste0("http://api.usno.navy.mil/moon/phase?date=",Q_date,"&nump=4");

moon_json <- fromJSON(moon_call, simplifyDataFrame =  TRUE);

moon_phases <- do.call("rbind.fill", lapply(moon_json$phasedata, as.data.frame));

The data I get back looks like this:

"","error","apiversion","year","month","day","numphases","datechanged","phasedata.phase","phasedata.date","phasedata.time"
"1",FALSE,"2.1.0",2018,8,29,4,FALSE,"Last Quarter","2018 Sep 03","02:37"
"2",FALSE,"2.1.0",2018,8,29,4,FALSE,"New Moon","2018 Sep 09","18:01"
"3",FALSE,"2.1.0",2018,8,29,4,FALSE,"First Quarter","2018 Sep 16","23:15"
"4",FALSE,"2.1.0",2018,8,29,4,FALSE,"Full Moon","2018 Sep 25","02:52"

When convert it to a data frame I get this:

"","X[[i]]"
"1","Last Quarter"
"2","New Moon"
"3","First Quarter"
"4","Full Moon"
"5","2018 Sep 03"
"6","2018 Sep 09"
"7","2018 Sep 16"
"8","2018 Sep 25"
"9","02:37"
"10","18:01"
"11","23:15"
"12","02:52"

But what I want is a dataframe with the phasedata.phase/.date/.time columns selected:

"","phase","date","time"
"1","Last Quarter","2018 Sep 03","02:37"
"2","New Moon","2018 Sep 09","18:01"
"3","First Quarter","2018 Sep 16","23:15"
"4","Full Moon","2018 Sep 25","02:52"
smci
  • 32,567
  • 20
  • 113
  • 146
TimL
  • 49
  • 5
  • so, does your code work well? in which part you wish to do improvment? – Sal-laS Sep 03 '18 at 03:38
  • 1
    Are you specifically after the `phasedata` - i.e., `moon_json$phasedata` ? I think the line `do.call("rbind.fill", lapply(moon_json, as.data.frame));` is throwing you off course – SymbolixAU Sep 03 '18 at 03:39
  • Yes, I only want the phasedata. Making the change I now only get what I want, but it is one column of data, not three. `"","X[[i]]" "1","Last Quarter" "2","New Moon"` etc – TimL Sep 03 '18 at 03:53
  • 1
    What change have you made? Please edit your question to clarify what changes you've made and what you're now seeing. Also include a value for `curr_date` – SymbolixAU Sep 03 '18 at 03:56
  • I altered the conversion to only get the moon phase data `moon_phases <- do.call("rbind.fill", lapply(moon_json$phasedata, as.data.frame));` `curr_date` is just the system date (`curr_date <- Sys.Date();`), I had to reformat it for the api call. This was run a few days ago and is running for August 29th, 2018. – TimL Sep 03 '18 at 04:01
  • 1
    Please edit your question with your updated code and requirements; don't add them as a comment. – SymbolixAU Sep 03 '18 at 04:04
  • and doesn't `moon_json$phasedata` give you what you want? – SymbolixAU Sep 03 '18 at 04:05
  • Not really, because I get a list and not a table. I need three columns with four rows, not one column with twelve rows. I edited the code to show the latest revision. – TimL Sep 03 '18 at 04:15
  • 2
    `moon_json$phasedata` IS a data.frame of 3 columns, 4 rows. Forget all the `do.call()` stuff. – SymbolixAU Sep 03 '18 at 04:16
  • Ah, that's got it, thanks! `moon_phases <- moon_json$phasedata;` – TimL Sep 03 '18 at 04:18
  • Please be clearer; instead of saying *"The data I get back looks like this"* and *"When convert it to a data frame I get this"*, state which of your lists/dataframes you're referring to; it's unclear if you mean *"moon_call looks like this"* and *"moon_json (/moon_phases) looks like this"*. Anyway I posted an answer below. – smci Sep 03 '18 at 22:42

1 Answers1

0
  1. R allows you to directly extract the three columns from the dataframe moon_json, like you want:

    moon_phases <- moon_json[, c('phasedata.phase', 'phasedata.date', 'phasedata.time')]

(No need whatsoever for the do.call("rbind.fill", lapply(..., as.data.frame)) - that's just a inefficient and tortured way of slicing and then concatenating.)

  1. Then you want to rename your df columns to drop the phasedata. prefix:

    names(moon_phases) <- c('phase', 'date', 'time')

or: names(moon_phases) <- gsub('^phasedata\.', '', names(moon_phases))

  1. Also, you generally don't want explicit rownames 1,2,3... on your dataframes like moon_json, so just do row.names(moon_json) <- NULL or data.frame(..., row.names=NULL) or as.data.frame(..., row.names=NULL)

(jsonlite (or else one of the other R json packages) should have options to do this cleanup and renaming automatically, I don't know, I don't use them much. Check them out and pick a package that makes scraping less painful.)

smci
  • 32,567
  • 20
  • 113
  • 146