
I am trying to extract data from NCBI using functions from the rentrez package. However, extract_from_esummary() returns a matrix, and when I save it to a .csv file the text of some fields is split across adjacent columns (as shown in the image) because the "," is treated as a delimiter.

    library(rentrez)
    PM.ID <- c("25979833", "25667274", "23792568", "22435913")

    p.data <- entrez_summary(db = "pubmed", id = PM.ID)
    pubrecord.table <- extract_from_esummary(esummaries = p.data,
                                             elements = c("uid", "title", "fulljournalname",
                                                          "pubtype"))

[Image: screenshot of the resulting .csv file]

In the image above, for PMID 25979833 the journal name is split and spills into the next column: "European journal of cancer (Oxford" lands in one column and "England : 1990)" in the next. When I ran dput(pubrecord.table), I saw that the split happens where the words are separated by a comma. How can I make R understand that "European journal of cancer (Oxford, England : 1990)" belongs in a single column? The Title and Pubtype fields have the same issue: long text containing a comma gets broken up in the CSV. How can I clean the data so that each value ends up in the appropriate column?

zx8754
user5249203
  • Without knowing how your `server.R` works, it is hard to give an opinion. An `if(is.na(input$PMID)) return(NULL)` before the `sapply` might work. – user5029763 Oct 07 '15 at 18:27

1 Answer


I thought this looked like a bug in extract_from_esummary, so I searched the package's issues on GitHub for "comma" and found this one, which says:

This is not really a problem with rentrez, just a property of NCBI records and R objects.

In this case, the pubtype field is variably-sized.

When you try and write the matrix it represents the vectors like you'd type them in (c(..., ...)) which adds a comma which breaks the csv format.

In this case, you can collapse the vectors and unlist each matrix row to allow them to be written out

The issue page has code examples as well.
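As a rough sketch of what "collapse the vectors" could look like when applied to every field rather than only pubtype: the code below builds a small stand-in for the list-matrix by hand (so it runs without querying NCBI), collapses each cell into a single string, and writes one row per PMID. The mock titles, the second journal name, and the two-valued pubtype are placeholders I made up for illustration, and the " & " separator is just the one used in the comments below. This is not the code from the issue page, only one way the idea might be generalized.

```r
# Hypothetical stand-in for the list-matrix that extract_from_esummary()
# returns (rows = elements, columns = PMIDs). Built by hand so the sketch
# runs offline; titles and pubtypes are placeholders, not real NCBI data.
cells <- list(
  "25979833", "Placeholder title, with a comma",
  "European journal of cancer (Oxford, England : 1990)",
  c("Journal Article", "Review"),          # a cell holding more than one value
  "25667274", "Another placeholder title",
  "Placeholder journal", "Journal Article"
)
pubrecord.table <- matrix(
  cells, nrow = 4,
  dimnames = list(c("uid", "title", "fulljournalname", "pubtype"),
                  c("25979833", "25667274"))
)

# Collapse every cell to exactly one string, whichever field it belongs to.
flat <- vapply(pubrecord.table,
               function(cell) paste(unlist(cell), collapse = " & "),
               character(1))
dim(flat) <- dim(pubrecord.table)
dimnames(flat) <- dimnames(pubrecord.table)

# Transpose so there is one row per PMID and one column per element.
out <- as.data.frame(t(flat), stringsAsFactors = FALSE)

# write.csv() quotes character fields by default, so embedded commas
# stay inside their column when the file is read back.
csv.file <- tempfile(fileext = ".csv")
write.csv(out, csv.file, row.names = FALSE)
check <- read.csv(csv.file, stringsAsFactors = FALSE)
```

Because every cell is reduced to a length-one character value before writing, it should not matter which field happens to be variably sized for a given set of PMIDs. With real data you would replace the hand-built matrix with the result of extract_from_esummary().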

Gregor Thomas
  • Hi Gregor, I raised that issue on the GitHub page you referred to. The reason I posted the same question on SO is to find out whether there is a generalized solution. If you notice, the GitHub example addresses the pubtype field. However, depending on the PMID, the variable-length field varies: title, journal name, pubtype, or something else. So I thought this simple change in code would fix everything: `pubrecord.table[,] <- sapply(pubrecord.table[,], paste, collapse=" & ")`. But I get an error: `Error in apply(pubrecord.reference, 1, unlist) : dim(X) must have a positive length` – user5249203 Oct 12 '15 at 17:05
  • I'd recommend opening another issue on Github then. – Gregor Thomas Oct 12 '15 at 17:06