I am working with the GEO data set "GSE166427", using the platform "GPL13667". I am trying to download it with the R script below, but the last call produces errors and warnings, and the expression set is not loaded into my session. How can I get `GEOquery::getGEO` to return the ExpressionSet?

library(Biobase)  
library(GEOquery)  
library(limma)

series <- "GSE166427"  
platform <- "GPL13667"  
setwd("D:/Proj/DEGs/COAD DEGs/")  
Sys.setenv(VROOM_CONNECTION_SIZE=131072*10)

gset <- getGEO(series, 
                GSEMatrix = TRUE,  
                AnnotGPL = TRUE, 
                destdir = "COAD Data/")

At this step I get:

Found 2 file(s)  
GSE166427-GPL13534_series_matrix.txt.gz  
Using locally cached version: COAD Data//GSE166427-GPL13534_series_matrix.txt.gz

Rows: 0 Columns: 241

- Column specification -----------------------------------------------------------------------------------

Delimiter: "\t"  
chr (241): ID_REF, GSM3759532, GSM3759533, GSM3759534, GSM3759535, GSM3759536, GSM3759537, GSM3759538, ...

i Use `spec()` to retrieve the full column specification for this data.  
i Specify the column types or set `show_col_types = FALSE` to quiet this message.  
Annotation GPL not available, so will use submitter GPL instead  
Using locally cached version of GPL13534 found here: `COAD Data//GPL13534.soft`

Error in parseGSEMatrix(fname, destdir = destdir, AnnotGPL = AnnotGPL,  : parsing failed--expected only one '!series_data_table_begin'  
In addition: Warning message:
In `download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery"))` :
cannot open URL <https://ftp.ncbi.nlm.nih.gov/geo/platforms/GPL13nnn/GPL13534/annot/GPL13534.annot.gz>: HTTP status was '404 Not Found'

So I changed the call to read the GPL13667 series matrix file directly:

gset <- getGEO(series, 
                 filename ="COAD Data/GSE166427-GPL13667_series_matrix.txt.gz", 
                 GSEMatrix = TRUE, 
                 AnnotGPL = TRUE, 
                 destdir = "COAD Data/")

It still fails, with this output:

Rows: 49386 Columns: 247

- Column specification -----------------------------------------------------------------------------------

Delimiter: "\t"  
chr   (1): ID_REF  
dbl (246): GSM1077598, GSM1077599, GSM1077600, GSM1077601, GSM1077602, GSM1077603, GSM1077604, GSM10776...

i Use `spec()` to retrieve the full column specification for this data.  
i Specify the column types or set `show_col_types = FALSE` to quiet this message.  
Annotation GPL not available, so will use submitter GPL instead  
Using locally cached version of GPL13667 found here: `COAD Data//GPL13667.soft`

Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length  
In addition: Warning messages:  
1: In `download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery"))` :
cannot open URL <https://ftp.ncbi.nlm.nih.gov/geo/platforms/GPL13nnn/GPL13667/annot/GPL13667.annot.gz> : HTTP status was '404 Not Found'

2: One or more parsing issues, call `problems()` on your data frame for details, e.g.:  
`dat <- vroom(...)`  
`problems(dat)`

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

mori

1 Answer

The destination directory probably doesn't exist.

Try this:

dir.create("COAD_Data")  ## use this once, then comment out!

gset <- getGEO(series, 
               GSEMatrix = TRUE,  
               AnnotGPL = TRUE, 
               destdir = "COAD_Data")
# Found 2 file(s)
# GSE166427-GPL13534_series_matrix.txt.gz
# Annotation GPL not available, so will use submitter GPL instead
# |--------------------------------------------------|
# |==================================================|
# |--------------------------------------------------|
# |==================================================|
# GSE166427-GPL13667_series_matrix.txt.gz
# Annotation GPL not available, so will use submitter GPL instead
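
Because the series spans two platforms, `getGEO` returns a list with one ExpressionSet per series matrix file, named after the file. A minimal sketch for picking out the GPL13667 element, assuming the list names contain the platform ID as in the output above (`pick_platform` is a hypothetical helper, not part of GEOquery):

```r
# Hypothetical helper: return the list element whose name mentions the platform.
pick_platform <- function(gset_list, platform) {
  # A single-platform series yields a one-element list; nothing to choose.
  if (length(gset_list) == 1L) return(gset_list[[1L]])
  idx <- grep(platform, names(gset_list), fixed = TRUE)
  if (length(idx) != 1L) {
    stop("expected exactly one element matching ", platform,
         ", found ", length(idx))
  }
  gset_list[[idx]]
}

# Usage with the objects from the script (needs the downloaded data, so not run here):
# eset <- pick_platform(gset, platform)   # platform is "GPL13667"
# dim(Biobase::exprs(eset))
```

Stopping when zero or several names match avoids silently continuing with the wrong platform.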

Note that directory names should not contain spaces, so do use "COAD_Data".
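
As an alternative to running `dir.create()` once and then commenting it out, the directory check can be made safe to re-run. A small base-R sketch (`ensure_dir` is a hypothetical helper name):

```r
# Hypothetical helper: create the directory if it is missing and return its path.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    ok <- dir.create(path, recursive = TRUE)
    if (!ok) stop("could not create directory: ", path)
  }
  normalizePath(path, winslash = "/")
}

dest <- ensure_dir("COAD_Data")  # does nothing if the directory already exists
```

The returned `dest` can then be passed as `destdir = dest` in the `getGEO` call.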

dir('COAD_Data')  ## list directory
# [1] "GPL13534.soft.gz"                       
# [2] "GPL13667.soft.gz"                       
# [3] "GSE166427-GPL13534_series_matrix.txt.gz"
# [4] "GSE166427-GPL13667_series_matrix.txt.gz"
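
The log in the question shows `getGEO` reusing locally cached files. If one of them is suspected to be incomplete (an assumption, not something confirmed here), one option is to inspect the cache and delete the suspicious copies so that the next `getGEO` call downloads them again. A base-R sketch; `cache_report` is a hypothetical helper:

```r
# Hypothetical helper: list the files in the cache directory with size and timestamp.
cache_report <- function(destdir) {
  files <- list.files(destdir, full.names = TRUE)
  info <- file.info(files)
  keep <- !info$isdir            # non-recursive listings also include subdirectories
  data.frame(file = basename(files[keep]),
             bytes = info$size[keep],
             modified = info$mtime[keep],
             stringsAsFactors = FALSE)
}

# Example (not run here): inspect the cache, then remove empty files.
# rep <- cache_report("COAD_Data")
# print(rep)
# file.remove(file.path("COAD_Data", rep$file[rep$bytes == 0]))
```

A size check only catches empty files; comparing the sizes against a fresh download into a new directory is a more reliable test.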
jay.sf
  • Hi jay, thanks for your attention. Actually, the directory is correct and the files are downloaded there; I can even read and load other GSE series saved in "COAD DEGs". However, I tried your suggestion and the outcome is the same. – mori Jul 24 '23 at 12:19
  • @mori It actually works fine for me, as demonstrated. Please add your OS and `sessionInfo()` as an edit to your question. – jay.sf Jul 24 '23 at 12:27