I'm trying to create a dataframe from the output of the pysam module, which works with genomic data (BAM/SAM files). pysam.depth() returns a table as a string. I'm using StringIO to wrap that string so pandas can parse it into a dataframe, but I get this error:
pandas.errors.EmptyDataError: No columns to parse from file
If I open Python in the terminal and run the lines of code one by one, it works.
Here's what the output of pysam.depth() looks like:
>>> depths = pysam.depth('-a', '-r', "Y_unplaced:131349-131401", "file.bam")
>>> print(depths)
Y_unplaced 131349 2864
Y_unplaced 131350 2861
Y_unplaced 131351 2855
Y_unplaced 131352 2848
Y_unplaced 131353 2842
Y_unplaced 131354 2837
Y_unplaced 131355 2840
...
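Because the depth table has no header line, one way to parse it is sketched below. It assumes the columns are tab-separated (which is how samtools depth prints them), and the column names chrom, pos and depth are my own illustrative labels, not something pysam provides. Without header=None (or names), pandas treats the first data row as the header, which is why "Y_unplaced 131349 2864" ends up as the column labels in the interactive output further down.

```python
from io import StringIO

import pandas as pd

# A few lines copied from the pysam.depth() output shown above,
# assuming tab-separated columns as samtools depth prints them.
depths = (
    "Y_unplaced\t131349\t2864\n"
    "Y_unplaced\t131350\t2861\n"
    "Y_unplaced\t131351\t2855\n"
)

# header=None (also implied when names is given) keeps the first line as data
# instead of turning it into column labels. The names are illustrative only.
df = pd.read_csv(
    StringIO(depths),
    sep="\t",
    header=None,
    names=["chrom", "pos", "depth"],
)
print(df)
```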
Here's a bit of my code:
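import os
from io import StringIO

import pandas as pd
import pysam

dir = os.environ['PBS_O_WORKDIR']  # working directory of the PBS job
with open(dir + "/list_of_bams.txt", "r") as file_list:
    for line in file_list:
        sample = line.strip("\n")
        # BAM file names use "_" where the sample list uses "-"
        file = dir + "/" + sample.replace("-", "_") + ".bam"
        data1 = StringIO(pysam.depth('-a', '-r', "Y_unplaced:131349-131401", file))
        df1 = pd.read_csv(data1, sep='\t')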
I've included some perhaps unnecessary surrounding code. I'll be running this on a cluster, and I will be making dataframes for all the BAM files listed in "list_of_bams.txt".
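For the goal of building dataframes for every BAM in the list, a minimal sketch is to collect one frame per sample and concatenate them with a sample column, skipping files whose depth output is empty. The function name collect_depths, the run_depth callable and the fake_depth stand-in are all hypothetical; fake_depth only exists so the sketch runs without BAM files, and on the cluster you would pass a wrapper around pysam.depth instead.

```python
from io import StringIO

import pandas as pd


def collect_depths(bam_paths, run_depth, region):
    """Build one long DataFrame of per-position depths for several BAMs.

    bam_paths maps sample name -> BAM path. run_depth is any callable that
    takes (bam_path, region) and returns the depth table as a string.
    """
    frames = []
    for sample, path in bam_paths.items():
        text = run_depth(path, region)
        if not text.strip():
            # Empty output would make read_csv raise EmptyDataError,
            # so report the file and move on to the next one.
            print(f"no depth output for {sample} ({path!r})")
            continue
        df = pd.read_csv(StringIO(text), sep="\t", header=None,
                         names=["chrom", "pos", "depth"])
        df["sample"] = sample  # remember which BAM each row came from
        frames.append(df)
    if not frames:
        return pd.DataFrame(columns=["chrom", "pos", "depth", "sample"])
    return pd.concat(frames, ignore_index=True)


# Hypothetical stand-in for pysam.depth so the sketch runs without BAM files.
# On the cluster you might pass something like:
#   lambda path, region: pysam.depth("-a", "-r", region, path)
def fake_depth(path, region):
    if path == "AB0117_C.bam":
        return "Y_unplaced\t131349\t2864\nY_unplaced\t131350\t2861\n"
    return ""  # simulate a file that produced no output


combined = collect_depths(
    {"AB0117_C": "AB0117_C.bam", "missing": "missing.bam"},
    fake_depth,
    "Y_unplaced:131349-131401",
)
print(combined)
```

Keeping everything in one long frame with a sample column usually makes later comparisons (groupby on sample, pivoting positions against samples) simpler than juggling many separate df1, df2, ... variables.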
Here's the error:
File "/rds/general/user/ajf316/ephemeral/bam/AgY53B.py", line 41, in <module>
df1 = pd.read_csv(data1, sep='\t')
...
pandas.errors.EmptyDataError: No columns to parse from file
I'm not experienced at reading errors (or with Python in general!), so maybe pysam.depth() is not outputting anything? It's odd because, as I mentioned, it works fine if I run it in an interactive Python session in the terminal, like so:
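One quick way to test the "pysam.depth() is not outputting anything" idea is to feed read_csv an empty string and compare the result with the traceback. The sketch below only uses pandas; the empty_output variable stands in for what pysam.depth() would return if it produced nothing.

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError

# Stand-in for an empty string returned by pysam.depth()
empty_output = ""

message = None
try:
    pd.read_csv(StringIO(empty_output), sep="\t")
except EmptyDataError as exc:
    # Capture the message so it can be compared with the traceback
    message = str(exc)

print(message)
```

If this prints the same message as the traceback, printing repr() of the pysam.depth() result inside the loop, just before read_csv, would show whether it really is empty for the failing file.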
>>> data1 = StringIO(pysam.depth('-a', '-r', "Y_unplaced:131349-131401","AB0117_C.bam"))
>>> df1 = pd.read_csv(data1, sep='\t')
>>> print(df1)
Y_unplaced 131349 2864
0 Y_unplaced 131350 2861
1 Y_unplaced 131351 2855
2 Y_unplaced 131352 2848
This is the same file the loop processes first, so there definitely is output to parse. Maybe the "file" variable is not right? Although in that case, shouldn't the error be raised on the previous line? Thanks for any help!
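To check whether the "file" path is the problem, a small diagnostic like the one below could be called on each path inside the loop. describe_bam_path is a hypothetical helper: repr() exposes hidden characters (for example a trailing "\r" that strip("\n") leaves behind), os.path.isfile confirms the file exists, and the index check assumes the conventional index names (.bam.bai, .bai, .csi) that region queries with -r typically rely on. I haven't confirmed how pysam.depth() itself behaves when the path or index is wrong, so this only narrows down the possibilities.

```python
import os


def describe_bam_path(path):
    """Report details of a BAM path that could explain empty depth output."""
    # Assumed conventional index locations for a BAM file
    index_candidates = [
        path + ".bai",
        os.path.splitext(path)[0] + ".bai",
        path + ".csi",
    ]
    return {
        "repr": repr(path),  # reveals hidden characters such as '\r' or spaces
        "exists": os.path.isfile(path),
        "has_index": any(os.path.isfile(p) for p in index_candidates),
    }
```

For example, print(describe_bam_path(file)) right before the pysam.depth() call would show whether the path built from list_of_bams.txt matches the file that works interactively.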