
I'm trying to create a DataFrame from the output of the pysam module, which is used on genomic data (BAM/SAM files). pysam.depth() returns a table as a string. I have used StringIO to parse that string into a pandas DataFrame, but I get this error:

pandas.errors.EmptyDataError: No columns to parse from file

If I open python in the terminal and run the lines of code individually it works.

Here's what the output of pysam.depth() looks like:

>>> depths = pysam.depth('-a', '-r', "Y_unplaced:131349-131401", "file.bam")
>>> print(depths)
Y_unplaced  131349  2864
Y_unplaced  131350  2861
Y_unplaced  131351  2855
Y_unplaced  131352  2848
Y_unplaced  131353  2842
Y_unplaced  131354  2837
Y_unplaced  131355  2840
...

Here's a bit of my code:

dir = os.environ['PBS_O_WORKDIR']
file_list = open(dir + "/list_of_bams.txt", "r")
for line in file_list:
    sample = line.strip("\n")
    file = dir + "/" + sample.replace("-", "_") + ".bam"
    data1 = StringIO(pysam.depth('-a', '-r', "Y_unplaced:131349-131401", file))
    df1 = pd.read_csv(data1, sep='\t')

I've included some perhaps unnecessary surrounding code. I'll be running it on a cluster, and I will be making DataFrames for all the BAM files listed in "list_of_bams.txt".

Here's the error:

File "/rds/general/user/ajf316/ephemeral/bam/AgY53B.py", line 41, in <module>
    df1 = pd.read_csv(data1, sep='\t')
...
pandas.errors.EmptyDataError: No columns to parse from file

I'm not experienced at reading errors (or with Python in general!). Maybe pysam.depth() is not outputting anything? It's odd because, as I mentioned, it works fine if I run it in the Python interpreter on the command line, like so:

>>> data1 = StringIO(pysam.depth('-a', '-r', "Y_unplaced:131349-131401","AB0117_C.bam"))
>>> df1 = pd.read_csv(data1, sep='\t')
>>> print(df1)
    Y_unplaced  131349  2864
0   Y_unplaced  131350  2861
1   Y_unplaced  131351  2855
2   Y_unplaced  131352  2848

This is the same file as the first one the script processes, so there definitely is output to be had. Maybe the "file" variable is not right? Although in that case, shouldn't the error be raised on the previous line? Thanks for any help!
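A side note on the interactive output above: the first data row (131349) has been consumed as the column header, so it is missing from the rows. A minimal sketch of one way around that, assuming the output is tab-separated with no header row; the column names `chrom`, `pos`, and `depth` are illustrative labels of my own, not something pysam provides, and the sample text simply copies the rows shown earlier:

```python
from io import StringIO

import pandas as pd

# Sample text in the same shape as the pysam.depth() output shown above
# (tab-separated, no header row). In the real script this string would
# come from pysam.depth(...).
depth_text = (
    "Y_unplaced\t131349\t2864\n"
    "Y_unplaced\t131350\t2861\n"
    "Y_unplaced\t131351\t2855\n"
)

# header=None stops pandas from treating the first data row as column names.
# The names below are hypothetical labels chosen for readability.
df = pd.read_csv(
    StringIO(depth_text),
    sep="\t",
    header=None,
    names=["chrom", "pos", "depth"],
)
print(df)
```

With `header=None`, all three input rows end up as data rows rather than two rows plus a header.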

Timur Shtatland
  • Are you really sure that the filename is correct once you run the script? I can see that you're manually writing the filename `"AB0117_C.bam"` when you're calling it inside of the `python interpreter`. The error you're getting could just be because the script cannot find the file, or is reading an empty file. – Hampus Larsson Aug 15 '19 at 09:57
  • @HampusLarsson I think you're right...so silly! Thank you! – Annie Forster Aug 15 '19 at 10:54
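Following up on the filename suggestion in the comments, here is a rough sketch of how the constructed path and the depth output could be checked before the text is handed to `pd.read_csv`. The helper names `bam_path_for` and `check_depth_input` are hypothetical, and `/some/workdir` is a placeholder. The sketch uses `strip()` rather than `strip("\n")`, on the assumption that stray spaces or `\r` characters in the list file could be part of the problem. Whether pysam raises an error or returns an empty string for a missing file is not something I have verified, so both cases are checked.

```python
import os


def bam_path_for(sample_line, base_dir):
    """Build the BAM path the same way the loop in the question does."""
    # strip() also removes stray spaces and '\r', unlike strip("\n").
    sample = sample_line.strip()
    return os.path.join(base_dir, sample.replace("-", "_") + ".bam")


def check_depth_input(path, depth_text):
    """Return a list of problems found before parsing the depth output."""
    problems = []
    if not os.path.isfile(path):
        problems.append("missing file: " + repr(path))
    if not depth_text.strip():
        problems.append("empty depth output for: " + repr(path))
    return problems


# Example with a placeholder directory and an empty depth string.
path = bam_path_for("AB0117-C\r\n", "/some/workdir")
print(repr(path))
for problem in check_depth_input(path, ""):
    print(problem)
```

Printing `repr(path)` makes invisible characters in the filename visible, which is often the quickest way to spot a mismatch between the script and what was typed by hand in the interpreter.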
