I wrote a Python script that takes pdbqt files as input and returns a txt file. Because not all lines have the same number of columns, it fails to read the files. How can I ignore those lines?

sample pdbqt and txt files

The code:

from __future__ import division
import glob

import numpy as np


def function(filename):
    # Read columns 6, 7 and 8 (0-based) as floats, skipping the last line of the file.
    data = np.genfromtxt(filename, dtype=float, usecols=(6, 7, 8), skip_footer=1)


# Collect every .pdbqt file in the current directory.
all_filenames = glob.glob('*.pdbqt')

print(all_filenames)

for filename in all_filenames:
    function(filename)

The error I am getting:

Traceback (most recent call last):
  File "cen7.py", line 45, in <module>
    function(filename)
  File "cen7.py", line 7, in function
    data = np.genfromtxt(filename, dtype = float , usecols = (6, 7, 8), skip_footer=1)
  File "/home/../.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 2261, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #3037 (got 4 columns instead of 3)
    Line #6066 (got 4 columns instead of 3)
    Line #9103 (got 4 columns instead of 3)
    Line #12140 (got 4 columns instead of 3)
    Line #15177 (got 4 columns instead of 3)

1 Answer

Let's make a sample csv:

In [75]: txt = """1,2,3,4
    ...: 5,6,7,8,9
    ...: """.splitlines()

This error is expected, because the second line has more columns than the first:

In [76]: np.genfromtxt(txt, delimiter=',')
Traceback (most recent call last):
  Input In [76] in <cell line: 1>
    np.genfromtxt(txt, delimiter=',')
  File /usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py:2261 in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #2 (got 5 columns instead of 4)

I can avoid that with usecols; genfromtxt isn't bothered by the extra column in line 2:

In [77]: np.genfromtxt(txt, delimiter=',',usecols=(1,2,3))
Out[77]: 
array([[2., 3., 4.],
       [6., 7., 8.]])

But if the line is too short for the usecols, I get an error:

In [78]: np.genfromtxt(txt, delimiter=',',usecols=(2,3,4))
Traceback (most recent call last):
  Input In [78] in <cell line: 1>
    np.genfromtxt(txt, delimiter=',',usecols=(2,3,4))
  File /usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py:2261 in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #1 (got 4 columns instead of 3)

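The wording of the error is misleading: the 4 is the number of columns actually found on the line, while the 3 is only the number of columns requested in usecols. Still, it is clear which line is the problem.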
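To look at the offending lines directly, a small scan along these lines may help. It is only a sketch: find_short_lines is a hypothetical helper, the sample lines are made up, and it assumes whitespace-separated fields as in the question's genfromtxt call.

```python
def find_short_lines(lines, usecols):
    """Return (line_number, field_count) for non-empty lines too short for usecols."""
    needed = max(usecols) + 1  # highest requested index must exist on the line
    short = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()  # split on any run of whitespace
        if fields and len(fields) < needed:
            short.append((number, len(fields)))
    return short


# Made-up sample: the second line has only 4 fields.
sample = [
    "a b c d e f 1.0 2.0 3.0",
    "x y z w",
    "a b c d e f 4.0 5.0 6.0",
]
print(find_short_lines(sample, usecols=(6, 7, 8)))
```

For a real file, the same helper could be fed an open file object, for example find_short_lines(open(filename), (6, 7, 8)).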

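That should give you something to look for when scanning the problem lines in your file: the lines listed in your traceback (3037, 6066 and so on) apparently have only 4 whitespace-separated fields, too few for usecols=(6, 7, 8).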
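If the aim is simply to skip such lines, genfromtxt has an invalid_raise parameter; with invalid_raise=False it emits a warning and drops the inconsistent lines instead of raising. A minimal sketch on the same two-line sample (silencing the warning is my choice for the demo, not a requirement):

```python
import warnings

import numpy as np

# Same two-line sample as above: line 1 has 4 fields, line 2 has 5.
txt = ["1,2,3,4", "5,6,7,8,9"]

with warnings.catch_warnings():
    # genfromtxt warns about the skipped lines; silence that for this demo.
    warnings.simplefilter("ignore")
    data = np.genfromtxt(txt, delimiter=",", usecols=(2, 3, 4), invalid_raise=False)

# A single surviving row comes back 1-D, so force a 2-D (rows, columns) shape.
data = np.atleast_2d(data)
print(data)
```

Only the second line survives here, because the first is too short for usecols=(2, 3, 4). In the question's function the equivalent change would presumably be adding invalid_raise=False to the existing genfromtxt call; I have not tested that against a pdbqt file.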

hpaulj