I am writing code to combine text files and have hit an error that I cannot resolve. Searching online has not helped.

The code is:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Sep 26 09:51:55 2019

@author: comp
"""
import numpy as np

NAME = input("Enter Molecule ID: ")
NAME_IN = NAME+'_apo-1acl.RMSD'

DATA = []
DATA = np.genfromtxt(NAME_IN, skip_header=7, dtype=None, delimiter=' ')

The text file is:

    RMSD TABLE
    __________

_____________________________________________________________________
     |      |      |           |         |                 |
Rank | Sub- | Run  | Binding   | Cluster | Reference       | Grep
     | Rank |      | Energy    | RMSD    | RMSD            | Pattern
_____|______|______|___________|_________|_________________|___________
   1      1      8       -7.23      0.00     93.07           RANKING
   1      2      9       -6.79      1.39     92.64           RANKING
   2      1     16       -7.18      0.00     93.19           RANKING
   3      1      2       -6.93      0.00     93.38           RANKING
   3      2     17       -6.84      0.23     93.45           RANKING
   4      1     15       -6.55      0.00     91.83           RANKING
   4      2      7       -6.34      0.33     91.77           RANKING
   5      1      5       -6.41      0.00     93.05           RANKING
   6      1      3       -6.36      0.00     92.84           RANKING
   6      2     10       -6.28      0.47     92.92           RANKING
   6      3      6       -6.27      0.43     92.82           RANKING
   6      4     18       -6.25      0.32     92.88           RANKING
   6      5     13       -6.24      0.96     92.75           RANKING
   6      6      1       -6.24      0.87     92.60           RANKING
   6      7     14       -6.21      0.51     92.90           RANKING
   6      8     11       -6.14      0.98     92.78           RANKING
   6      9     20       -6.11      0.71     92.67           RANKING
   6     10     19       -6.01      1.36     93.00           RANKING
   7      1     12       -6.30      0.00     93.28           RANKING
   8      1      4       -5.85      0.00     92.97           RANKING
_______________________________________________________________________

and the error is:

Traceback (most recent call last):

  File "/home/comp/Apps/Models/1-PhosphorusLigands/CombinedLigands/MOL/Docking/Results/RMSDTable/CombineRMSDFiles.py", line 14, in <module>
    DATA = np.genfromtxt(NAME_IN, skip_header=7, dtype=None, delimiter=' ')

  File "/home/comp/Apps/Miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 2075, in genfromtxt
    raise ValueError(errmsg)

ValueError: Some errors were detected !
    Line #9 (got 42 columns instead of 1)
    Line #10 (got 42 columns instead of 1)
    Line #11 (got 41 columns instead of 1)
    Line #12 (got 42 columns instead of 1)
    Line #13 (got 41 columns instead of 1)
    Line #14 (got 41 columns instead of 1)
    Line #15 (got 42 columns instead of 1)
    Line #16 (got 42 columns instead of 1)
    Line #17 (got 42 columns instead of 1)
    Line #18 (got 41 columns instead of 1)
    Line #19 (got 42 columns instead of 1)
    Line #20 (got 41 columns instead of 1)
    Line #21 (got 41 columns instead of 1)
    Line #22 (got 42 columns instead of 1)
    Line #23 (got 41 columns instead of 1)
    Line #24 (got 41 columns instead of 1)
    Line #25 (got 41 columns instead of 1)
    Line #26 (got 40 columns instead of 1)
    Line #27 (got 41 columns instead of 1)
    Line #28 (got 42 columns instead of 1)

At this point, I'm not even sure how to ask the question.

hpaulj
Steve
  • It is hard to tell for sure without having sample data that is aligned exactly the same way as in your data file. Apparently numpy expects only one column per row, whereas the delimiter `' '` indicates 42. Maybe you can either 1) reformat your posted sample data (preferred), 2) post the file contents on pastebin 3) upload the file somewhere and share the link – WolfgangK Sep 26 '19 at 16:40
  • For a start I'd suggest dropping the `delimiter` parameter; the default is 'white space' which is more general than the one you provide. – hpaulj Sep 26 '19 at 17:01
  • Evidently the first line after the skip is not split by the provided delimiter, so `genfromtxt` sets the expected column count to 1. It expects all lines to have the same number of columns, but the subsequent lines have 41 or 42 (as defined by the delimiter). Examine the file more carefully. – hpaulj Sep 26 '19 at 17:04
  • If I count right, you should be skipping 8 header lines. If you get the skip_header and delimiter right, you should get a 1d structured array with 7 fields, 3 integer, 3 float, and one string. – hpaulj Sep 26 '19 at 17:08
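To make the comments above concrete, here is a minimal sketch of why a single-space delimiter inflates the column count. It uses plain `str.split` as a stand-in for the splitting step, which is an assumption about how the delimiter behaves and not a copy of NumPy's internals. The two strings are taken from the table in the question, though the exact spacing in the real file may differ.

```python
# One data row and the last header rule, copied from the table in the question.
row = "   1      1      8       -7.23      0.00     93.07           RANKING"
header_rule = "_____|______|______|___________|_________|_________________|___________"

# Splitting on a single space treats every space as its own separator,
# so each run of spaces produces a string of empty fields.
by_single_space = row.strip().split(' ')

# Splitting on arbitrary whitespace collapses each run into one separator.
by_whitespace = row.split()

# The header rule contains no spaces at all, so it stays in one piece.
header_fields = header_rule.split(' ')

print(len(by_single_space))  # far more than the 7 real columns
print(by_whitespace)         # the 7 real columns
print(len(header_fields))    # a single field
```

If the header rule is the first line read after the skip, it fixes the expected column count at 1, and every data row then fails the check with a much larger count.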

1 Answer

With a copy-and-paste of your file sample into a string `txt`, dropping `delimiter` and skipping 8 header lines:

In [89]: np.genfromtxt(txt.splitlines(), dtype=None, encoding=None, skip_header=8)
Out[89]: 
array([(1,  1,  8, -7.23, 0.  , 93.07, 'RANKING'),
       (1,  2,  9, -6.79, 1.39, 92.64, 'RANKING'),
       (2,  1, 16, -7.18, 0.  , 93.19, 'RANKING'),
       (3,  1,  2, -6.93, 0.  , 93.38, 'RANKING'),
       (3,  2, 17, -6.84, 0.23, 93.45, 'RANKING'),
       (4,  1, 15, -6.55, 0.  , 91.83, 'RANKING'),
       (4,  2,  7, -6.34, 0.33, 91.77, 'RANKING'),
       (5,  1,  5, -6.41, 0.  , 93.05, 'RANKING'),
       (6,  1,  3, -6.36, 0.  , 92.84, 'RANKING'),
       (6,  2, 10, -6.28, 0.47, 92.92, 'RANKING'),
       (6,  3,  6, -6.27, 0.43, 92.82, 'RANKING'),
       (6,  4, 18, -6.25, 0.32, 92.88, 'RANKING'),
       (6,  5, 13, -6.24, 0.96, 92.75, 'RANKING'),
       (6,  6,  1, -6.24, 0.87, 92.6 , 'RANKING'),
       (6,  7, 14, -6.21, 0.51, 92.9 , 'RANKING'),
       (6,  8, 11, -6.14, 0.98, 92.78, 'RANKING'),
       (6,  9, 20, -6.11, 0.71, 92.67, 'RANKING'),
       (6, 10, 19, -6.01, 1.36, 93.  , 'RANKING'),
       (7,  1, 12, -6.3 , 0.  , 93.28, 'RANKING'),
       (8,  1,  4, -5.85, 0.  , 92.97, 'RANKING')],
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<U7')])
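
As a possible variation on the call above, the sketch below selects the data rows by their trailing `RANKING` marker instead of counting header lines, which also drops the closing rule of underscores. This is a different technique from `skip_header`, not what the session above does. The `sample` string is a shortened stand-in for the real file, and the field names are hypothetical labels chosen for illustration.

```python
import numpy as np

# Shortened stand-in for the RMSD file in the question (three data rows only).
sample = """\
    RMSD TABLE
    __________

_____________________________________________________________________
     |      |      |           |         |                 |
Rank | Sub- | Run  | Binding   | Cluster | Reference       | Grep
     | Rank |      | Energy    | RMSD    | RMSD            | Pattern
_____|______|______|___________|_________|_________________|___________
   1      1      8       -7.23      0.00     93.07           RANKING
   1      2      9       -6.79      1.39     92.64           RANKING
   2      1     16       -7.18      0.00     93.19           RANKING
_______________________________________________________________________
"""

# Keep only lines ending in the "Grep Pattern" marker; headers and rules drop out.
rows = [line for line in sample.splitlines() if line.rstrip().endswith("RANKING")]

# Hypothetical field names; the default names would be f0 ... f6.
names = ["rank", "sub_rank", "run", "binding_energy",
         "cluster_rmsd", "reference_rmsd", "pattern"]

# No delimiter argument: the default splits on any run of whitespace.
data = np.genfromtxt(rows, dtype=None, encoding=None, names=names)

print(data["run"])                   # one field across all rows
print(data["binding_energy"].min())  # lowest binding energy in the sample
```

For a file on disk, the same filter can be applied to the lines read from the open file before they are passed to `genfromtxt`.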
hpaulj