I have a csv file with the following values:

# number,array1,array2
0,[1,2,3,4,5],[6,7,8,9,10]

Now I would like to load these two arrays, but when I run:

new_array = np.genfromtxt(fname='file_name.csv',
           skip_header=1,
           defaultfmt=['%i','%i','%i','%i','%i','%i','%i','%i','%i','%i','%i'],
           deletechars='[,]',
           usecols = (1,2,3,4,5),
           dtype=(int),
           delimiter=',',
           comments='# ',)

I get an array with the values:

[-1  2  3  4 -1]

Instead of:

[1  2  3  4 5]

If I understand correctly, the problem is the brackets, but I expected that

deletechars='[,]'

would do the trick. How do I get genfromtxt to read these values correctly?

  • That's not a valid csv format. – hpaulj Mar 08 '23 at 13:18
  • You could strip out the `[]` beforehand and give `genfromtxt` a file with 11 simple columns. You can't tell `genfromtxt` to treat the `[,]` sequence as a different kind of delimiter. It's designed to handle a csv, i.e. simple 'comma separated values'. – hpaulj Mar 08 '23 at 16:04
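The suggestion in the comment above can be sketched roughly as follows. This is only a minimal sketch: the sample text is copied from the question and held in a string (the name `raw` is made up for illustration), whereas a real script would read it from the file first. It also assumes every row has exactly two five-element lists, as in the question.

```python
import io
import numpy as np

# Sample data copied from the question; `raw` is a hypothetical name.
raw = "# number,array1,array2\n0,[1,2,3,4,5],[6,7,8,9,10]\n"

# Remove the square brackets so each row is 11 plain comma-separated integers.
cleaned = raw.replace("[", "").replace("]", "")

# The header line starts with '#', so it is skipped as a comment.
# atleast_2d keeps the result 2-D even when there is only one data row.
data = np.atleast_2d(
    np.genfromtxt(io.StringIO(cleaned), delimiter=",", dtype=int, comments="#")
)

array1 = data[:, 1:6]   # columns 1..5 hold the first list
array2 = data[:, 6:11]  # columns 6..10 hold the second list
```

Slicing by column position only works because both lists have a fixed length; lists of varying length would need a different approach.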

2 Answers


I think deletechars only affects the column names, not the data. I think you need a "converter" to remove the square brackets:

import re
conv = lambda x: int(re.sub(rb"[\[\]]", b"", x))

Then you can use:

In [84]: a = np.genfromtxt(fname='file.csv',
    ...:            skip_header=1,
    ...:            defaultfmt=['%i','%i','%i','%i','%i','%i','%i','%i','%i','%i','%i'],
    ...:            usecols = (1,2,3,4,5),
    ...:            dtype=int,
    ...:            delimiter=',',
    ...:            comments='# ',
    ...:            converters={1:conv,2:conv,3:conv,4:conv,5:conv})

In [85]: a
Out[85]: array([1, 2, 3, 4, 5])
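The converter above works on bytes. As a hedged variant, here is a minimal sketch of a converter that accepts either bytes or str, with `encoding='utf-8'` passed explicitly so that fields arrive as text. The helper name `strip_brackets` and the in-memory sample are assumptions made for illustration, not part of the original answer.

```python
import io
import numpy as np

def strip_brackets(field):
    """Drop '[' and ']' from one field and convert the rest to int."""
    if isinstance(field, bytes):  # fields may arrive as bytes under encoding='bytes'
        field = field.decode()
    return int(field.strip("[]"))

# Sample data copied from the question, held in memory instead of a file.
sample = "# number,array1,array2\n0,[1,2,3,4,5],[6,7,8,9,10]\n"

a = np.genfromtxt(io.StringIO(sample),
                  skip_header=1,
                  usecols=(1, 2, 3, 4, 5),
                  dtype=int,
                  delimiter=',',
                  encoding='utf-8',
                  converters={col: strip_brackets for col in (1, 2, 3, 4, 5)})
```

A converter is only needed on the columns that can carry a bracket, but attaching it to every selected column is harmless.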
Mark Setchell

In a more complicated case like yours, you can load all the arrays by regex parsing with numpy.fromregex and numpy.fromstring (test_txt below is the path to the csv file or an open file object):

rows = np.fromregex(test_txt, regexp=r'\d+,\[([\d,]+)\],\[([\d,]+)\]', dtype=[('c1', 'O'), ('c2', 'O')])
arr = [np.fromstring(c, sep=',', dtype=np.int32) for row in rows for c in row]
print(arr)

[array([1, 2, 3, 4, 5], dtype=int32), array([ 6,  7,  8,  9, 10], dtype=int32)]
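For files with several data rows, the same regex idea can be sketched with the standard `re` module in place of `np.fromregex`, collecting each column into its own 2-D array. This is a variant under assumptions, not the code from the answer above: the names `text`, `row_pattern`, `array1_rows` and `array2_rows` are made up, and the sample is held in a string rather than read from disk.

```python
import re
import numpy as np

# Sample data copied from the question; more rows of the same shape would also match.
text = "# number,array1,array2\n0,[1,2,3,4,5],[6,7,8,9,10]\n"

# One match per data row: the index, then the contents of each bracketed list.
# The header line starts with '#', so it does not match.
row_pattern = re.compile(r"^(\d+),\[([\d,]+)\],\[([\d,]+)\]$", re.MULTILINE)

array1_rows, array2_rows = [], []
for _, first, second in row_pattern.findall(text):
    array1_rows.append([int(v) for v in first.split(",")])
    array2_rows.append([int(v) for v in second.split(",")])

array1 = np.array(array1_rows)  # one row per line of the file
array2 = np.array(array2_rows)
```

Unlike slicing by fixed column positions, this does not assume a particular list length, although `np.array` only gives a regular 2-D result when all rows have the same length.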
RomanPerekhrest