I want to load data from a text file of the following format:

Sarah, 0.5 0.2 2.0

where (0.5, 0.2, 2.0) is a vector that describes Sarah. The vector is shortened here; in the actual text file it spans multiple lines.

I have tried:

data = np.genfromtxt(filename, dtype=[("label", "U10"), ("description", "f4", (3,))], delimiter=",")

However, I end up with the following error: `ValueError: could not assign tuple of length 2 to structure with 4 fields.`

Ideally, I would like to access the description vectors like this: data["description"]

ganto
  • So in the file there's only one comma delimiter, and the numbers are separated only by whitespace? Your `dtype` requires 4 separate columns (even though the resulting array will only have 2 fields), and `genfromtxt` won't handle rows that span multiple lines; it needs one line per row of the array. You may need to pass the file through a filter function that removes the extra structure. `genfromtxt` expects a simple csv: rows with a consistent number of columns and a simple delimiter. – hpaulj Apr 26 '19 at 20:25
  • An alternative is to read the file yourself, splitting first on the comma, then on whitespace, collecting rows, etc. With the right nesting of lists of lists and tuples you can use your `dtype` to make the array. – hpaulj Apr 26 '19 at 20:30
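
The "read the file yourself" idea from the comments could look roughly like the sketch below. It is only an illustration under stated assumptions: a new record is assumed to start on any line that contains a comma, continuation lines are assumed to hold only whitespace-separated numbers, and `iter_records`, the sample lines, and the second label "Tom" are made up for the example.

```python
def iter_records(lines):
    """Yield (label, values) pairs from an iterable of text lines.

    Assumption: a line containing a comma starts a new record; any other
    non-empty line continues the vector of the current record.
    """
    label, values = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if "," in line:
            if label is not None:
                yield label, values  # emit the previous record
            label, rest = line.split(",", 1)
            label = label.strip()
            values = [float(tok) for tok in rest.split()]
        else:
            values.extend(float(tok) for tok in line.split())
    if label is not None:
        yield label, values  # emit the last record


# Hypothetical sample: each vector is wrapped onto a second line.
sample = ["Sarah, 0.5 0.2\n", "2.0\n", "Tom, 1.0 1.5\n", "3.0\n"]
records = list(iter_records(sample))
```

For a real file, the same generator can be fed an open file object instead of the `sample` list, since iterating over a file yields its lines.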

1 Answer

I solved it following hpaulj's second proposal:

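   # Read all lines; the context manager closes the file afterwards.
   with open(filename, "r") as file:
       lines = file.readlines()
   # Split each line into tab-separated fields.
   lines = [x.split("\t") for x in lines]
   # Parse the whitespace-separated numbers in the third field into floats.
   X = [list(map(float, line[2].split())) for line in lines]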
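
To get the `data["description"]` access asked for in the question, the parsed values can then be packed into a structured array using the question's `dtype`. A minimal sketch, assuming every vector has exactly 3 entries as in the shortened example; the `labels` list and the hard-coded values are placeholders standing in for whatever was parsed from the file:

```python
import numpy as np

# Placeholder data standing in for the parsed file contents.
labels = ["Sarah", "Tom"]
X = [[0.5, 0.2, 2.0], [1.0, 1.5, 3.0]]

# Same dtype as in the question: a string label plus a length-3 float subarray.
dtype = [("label", "U10"), ("description", "f4", (3,))]

# Each record must be a tuple; the subarray field is given as a nested tuple.
data = np.array([(name, tuple(vec)) for name, vec in zip(labels, X)], dtype=dtype)

descriptions = data["description"]  # 2-D float array, one row per label
```

If the real vectors are longer, the `(3,)` in the `dtype` would need to match their actual length.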
ganto