
I have a txt file of the following form:

[Image: excerpt of the txt file]

Notice that some of the fields are completely missing, but the fact that they are missing is important. In the attached image all the measurements are missing due to a technical failure, but it can also happen that the value in only one of the columns is missing while the others are present.

I am trying to import such a .txt file with the following code:

import numpy as np
data = np.genfromtxt(filepath, skip_header=1, invalid_raise=False, usecols=(2, 3, 4, 5, 6, 7))

This results in errors (reported as a warning, since invalid_raise=False):

Line #2123 (got 2 columns instead of 6)
Line #3171 (got 2 columns instead of 6)
Line #3172 (got 2 columns instead of 6)

but it still produces a usable result. As I said, the fact that the data at 13:30 is missing is important and can't simply be ignored. However, the code above does exactly that: it skips the row at 13:30. Instead, I would like it to fill that row with some predefined value, or mark it in some other way that can be identified later in the processing.
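A minimal reproduction of this behaviour, under stated assumptions: the sample data below is hypothetical (a tab-separated file with a date, a time and six measurement columns named m1 to m6, matching the usecols above). With the default delimiter=None, genfromtxt splits on runs of whitespace, so the empty fields between consecutive tabs vanish. The fully empty 13:30 row then yields only 2 columns, and a row with a single missing value yields 7, so both are skipped.

```python
import io
import warnings

import numpy as np

# Hypothetical tab-separated sample: date, time and six measurements (m1..m6).
# 13:30 has every measurement missing; 14:00 is missing only m2.
rows = [
    ["date", "time", "m1", "m2", "m3", "m4", "m5", "m6"],
    ["2017-06-07", "13:00", "1.0", "2.0", "3.0", "4.0", "5.0", "6.0"],
    ["2017-06-07", "13:30", "", "", "", "", "", ""],
    ["2017-06-07", "14:00", "1.5", "", "3.5", "4.5", "5.5", "6.5"],
    ["2017-06-07", "14:30", "7.0", "8.0", "9.0", "10.0", "11.0", "12.0"],
]
text = "\n".join("\t".join(row) for row in rows) + "\n"

with warnings.catch_warnings():
    # Hide the "Line #... (got N columns instead of 6)" warning for this demo.
    warnings.simplefilter("ignore")
    # Default delimiter=None splits on runs of whitespace, so empty
    # tab-separated fields disappear and the short rows are skipped.
    data = np.genfromtxt(io.StringIO(text), skip_header=1,
                         invalid_raise=False, usecols=(2, 3, 4, 5, 6, 7))

print(data.shape)  # (2, 6): only the 13:00 and 14:30 rows survive
```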

Is there any way to do that?

Ivan Kolesnikov
skrat
  • Is your txt file written as a tsv (tab-separated values)? If yes, does the line with the missing values contain the correct number of separators? If yes, you can use pandas to parse it. – Jundiaius Jun 07 '17 at 08:07
  • @eqperes The answer to both questions is YES. – skrat Jun 07 '17 at 08:10
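Since the asker confirmed that the file is tab-separated and that rows with missing values keep all their separators, the pandas route suggested in the comments could look roughly like the sketch below. The sample data and the column names (date, time, m1 to m6) are hypothetical stand-ins for the real file; the point is that sep="\t" keeps empty fields, which pandas reads as NaN by default, so the 13:30 row stays in the frame together with its timestamp.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample with a header row; missing measurements
# are empty fields, so every row still has the same number of tabs.
rows = [
    ["date", "time", "m1", "m2", "m3", "m4", "m5", "m6"],
    ["2017-06-07", "13:00", "1.0", "2.0", "3.0", "4.0", "5.0", "6.0"],
    ["2017-06-07", "13:30", "", "", "", "", "", ""],
    ["2017-06-07", "14:00", "1.5", "", "3.5", "4.5", "5.5", "6.5"],
    ["2017-06-07", "14:30", "7.0", "8.0", "9.0", "10.0", "11.0", "12.0"],
]
text = "\n".join("\t".join(row) for row in rows) + "\n"

# sep="\t" splits on single tabs; empty fields become NaN by default.
df = pd.read_csv(io.StringIO(text), sep="\t")

measurements = df[["m1", "m2", "m3", "m4", "m5", "m6"]]
all_missing = measurements.isna().all(axis=1)  # e.g. a total technical failure
any_missing = measurements.isna().any(axis=1)  # at least one gap in the row

# The rows with missing data are kept and can be located later.
print(df.loc[all_missing, ["date", "time"]])
```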

1 Answer


np.genfromtxt() accepts a missing_values argument. If you call it like this:

data = np.genfromtxt(filepath, skip_header=1, invalid_raise=False, usecols=(2, 3, 4, 5, 6, 7), missing_values=???)

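it should replace the missing entries with nan (the default fill value for float columns; the filling_values argument lets you choose a different value). Note, however, that this only works if each missing field is still marked in the file, either by a placeholder or by an empty field between two delimiters; otherwise genfromtxt cannot tell which column is missing. For a tab-separated file you also need to pass delimiter="\t", because the default whitespace splitting merges consecutive tabs. Alternatively, you can use the usecols argument to first select the columns that contain missing values and separate them from the main data, then merge them again afterwards. Another very good way to deal with missing values is to use pandas.read_csv() instead, which is also much faster than np.genfromtxt.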
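A sketch of the genfromtxt variant, assuming (as the asker confirmed in the comments) that the file is tab-separated and that every missing field is still an empty field between tabs. The sample data and column layout are hypothetical. genfromtxt already treats an empty string as missing, so the key change is delimiter="\t"; filling_values=np.nan just makes the fill value explicit, and a sentinel such as -999.0 could be passed instead if you prefer a fixed marker.

```python
import io

import numpy as np

# Hypothetical tab-separated sample: date, time and six measurements (m1..m6).
# Every missing measurement is still an empty field between two tabs.
rows = [
    ["date", "time", "m1", "m2", "m3", "m4", "m5", "m6"],
    ["2017-06-07", "13:00", "1.0", "2.0", "3.0", "4.0", "5.0", "6.0"],
    ["2017-06-07", "13:30", "", "", "", "", "", ""],
    ["2017-06-07", "14:00", "1.5", "", "3.5", "4.5", "5.5", "6.5"],
    ["2017-06-07", "14:30", "7.0", "8.0", "9.0", "10.0", "11.0", "12.0"],
]
text = "\n".join("\t".join(row) for row in rows) + "\n"

# delimiter="\t" keeps empty fields; "" counts as missing by default and is
# replaced by filling_values (NaN, which is also the float default).
data = np.genfromtxt(io.StringIO(text), skip_header=1, delimiter="\t",
                     usecols=(2, 3, 4, 5, 6, 7), filling_values=np.nan)

missing = np.isnan(data)
all_missing = missing.all(axis=1)  # rows where every measurement failed
any_missing = missing.any(axis=1)  # rows with at least one gap

print(data.shape)            # (4, 6): no row is dropped
print(all_missing.tolist())  # [False, True, False, False]
print(any_missing.tolist())  # [False, True, True, False]
```

Because the rows stay in place, the row index still lines up with the timestamps in the first two columns, which can be read separately if needed.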

Franz