
This is part of a GIT open course that I am taking in my free time to learn Python. The exercise deals only with NumPy. Below, I create a file path and import the data. I added skip_header=1 because the first row holds the column names, which are strings and would otherwise be read as NaN. The data has 33 columns and I need only 5, which I selected using usecols.

import numpy as np
fp = 'C:\\Users\\matij\\Documents\\exercise-5-MatijaKordic\\6153237444115dat.csv'
data = np.genfromtxt(fp, skip_header=1, usecols=(0, 2, 22, 27, 28), delimiter=',')
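A minimal sketch of what skip_header and usecols do, using a small made-up CSV held in memory instead of the real file (the column names and values below are hypothetical). It also shows that an empty field comes back as nan:

```python
import io
import numpy as np

# Hypothetical 4-column CSV; the first row holds the column names as strings,
# and one TEMP value is left empty on purpose.
csv_text = """STATION,NAME,DATE,TEMP
28450,A,20170501,40
28450,A,20170502,
29980,B,20170501,45
"""

# Without skip_header the header row is parsed as data; its strings become nan.
with_header = np.genfromtxt(io.StringIO(csv_text), delimiter=",", usecols=(0, 2, 3))

# skip_header=1 drops the header row; usecols keeps only columns 0, 2 and 3.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     skip_header=1, usecols=(0, 2, 3))

print(with_header[0])  # the header row, all nan
print(data)            # 3 rows x 3 columns, with nan for the empty TEMP field
```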

Next, I need to split the data into separate variables called station, date, temp, temp_max, and temp_min. They correspond, in order, to the columns selected by usecols=(0, 2, 22, 27, 28), so in data they are columns 0 to 4.

station = data[:, 0]
date = data[:, 1]
temp = data[:, 2]
temp_max = data[:, 3]
temp_min = data[:, 4]
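Since each column of data is one variable, the same split can also be written as a single tuple unpacking over the transposed array. A minimal sketch, using a small made-up array as a stand-in for the real data:

```python
import numpy as np

# Stand-in for the array returned by genfromtxt: 2 rows x 5 columns
# (station, date, temp, temp_max, temp_min); the values are made up.
data = np.array([
    [28450.0, 20170501.0, 41.0, 48.0, np.nan],
    [29980.0, 20170501.0, np.nan, 50.0, 33.0],
])

# data.T has one row per original column, so it unpacks into 5 arrays.
station, date, temp, temp_max, temp_min = data.T

print(station)  # [28450. 29980.]
```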

After this, I need to calculate the following:

What is the mean Fahrenheit temperature in the data? (the temp variable)

What is the standard deviation of the Maximum temperature? (the temp_max variable)

How many unique stations exist in the data? (the station variable)

So, I did this:

temp_mean = temp.mean()
temp_max_std = temp_max.std()
station_count = np.unique(station)

I get NaN for both the mean of temp and the standard deviation of temp_max. For the unique stations I get [28450. 29980.], so I presume I still need to count them somehow?
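np.unique returns the distinct values themselves, so the number of stations is just the length of that array. A minimal sketch with a made-up station array; return_counts=True additionally reports how many rows belong to each station:

```python
import numpy as np

# Made-up station IDs standing in for the real station column.
station = np.array([28450.0, 28450.0, 29980.0, 28450.0, 29980.0])

unique_stations = np.unique(station)   # sorted distinct values
station_count = unique_stations.size   # number of distinct stations

# Optional: how many observations each station contributes.
ids, counts = np.unique(station, return_counts=True)

print(station_count)  # 2
print(dict(zip(ids.tolist(), counts.tolist())))
```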

As for the mean and the max:
- The max being NaN is fine. I'm not sure why it is in the assignment, but that is a different story.
- The mean, however, is the reason for this question. When I print temp, I get values, so why does temp.mean() return NaN?
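Printing temp shows the valid values, but a single NaN anywhere in the array is enough to make the plain mean NaN, because NaN propagates through arithmetic. A small sketch with made-up values; np.isnan also shows how many entries are missing:

```python
import numpy as np

# Made-up temperatures with one missing reading.
temp = np.array([40.0, 42.0, np.nan, 44.0])

print(temp.mean())            # nan: one NaN makes the whole sum NaN
print(np.isnan(temp).sum())   # 1 missing value
print(np.isnan(temp).any())   # True
```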

Here is the link to the CSV if anyone is interested: https://drive.google.com/file/d/1rGneQTfUe2rq1HAPQ06rvLDxzi-ETgKe/view?usp=sharing

mkw

2 Answers


I agree with Anubhav's answer; however, I suggest using np.nanmean(temp) instead, which computes the mean while ignoring the NaN (Not a Number) entries. You get the same mean, 41.58918641457781. The same applies to the max:

print(np.nanmean(temp))
print(np.nanmax(temp))

Output:

41.58918641457781
65.0
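NumPy has NaN-ignoring versions of the other reductions as well, so the standard deviation asked about in the question can be computed the same way with np.nanstd. A sketch on made-up data, checking that np.nanstd agrees with dropping the NaNs first:

```python
import numpy as np

# Made-up maximum temperatures with two missing readings.
temp_max = np.array([48.0, np.nan, 50.0, 55.0, np.nan])

# NaN-aware reductions skip the missing entries.
print(np.nanmean(temp_max))  # 51.0
print(np.nanstd(temp_max))
print(np.nanmax(temp_max))   # 55.0

# Same result as removing the NaNs explicitly and using the ordinary method
# (both use ddof=0 by default).
print(np.isclose(np.nanstd(temp_max), temp_max[~np.isnan(temp_max)].std()))  # True
```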
dallonsi

You are getting nan because some of the values in the NumPy array are NaN, and a single NaN makes the whole mean (or standard deviation) NaN. Try removing them first:

temp_mean = temp[~np.isnan(temp)].mean()
print(temp_mean)
temp_max_std = temp_max[~np.isnan(temp_max)].std()
print(temp_max_std)
station_count = np.unique(station)

Output:

41.58918641457781
9.734807757434636
array([28450., 29980.])
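The expression temp[~np.isnan(temp)] works in two steps: np.isnan builds a boolean mask that is True where a value is missing, ~ inverts it, and indexing with the inverted mask keeps only the valid entries. A small sketch with made-up values:

```python
import numpy as np

# Made-up temperatures with one missing reading.
temp = np.array([40.0, np.nan, 44.0])

mask = np.isnan(temp)   # boolean array: [False, True, False]
valid = temp[~mask]     # ~ flips the mask, keeping only non-NaN values

print(valid)            # [40. 44.]
print(valid.mean())     # 42.0
```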
Anubhav Singh