
I have a data file that can be downloaded from here: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data

I want to define a function that reads the data and returns the dataset as a NumPy array. The dataset should have 14 columns: the 13 attributes of each housing property (x) and the housing price (y).

def loadData(filename):
  dataset = None
  file = open(filename, "r")
  data = file.read()
  print(data)
  x = np.genfromtxt(filename, usecols = [0,1,2,3,4,5,6,7,8,9,10,11,12])
  y = np.genfromtxt(filename, usecols = 13)
  print("x: ", x)
  print("y: ", y)
  dataset = np.concatenate((x,y), axis = 1)

  return dataset

My y output seems to be alright. However, my x output is wrong as seen below:

[screenshot of the x output; image not available]

Part of the output of x should contain the values below, as part of an np array:

[screenshot of the expected x values; image not available]

What am I doing wrong?

Edit: the above question has been answered and resolved. However, I would also like to ask how I can ensure that the output is float64.

My output is [screenshot of the actual output; image not available]

but my expected output is [screenshot of the expected output; image not available]

I have edited the np.genfromtxt lines to pass dtype = np.float64 as shown:

  x = np.genfromtxt(filename, usecols = [0,1,2,3,4,5,6,7,8,9,10,11,12], dtype = np.float64)
  y = np.genfromtxt(filename, usecols = 13, dtype = np.float64)

I have also tried dataset.astype(float64), but neither has worked. I would appreciate some help again. Thank you!

user19825372

2 Answers


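You have already read the file's contents into the data variable, so you can parse that instead of reading the file again. Note that genfromtxt() treats a plain string as a file name, so pass the lines (data.splitlines()) rather than the raw string. y also has to be turned into a column before it can be concatenated with x:

import numpy as np

def loadData(filename):
  # Read the file once and reuse its contents for both genfromtxt() calls
  with open(filename, "r") as file:
    data = file.read()
  lines = data.splitlines()
  x = np.genfromtxt(lines, usecols = [0,1,2,3,4,5,6,7,8,9,10,11,12])
  y = np.genfromtxt(lines, usecols = 13)
  print("x: ", x)
  print("y: ", y)
  # y is one-dimensional; add an axis so it can be joined to x as a column
  dataset = np.concatenate((x, y[:, np.newaxis]), axis = 1)

  return dataset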
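
As a minimal sketch of parsing text that is already in memory: np.genfromtxt() interprets a plain str argument as a file name, so the text is wrapped in io.StringIO here (a list of lines would also work). The two-row sample and the name parse_text are made up for illustration and are not taken from the housing file.

```python
import io
import numpy as np

# Hypothetical two-row sample with 14 whitespace-separated columns
sample_text = (
    "0.1 18.0 2.3 0 0.5 6.5 65.2 4.1 1 296.0 15.3 396.9 4.9 24.0\n"
    "0.2  0.0 7.1 0 0.4 6.4 78.9 4.9 2 242.0 17.8 396.9 9.1 21.6\n"
)

def parse_text(text):
    # Wrap the string so genfromtxt reads it like an open file
    return np.genfromtxt(io.StringIO(text))

parsed = parse_text(sample_text)
print(parsed.shape)  # number of rows and columns that were parsed
print(parsed.dtype)  # genfromtxt defaults to a float dtype
```

Reading all columns in one call like this also avoids having to concatenate x and y afterwards.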
buddemat
Mage011

Your code is almost correct. The problem is that x is loaded as a two-dimensional array of shape (506, 13), while y is a one-dimensional array of shape (506,). np.concatenate along axis 1 requires both arrays to be two-dimensional, so after loading y you have to add a new dimension to it. NumPy provides np.newaxis for that (it is an indexing constant, not a method). The code that solves your problem is:


import numpy as np

def loadData(filename):
  x = np.genfromtxt(filename, usecols = [0,1,2,3,4,5,6,7,8,9,10,11,12])
  y = np.genfromtxt(filename, usecols = 13)
  y = y[:, np.newaxis].astype(np.float64) # Add new axis and convert to float64
  dataset = np.concatenate((x,y), axis = 1)

  return dataset


if __name__ == "__main__":
    dataset = loadData("housing.data")

    # Check the element type; the expected output is shown in the comment below
    print(type(dataset[0, 0]))
    # <class 'numpy.float64'>
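
To see the shape mismatch in isolation, here is a small sketch with stand-in arrays (zeros and a short range, not the housing data). It shows the failing call and two ways to fix it: np.newaxis, as in the code above, and np.column_stack as an alternative.

```python
import numpy as np

# Stand-in arrays with the same dimensionality as in the answer:
# x is 2-D (rows, 13) and y is 1-D (rows,)
x = np.zeros((4, 13))
y = np.arange(4, dtype=np.float64)

try:
    np.concatenate((x, y), axis=1)
except ValueError as err:
    # concatenate requires all inputs to have the same number of dimensions
    print("concatenate failed:", type(err).__name__)

# Option 1: add a trailing axis so y has shape (4, 1)
joined = np.concatenate((x, y[:, np.newaxis]), axis=1)

# Option 2: column_stack turns 1-D inputs into columns itself
stacked = np.column_stack((x, y))

print(joined.shape, stacked.shape)
```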

Hope it helps!

Cuartero
  • Hi, this worked like a charm! However, I realised that I would need to set my output as float64 but I am currently unable to do so as for some reason, the methods I used don't seem to work. I have edited the question to show what I mean as well as the methods used. If you would be so kind as to help me once more, I would greatly appreciate it – user19825372 Sep 21 '22 at 09:51
  • Hello again! I have updated my answer! – Cuartero Sep 22 '22 at 08:54
  • Hi, I tried it. But it didn't work :( – user19825372 Sep 22 '22 at 15:35
  • What is the issue exactly? – Cuartero Sep 22 '22 at 15:51
  • I'm not sure. My output still remained as what is shown in the pic. Basically nothing has changed. – user19825372 Sep 22 '22 at 16:50
  • Have you checked whether the `type(...)` of one element inside your array is actually `np.float64`? Maybe it is just a printing issue. You can use my code and run the block `print(type(dataset[0, 0]))` – Cuartero Sep 23 '22 at 11:01
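
The check suggested in the last comment can be sketched as follows. It assumes the difference between the two screenshots is only in how NumPy prints the array; the images are not available, so this is a guess, and the sample values are made up. dtype reports how the values are stored, while the print options only change how they are displayed.

```python
import numpy as np

# Made-up values standing in for a few entries of the dataset
dataset = np.array([[0.00632, 18.0, 396.9], [0.02731, 0.0, 392.83]])

# The storage type is independent of how the values are displayed
print(dataset.dtype)
print(type(dataset[0, 0]))

# astype returns a new array; it does not modify the original in place
as_float32 = dataset.astype(np.float32)
print(as_float32.dtype, dataset.dtype)

# Changing print options affects only the printed text, not the data
with np.printoptions(precision=3, suppress=True):
    short_text = str(dataset)
full_text = str(dataset)
print(short_text)
print(full_text)
```

If dtype already reports float64, the array does not need converting and only the display settings differ. Note also that astype has to be called with np.float64 (or the string "float64"), and its result has to be assigned to a variable to have any effect.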