I'm currently taking a Coursera course (Machine Learning) offered by the University of Washington, and I'm having a small problem with numpy and graphlab.

The course requires a graphlab version higher than 1.7. Mine is higher, as you can see below; however, when I run the script below, I get the following error:

  [INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started.
  def get_numpy_data(data_sframe, features, output):
      data_sframe['constant'] = 1
      features = ['constant'] + features # this is how you combine two lists
      # the following line will convert the features_SFrame into a numpy matrix:
      feature_matrix = features_sframe.to_numpy()
      # assign the column of data_sframe associated with the output to the SArray output_sarray

      # the following will convert the SArray into a numpy array by first converting it to a list
      output_array = output_sarray.to_numpy()
      return(feature_matrix, output_array)

  (example_features, example_output) = get_numpy_data(sales, ['sqft_living'], 'price') # the [] around 'sqft_living' makes it a list
  print example_features[0,:] # this accesses the first row of the data; the ':' indicates 'all columns'
  print example_output[0] # and the corresponding output

     ----> 8     feature_matrix = features_sframe.to_numpy()
     NameError: global name 'features_sframe' is not defined

The script above was written by the course authors, so I believe I'm doing something wrong.

Any help will be highly appreciated.

Sachith Wickramaarachchi
  • Your parameter is called `data_sframe` but you are trying to coerce `features_sframe` to a `numpy` matrix. Where does `features_sframe` come from? Might that be the issue here? – Abdou Nov 05 '16 at 15:08
  • Thanks for your reply, @Abdou. However, I don't think that is the problem, because I have seen other people's work posted online and this script worked perfectly for them. I believe it is something related to the version of my `graphlab` or my `numpy`. The feature (`sqft_living`) is one column from my data frame called `sales`. – Rafael Rodrigues Santos Nov 05 '16 at 15:16

2 Answers

You are supposed to complete the function get_numpy_data before running it; that is why you are getting an error. The template never defines features_sframe or output_sarray, so you have to fill in those lines yourself. Follow the instructions in the original function, which are:

def get_numpy_data(data_sframe, features, output):
    data_sframe['constant'] = 1 # this is how you add a constant column to an SFrame
    # add the column 'constant' to the front of the features list so that we can extract it along with the others:
    features = ['constant'] + features # this is how you combine two lists
    # select the columns of data_SFrame given by the features list into the SFrame features_sframe (now including constant):

    # the following line will convert the features_SFrame into a numpy matrix:
    feature_matrix = features_sframe.to_numpy()
    # assign the column of data_sframe associated with the output to the SArray output_sarray

    # the following will convert the SArray into a numpy array by first converting it to a list
    output_array = output_sarray.to_numpy()
    return(feature_matrix, output_array)
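
To see why the unfinished template fails only when it is called, here is a minimal sketch in plain Python (no graphlab needed). The function name `incomplete` is made up for illustration; only `features_sframe` comes from the template. Python accepts the definition without complaint and raises NameError at the moment the undefined name is looked up.

```python
def incomplete():
    # 'features_sframe' is never assigned anywhere, mirroring the
    # unfinished course template. Defining the function is fine;
    # the lookup only happens when the function body runs.
    return features_sframe


try:
    incomplete()
    error_message = None
except NameError as err:
    # The exact wording differs between Python 2 and Python 3,
    # but the undefined name is always part of the message.
    error_message = str(err)

print(error_message)
```

The same thing will happen with `output_sarray` once the first blank is filled in, so both missing lines need to be written before the function can run.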
Yann

The graphlab assignment instructions have you convert from graphlab to pandas and then to numpy. You could just skip the graphlab parts and use pandas directly. (This is explicitly allowed in the homework description.)

First, read in the data files.

import pandas as pd

dtype_dict = {'bathrooms':float, 'waterfront':int, 'sqft_above':int, 'sqft_living15':float, 'grade':int, 'yr_renovated':int, 'price':float, 'bedrooms':float, 'zipcode':str, 'long':float, 'sqft_lot15':float, 'sqft_living':float, 'floors':str, 'condition':int, 'lat':float, 'date':str, 'sqft_basement':int, 'yr_built':int, 'id':str, 'sqft_lot':int, 'view':int}
sales = pd.read_csv('data//kc_house_data.csv', dtype=dtype_dict)
train_data = pd.read_csv('data//kc_house_train_data.csv', dtype=dtype_dict)
test_data = pd.read_csv('data//kc_house_test_data.csv', dtype=dtype_dict)

The convert-to-numpy function then becomes:

def get_numpy_data(df, features, output):
    df['constant'] = 1

    # add the column 'constant' to the front of the features list so that we can extract it along with the others
    features = ['constant'] + features

    # select the columns of df given by the features list into the DataFrame features_df
    features_df = pd.DataFrame(**FILL IN THE BLANK HERE WITH YOUR CODE**)

    # cast features_df into a numpy matrix (.values also works on pandas versions where as_matrix() has been removed)
    feature_matrix = features_df.values

    etc.

The remaining code should be the same (since you only work with the numpy versions for the rest of the assignment).
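
For reference, one possible way to complete the pandas version is sketched below. This is my own reading of the template, not the course's official solution. It assumes Python 3 and pandas 0.24 or newer (for `DataFrame.to_numpy()`), copies the input so the caller's DataFrame is left untouched (a choice made for this sketch, unlike the template), and uses a tiny made-up DataFrame called `toy` in place of the real house data.

```python
import pandas as pd


def get_numpy_data(df, features, output):
    # Copy first so the caller's DataFrame does not gain a 'constant' column
    # (the course template modifies its input in place instead).
    df = df.copy()
    df['constant'] = 1

    # Put 'constant' at the front so it becomes the first matrix column.
    features = ['constant'] + features

    # Select the feature columns and convert them to a 2-D numpy array.
    features_df = df[features]
    feature_matrix = features_df.to_numpy()

    # Select the output column and convert it to a 1-D numpy array.
    output_array = df[output].to_numpy()
    return feature_matrix, output_array


# Hypothetical toy values, only to show the shapes involved.
toy = pd.DataFrame({'sqft_living': [1000.0, 2000.0],
                    'price': [100000.0, 200000.0]})

example_features, example_output = get_numpy_data(toy, ['sqft_living'], 'price')
print(example_features[0, :])  # first row: the constant and sqft_living
print(example_output[0])       # the corresponding price
```

With the real data you would pass `sales` (or `train_data`) instead of `toy`.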

Mark Pedigo