Numpy: would vstack automatically detect an index is out of range and correct it?

Question

I'm puzzled as to why the code below (the section I labeled "HERE") works at all: when `j` reaches the end of the range, `j+1` should make the list of arrays (`X_train_folds`) go out of range. Why does this work? Is it because `vstack` automatically detects this? I couldn't find any documentation for it, though.

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

X_train_folds = []
y_train_folds = []
################################################################################
# Split up the training data into folds. After splitting, X_train_folds and    #
# y_train_folds should each be lists of length num_folds, where                #
# y_train_folds[i] is the label vector for the points in X_train_folds[i].     #
# Hint: Look up the numpy array_split function.                                #
################################################################################
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)

# print y_train_folds

# A dictionary holding the accuracies for different values of k that we find
# when running cross-validation. After running cross-validation,
# k_to_accuracies[k] should be a list of length num_folds giving the different
# accuracy values that we found when using that value of k.
k_to_accuracies = {}

################################################################################
# Perform k-fold cross validation to find the best value of k. For each        #
# possible value of k, run the k-nearest-neighbor algorithm num_folds times,   #
# where in each case you use all but one of the folds as training data and the #
# last fold as a validation set. Store the accuracies for all folds and all    #
# values of k in the k_to_accuracies dictionary.                               #
################################################################################

for k in k_choices:
    k_to_accuracies[k] = []

for k in k_choices:
    print('evaluating k=%d' % k)
    for j in range(num_folds):
        X_train_cv = np.vstack(X_train_folds[0:j]+X_train_folds[j+1:])#<--------------HERE
        X_test_cv = X_train_folds[j]

        #print len(y_train_folds), y_train_folds[0].shape

        y_train_cv = np.hstack(y_train_folds[0:j]+y_train_folds[j+1:]) #<----------------HERE
        y_test_cv = y_train_folds[j]

        #print 'Training data shape: ', X_train_cv.shape
        #print 'Training labels shape: ', y_train_cv.shape
        #print 'Test data shape: ', X_test_cv.shape
        #print 'Test labels shape: ', y_test_cv.shape

        classifier.train(X_train_cv, y_train_cv)
        dists_cv = classifier.compute_distances_no_loops(X_test_cv)
        #print 'predicting now'
        y_test_pred = classifier.predict_labels(dists_cv, k)
        num_correct = np.sum(y_test_pred == y_test_cv)
        # Divide by the size of the held-out fold, not of the separate test set.
        accuracy = float(num_correct) / len(y_test_cv)

        k_to_accuracies[k].append(accuracy)

################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
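To see what the two "HERE" lines actually produce, here is a minimal sketch of just the fold-splitting step, with small made-up arrays standing in for `X_train` and `y_train` and the classifier left out (the toy data and the `train_sizes` list are assumptions for illustration). It checks that on every iteration, including the last one where `j + 1 == num_folds`, the training folds plus the held-out fold account for every sample:

```python
import numpy as np

# Hypothetical toy data standing in for X_train / y_train.
X_train = np.arange(22 * 3).reshape(22, 3)
y_train = np.arange(22)
num_folds = 5

# array_split returns a Python list of arrays; fold sizes may differ by one.
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)

train_sizes = []
for j in range(num_folds):
    # On the last iteration X_train_folds[j + 1:] is simply an empty list.
    X_train_cv = np.vstack(X_train_folds[:j] + X_train_folds[j + 1:])
    y_train_cv = np.hstack(y_train_folds[:j] + y_train_folds[j + 1:])
    X_val_cv = X_train_folds[j]
    # Every sample is either in the training folds or in the held-out fold.
    assert X_train_cv.shape[0] + X_val_cv.shape[0] == X_train.shape[0]
    assert y_train_cv.shape[0] == X_train_cv.shape[0]
    train_sizes.append(X_train_cv.shape[0])

print([len(f) for f in X_train_folds])  # [5, 5, 4, 4, 4]
print(train_sizes)                      # [17, 17, 18, 18, 18]
```

No `IndexError` appears on the last pass because nothing is ever indexed with `j + 1`; it is only used as the start of a slice.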

It's because you are using slices. `np.arange(5)[:10]` is ok. — hpaulj, Jun 03 '16 at 18:46

score 1 · Accepted Answer · edited May 23 '17 at 12:07

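No. `vstack` is not doing that; slicing is. `np.array_split` returns an ordinary Python list of arrays, so `X_train_folds[j+1:]` is plain list slicing, and a slice whose bounds fall outside the sequence is clamped to the sequence's length instead of raising an error. When `j` is the last fold index, `X_train_folds[j+1:]` is therefore the empty list `[]`, and `X_train_folds[0:j] + X_train_folds[j+1:]` contains just the first `j` folds, which is exactly the training set you want. NumPy arrays follow the same convention: an out-of-range slice returns an empty array (basic slicing returns a view rather than a copy, but that does not change this), whereas an out-of-range single index raises an `IndexError`.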
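A small sketch of the rule using a plain Python list of arrays, the same type `np.array_split` returns (the three "folds" here are made-up data): slicing past the end yields an empty list, while indexing past the end raises `IndexError`.

```python
import numpy as np

# Three hypothetical "folds", like the output of np.array_split.
folds = np.array_split(np.arange(6), 3)
j = len(folds) - 1  # the last fold index

tail = folds[j + 1:]  # start index 3 is past the end: empty list, no error
print(tail)  # []

try:
    folds[j + 1]  # a single out-of-range index does raise
except IndexError as exc:
    print("IndexError:", exc)

# So on the last iteration only the first j folds get stacked.
print(np.hstack(folds[:j] + folds[j + 1:]))  # [0 1 2 3]
```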

See the following example and its printed output:

import numpy as np

a = np.array([1, 2, 3])
print(a[10:]) # This will return empty
print(a[10]) # This is an error

The result is:

[]

Traceback (most recent call last):
  File "C:/Users/imactuallyavegetable/temp.py", line 333, in <module>
    print(a[10])
IndexError: index 10 is out of bounds for axis 0 with size 3

The first `print` shows an empty array; the second raises the exception.
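One way to see why the slice raises no error: Python first clamps a slice's bounds to the length of the sequence, and NumPy's basic slicing follows the same convention. The built-in `slice.indices(length)` method exposes that normalization directly; this sketch reuses the 3-element array from the example above:

```python
import numpy as np

a = np.array([1, 2, 3])

# slice.indices(n) returns (start, stop, step) clamped to a length-n sequence.
print(slice(10, None).indices(len(a)))  # (3, 3, 1): start == stop, so empty
print(slice(None, 10).indices(len(a)))  # (0, 3, 1): stop clamped to 3

# The clamped bounds select exactly what the original slice selects.
start, stop, step = slice(10, None).indices(len(a))
print(a[start:stop:step], a[10:])  # [] []
```

A plain index such as `a[10]` goes through no such clamping, which is why it fails instead.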

