-1

Which is more common: X_train or x_train? The Keras documentation uses 'x_train', while the sklearn documentation usually uses 'X_train'. Is there a standard for this notation?

sokolov0
  • "What is more common" is almost an entirely subjective question. "Is there a standard" is an objective one (to which the answer is evidently "no", as you've shown in your examples). – Itamar Mushkin Jan 01 '20 at 12:44

3 Answers

1

Use x_train: variable names should be lowercase and snake_case.

David Lor
0

If you look at the statistics behind it, the hypothesis equation is usually written as Y = f(x), where Y is the output and f(x) is a function of all the variables used in the equation.

Hence x_train and Y_train. But as you said, the libraries differ, and nothing compels you to use one or the other.

Shahir Ansari
0

The question should not be which is "more common", but what X_train represents. The capital letter X denotes a 2-D matrix.

In linear algebra, it is common notation to use uppercase Latin letters for matrices (e.g. matrix X) and lowercase Latin letters for vectors (e.g. vector y).

In data science, the feature matrix X is a collection of many columns of feature values. For example, a df with 1 target, 20 features and 1000 data records will have the shape (1000, 21). So we define the feature matrix X to have the shape (1000, 20), whereas the target label y is a column of values with the shape (1000, 1).

After applying train_test_split() on X and y with test_size=0.25, I would expect:
X_train to be a 2-D matrix of shape (750, 20)
y_train to be a column vector of shape (750, 1), or a 1-D vector of shape (750,) if y was passed as a 1-D array or Series
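
The expected shapes can be checked with a minimal sketch like the one below. The data is random and purely illustrative (an assumption, not from the question), and random_state is fixed only to make the run repeatable. It also shows how the shape of y_train depends on whether y is passed as a 1-D array or as a single-column 2-D array.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative random data (assumption): 1000 records, 20 features, 1 binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))      # uppercase X: 2-D feature matrix
y = rng.integers(0, 2, size=1000)    # lowercase y: 1-D target vector

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape)  # 2-D: (n_train_rows, n_features)
print(y_train.shape)  # 1-D, because y was passed as a 1-D array

# Passing y as a single column keeps it 2-D after the split.
y_col = y.reshape(-1, 1)
_, _, y_col_train, _ = train_test_split(
    X, y_col, test_size=0.25, random_state=0
)
print(y_col_train.shape)
```

Most scikit-learn estimators expect y as a 1-D array, so the column form is shown only to match the (750, 1) shape mentioned above.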

blackraven