What is more common: X_train or x_train?

Question

What is more common: X_train or x_train? In the Keras documentation I see 'x_train', while the sklearn documentation usually uses 'X_train'. Is there any standard for this notation?

"What is more common" is almost an entirely subjective question. "Is there a standard" is an objective one (to which the answer is evidently "no", as you've shown in your examples). — Itamar Mushkin, Jan 01 '20 at 12:44

score 1 · Answer 1 · answered Sep 22 '19 at 17:23

x_train, because variable names are conventionally lowercase and snake_case.

David Lor

score 0 · Answer 2 · answered Sep 22 '19 at 17:50

In the statistics behind it, the hypothesis is usually written as Y = f(x), where Y is the output and f(x) is a function of all the input variables.

Hence x_train and Y_train. But as you said, the libraries differ, and there is no obligation to use one or the other.

Shahir Ansari

Is there any common example that uses lowercase x and uppercase Y? – Itamar Mushkin Jan 01 '20 at 12:46

blackraven · Answer 3 · 2022-09-06T12:29:45.820

The question should not be which is "more common", but what X_train represents. The capital letter X denotes a 2-D matrix.

In linear algebra, it is common notation to use uppercase Latin letters for matrices (e.g. matrix X) and lowercase Latin letters for vectors (e.g. vector y).

In data science, the feature matrix X is a collection of columns of feature values. For example, a DataFrame df with 1 target, 20 features, and 1000 records has the shape (1000, 21). We then define the feature matrix X with the shape (1000, 20), while the target label y is a column of values with the shape (1000, 1).

After applying train_test_split() on X and y with test_size=0.25, I would expect:
X_train to be a 2-D matrix (750, 20)
y_train to be a 1-D vector of 750 values (shape (750,), or (750, 1) if y is kept as a single column)
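
As a rough check of those shapes, here is a minimal sketch using scikit-learn's train_test_split. The random data, the seed, and the variable name rng are assumptions made purely for illustration. Note that y is built here as a true 1-D array, so NumPy reports its shape as (750,) after the split; it would report (750, 1) only if y were stored as a single 2-D column.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 1000 records, 20 features, 1 binary target (random values).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))    # feature matrix: 2-D, uppercase by convention
y = rng.integers(0, 2, size=1000)  # target vector: 1-D, lowercase by convention

# Hold out 25% of the rows for testing; random_state only makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)  # (750, 20) (250, 20)
print(y_train.shape, y_test.shape)  # (750,) (250,)
```

The uppercase/lowercase split in the names mirrors the matrix/vector distinction: X_train.ndim is 2 and y_train.ndim is 1.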
