What is the meaning of below lines., especially confused about how iloc[:,1:] is working ? and also data[:,:1]
data = np.asarray(train_df_mv_norm.iloc[:,1:])
X, Y = data[:,1:],data[:,:1]
Here train_df_mv_norm is a dataframe --
What is the meaning of below lines., especially confused about how iloc[:,1:] is working ? and also data[:,:1]
data = np.asarray(train_df_mv_norm.iloc[:,1:])
X, Y = data[:,1:],data[:,:1]
Here train_df_mv_norm is a dataframe --
Definition: pandas iloc
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
For example:
df.iloc[:3] # slice your object, i.e. first three rows of your dataframe
df.iloc[0:3] # same
df.iloc[0, 1] # index both axis. Select the element from the first row, second column.
df.iloc[:, 0:5] # first five columns of data frame with all rows
So, your dataframe train_df_mv_norm.iloc[:,1:]
will select all rows but your first column will be excluded.
Note that:
df.iloc[:,:1]
select all rows and columns from 0 (included) to 1 (excluded).df.iloc[:,1:]
select all rows and columns, but exclude column 1.To complete the answer by KeyMaker00, I add that data[:,:1]
means:
:
- take all rows.:1
- equal to 0:1
take columns starting from column 0,
up to (excluding) column 1.So, to sum up, the second expression reads only the first column from data.
As your expression has the form:
<variable_list> = <expression_list>
each expression is substituted under the corresponding variable (X and Y).
Maybe it will complete the answers before. You will know
df.iloc[:,1:2] # get column 1 as a DATAFRAME of shape (n, 1)
df.iloc[:,1:2].values # get column 1 as an NDARRAY of shape (n, 1)
df.iloc[:,1].values # get column 1 as an NDARRAY of shape ( n,)
df.iloc[:,1] # get column 1 as a SERIES of shape (n,)
# iloc with the name of a column
df.iloc[:, df.columns.get_loc('my_col')] # maybe there is some more
elegants methods