I am fairly new to Python and want to select non-contiguous columns in pandas, but can't seem to figure it out. I know in R it could be done like df[:, c(1, 3:)] to select column 1 and columns 3 to the end when using indexing. I just want to know how that is done in Python, using a general approach that would apply to different datasets with differing numbers of columns.
Say I generate some data like below:
import numpy as np
import pandas as pd

# generate integer and category/hierarchy data
dataset = pd.DataFrame({
    "Group": np.random.choice(range(1, 5), 100, replace=True),
    "y": np.random.choice(range(1, 6), 100, replace=True),
    "X1": np.random.choice(range(1, 6), 100, replace=True),
    "X2": np.random.choice(range(1, 6), 100, replace=True),
    "X3": np.random.choice(range(1, 6), 100, replace=True),
    "X4": np.random.choice(range(1, 6), 100, replace=True),
    "X5": np.random.choice(range(1, 6), 100, replace=True),
})
dataset.head()
I know I can select columns 0 and 1 (Group and y) with dataset.iloc[:, np.r_[0, 1]], and I can also select columns Group and X1 through X5 with dataset.iloc[:, np.r_[0, 2:7]]:
Group X1 X2 X3 X4 X5
0 2 3.000000 4.000000 5.000000 4.0 2.0
1 2 4.000000 2.000000 2.000000 5.0 3.0
2 1 5.000000 1.000000 3.000000 5.0 1.0
3 4 5.000000 2.986855 2.000000 3.0 4.0
4 1 1.000000 3.000000 5.000000 4.0 1.0
... ... ... ... ... ... ...
95 1 3.000000 3.000000 2.000000 5.0 3.0
96 4 2.964054 4.000000 5.000000 1.0 5.0
97 2 4.000000 3.000000 2.863587 2.0 5.0
98 1 3.000000 3.000000 4.000000 3.0 2.0
99 4 5.000000 2.692210 3.000000 3.0 1.0
My question is: is there a more general way to select columns from 2 to the last column using np.r_, as can be done in R with df[:, c(1, 3:)]?
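One possible sketch of a general version, assuming np.r_ needs an explicit stop for each slice (an open-ended 2: does not seem to be accepted there): take the column count from dataset.shape[1] and use it as the stop. The name n_cols and the seeded random generator are only for illustration and are not part of the data above.

```python
import numpy as np
import pandas as pd

# Illustrative data with the same column names as above (seeded for repeatability).
rng = np.random.default_rng(0)
dataset = pd.DataFrame(
    rng.integers(1, 6, size=(100, 7)),
    columns=["Group", "y", "X1", "X2", "X3", "X4", "X5"],
)

# Use the number of columns as the slice stop so nothing is hard-coded.
n_cols = dataset.shape[1]
subset = dataset.iloc[:, np.r_[0, 2:n_cols]]

print(subset.columns.tolist())
# ['Group', 'X1', 'X2', 'X3', 'X4', 'X5']
```

The same expression should carry over to a DataFrame with a different number of columns, since the stop is computed from the data rather than typed in.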