tuple_columns = ('distance', 'speed', 'momentum', 'name', 'friend')
tuple_input = [
(3, 4, 6, 'er', 'ere'),
(3, 4, 6, 'er', 'ere'),
(3, 4, 6, 'er', 'ere'),
(3, 4, 6, 'er', 'ere'),
]

What is the best way to create NumPy arrays, one per column, from this dataset, which starts out as a list of row tuples?

v1z3

2 Answers

A NumPy array has a single dtype, so you cannot directly store int and str values side by side in one array (without an explicit dtype, NumPy would convert every value to a string). A workaround is to use dtype=object:

import numpy as np
new_arr = np.array(tuple_input, dtype=object)

This gives the following array:

[[3 4 6 'er' 'ere']
 [3 4 6 'er' 'ere']
 [3 4 6 'er' 'ere']
 [3 4 6 'er' 'ere']]

Alternatively, if you want each tuple to become a column of the new array, so that each row holds one field, you can transpose it:

new_arr = np.array(tuple_input, dtype=object).T

This results in the following array:

[[3 3 3 3]
 [4 4 4 4]
 [6 6 6 6]
 ['er' 'er' 'er' 'er']
 ['ere' 'ere' 'ere' 'ere']]
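
If the goal is one array per column with a proper dtype rather than a single object array, a minimal sketch could look like the following. It assumes the tuple_columns and tuple_input from the question; the columns dict is only an illustrative name, not something from the original post.

```python
import numpy as np

tuple_columns = ('distance', 'speed', 'momentum', 'name', 'friend')
tuple_input = [(3, 4, 6, 'er', 'ere')] * 4

# zip(*rows) turns the list of row tuples into one tuple per column.
# np.array then infers a dtype for each column separately, so the
# numeric columns become integer arrays and the text columns string arrays.
columns = {
    name: np.array(values)
    for name, values in zip(tuple_columns, zip(*tuple_input))
}

print(columns['distance'])       # integer array
print(columns['name'].dtype)     # a fixed-width unicode string dtype
```

Each value in the dict is an independent 1-D array, so numeric operations on columns['distance'] do not involve the string columns at all.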

However, I do want to warn you that I personally think an array of dtype object is not a good idea. It can cause problems when you perform certain operations. For example, according to a comment by Astrid on https://stackoverflow.com/a/44058285/14436930:

Suppose, for argument's sake, that you turned that into a dataframe. And then you wanted to filter objects in that dataframe say df.loc[(df.col == item)] well that would not work because when pandas does the filtering it expects all the items to be of the same type. So if, for example, you were to mix strings and integers in the same column then you would be comparing apples and oranges effectively. And hence pandas would throw an error

And even if it does not cause problems in your case, forcing int and str values into a generic object dtype is not a good programming habit.
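
One concrete way this can bite, sketched under the assumption that you start from the transposed object array above: NumPy's math ufuncs generally cannot operate on plain Python objects, so a column has to be converted to a numeric dtype first. The variable names below are illustrative only.

```python
import numpy as np

tuple_input = [(3, 4, 6, 'er', 'ere')] * 4
new_arr = np.array(tuple_input, dtype=object).T

distance = new_arr[0]        # still dtype=object, although it only holds ints
try:
    np.sqrt(distance)        # no sqrt loop exists for plain Python ints
    sqrt_failed = False
except (TypeError, AttributeError):
    sqrt_failed = True

typed = distance.astype(np.int64)   # convert the row to a real numeric dtype
roots = np.sqrt(typed)              # now behaves like a normal numeric array

print(sqrt_failed, roots.dtype)
```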

seermer

You can create a DataFrame directly like this:

import pandas as pd
df = pd.DataFrame(tuple_input)
df.columns = tuple_columns

If you want a NumPy array of the whole table, you can use:

df_array = df.values

or, as the pandas documentation recommends in place of .values:

df_array = df.to_numpy()
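
Note that with mixed int and str columns the whole-frame conversion yields an object array, while converting a single column keeps that column's own dtype. A short sketch, assuming the question's data (the variable names distance and whole are only for illustration):

```python
import pandas as pd

tuple_columns = ('distance', 'speed', 'momentum', 'name', 'friend')
tuple_input = [(3, 4, 6, 'er', 'ere')] * 4

df = pd.DataFrame(tuple_input, columns=list(tuple_columns))

whole = df.to_numpy()                   # mixed columns, so dtype is object
distance = df['distance'].to_numpy()    # one column, keeps its integer dtype

print(whole.dtype)
print(distance.dtype, distance)
```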

However, neither df.values nor df.to_numpy() retains the header values. To keep the headers, you can use a record array, which is an ndarray subclass that allows field access using attributes. For example:

df_records = df.to_records(index = False)
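
As a rough sketch of how the resulting record array can be used, again assuming the question's data: the field names come from the DataFrame columns, and each field can be read either as an attribute or by item access.

```python
import pandas as pd

tuple_columns = ('distance', 'speed', 'momentum', 'name', 'friend')
tuple_input = [(3, 4, 6, 'er', 'ere')] * 4

df = pd.DataFrame(tuple_input, columns=list(tuple_columns))
df_records = df.to_records(index=False)

print(df_records.dtype.names)    # field names taken from the DataFrame columns
print(df_records.distance)       # attribute access to one column
print(df_records['name'])        # item access works as well
```

Item access is the safer choice for a field such as name, whose label is a common attribute name on other objects.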
Shivam Roy