-1

I have a Series whose elements are fixed-size vectors, but they are stored as `str`. How can I convert the elements of this Series to numerical vectors?

Here is a preview of this Series:

[image: preview of the Series]

P.S. The answers to a similar question did not help.

talha06
  • 1
    Does this answer your question? [Change column type in pandas](https://stackoverflow.com/questions/15891038/change-column-type-in-pandas) – David Siret Marqués Jun 27 '23 at 13:10
  • 1
    Unfortunately, no. I get the following error when I try to convert the Series with `df["gen_vec"] = pd.to_numeric(df["gen_vec"])`, as suggested in that question: `ValueError: Unable to parse string "[[9.456396219320595e-05, 0.003077319823205471, -0.006812645122408867, ...` – talha06 Jun 27 '23 at 13:15
  • 1
    `gen_vec` seems to be a string. Try `list(gen_vec)`; that might turn `gen_vec` into a list, which you can then pass to `pd.Series()`. – David Siret Marqués Jun 27 '23 at 13:18
  • Unfortunately, still the same. Here is what I've tried per your comment: `df["gen_vec"] = pd.Series(list(df['gen_vec']))` – talha06 Jun 27 '23 at 13:26
  • 1
    Can you share a bit of code to reproduce the error? – David Siret Marqués Jun 27 '23 at 13:26
  • No errors, David, just the same result: I still get string objects in the Series. Many thanks for your help, though. – talha06 Jun 27 '23 at 13:28
  • How did you get this string? By loading a CSV? – hpaulj Jun 27 '23 at 14:16

1 Answer

1

Since your data contains `nan` values, you can use `pd.eval` and map the name `nan` to `np.nan` through `local_dict`:

out = gen_vec.apply(pd.eval, local_dict={'nan': np.nan})

Otherwise, if the strings contain no `nan`, use `literal_eval` from the `ast` module:

import ast

out = gen_vec.apply(ast.literal_eval)

Output:

>>> out
0    [[0.6304918890918207, -0.5886238157645294, -0....
1    [[-0.6302182776914216, 0.9368165801475401, 0.7...
2    [[0.6153572001094536, -0.07547153598238743, -0...
3    [[0.1583211249108949, -0.07501481771633367, -0...
4    [[0.9793698091130785, 0.6140448218764745, -0.9...
dtype: object

>>> out.loc[0]
[[0.6304918890918207, -0.5886238157645294, -0.3194771085022785],
 [-0.7222439829639373, 0.682891259912199, -0.9084527274979692],
 [0.9372246370318329, -0.8042811128682565, -0.39435908071826065]]

>>> type(out.loc[0])
list
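
The parsed elements above are still nested Python lists, not NumPy arrays. If a numerical array is the goal, one possible follow-up is sketched below. It assumes that every entry has the same shape and that there are no missing values; the two short 2x2 strings are made-up stand-ins for the 3x3 strings in the question.

```python
import ast

import numpy as np
import pandas as pd

# Two shortened 2x2 entries standing in for the 3x3 strings of the question.
gen_vec = pd.Series([
    "[[0.5, -0.25], [1.0, 2.0]]",
    "[[0.0, 0.75], [-1.5, 3.0]]",
])

# Parse each string into nested lists, then into a float64 array.
arrays = gen_vec.apply(lambda s: np.array(ast.literal_eval(s), dtype=float))

# Because every entry has the same shape, they stack into one 3-D array.
stacked = np.stack(arrays.to_list())
print(stacked.shape)  # (2, 2, 2)
print(stacked.dtype)  # float64
```

The Series `arrays` keeps `dtype: object` because each cell holds a whole array; only the stacked result is a single numeric array.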

Input example:

data = ['[[0.6304918890918207, -0.5886238157645294, -0.3194771085022785], [-0.7222439829639373, 0.682891259912199, -0.9084527274979692], [0.9372246370318329, -0.8042811128682565, -0.39435908071826065]]',
        '[[-0.6302182776914216, 0.9368165801475401, 0.7293141762489015], [-0.10363402231002539, 0.22356716941880794, 0.6796536411142267], [0.739412959837795, 0.3434906849876964, 0.6840523183724572]]',
        '[[0.6153572001094536, -0.07547153598238743, -0.3147739134079086], [-0.4517142976978141, -0.7661353319665889, -0.08218569081022897], [0.21828238409073308, -0.8458822924041092, -0.8100486062713181]]',
        '[[0.1583211249108949, -0.07501481771633367, -0.8430782622316249], [0.11189737816973255, -0.890710343331605, 0.2881597201674384], [-0.8188156405874802, -0.16829948165814113, -0.9222470203602522]]',
        '[[0.9793698091130785, 0.6140448218764745, -0.9485282042022696], [0.7188762127494397, 0.042247790689530884, -0.5645509356734524], [-0.26842956038325627, -0.993030492245303, -0.8585439320376391]]']

gen_vec = pd.Series(data)
Corralien
  • Yes, I saw that approach in another post, but this time I get the following error due to `nan` values: `ValueError: malformed node or string: nan`. – talha06 Jun 27 '23 at 13:36
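
The message `malformed node or string: nan` quoted in the comment above is what `ast.literal_eval` raises when it is handed a float NaN rather than a string, which suggests that some cells are missing entirely. As a rough sketch only, a guard such as the hypothetical helper `parse_vec` below passes missing cells through and also tolerates bare `nan` tokens inside a string. The sample data is invented for illustration.

```python
import ast
import re

import numpy as np
import pandas as pd


def parse_vec(value):
    """Hypothetical helper: parse one cell into a float array, tolerating NaN."""
    # A missing cell is a float NaN, not a string; pass it through unchanged.
    if not isinstance(value, str):
        return np.nan
    # Swap bare `nan` tokens for None so that literal_eval accepts the text.
    cleaned = re.sub(r"\bnan\b", "None", value, flags=re.IGNORECASE)
    # dtype=float turns each None back into NaN.
    return np.array(ast.literal_eval(cleaned), dtype=float)


# Invented sample: one string with a `nan` token and one missing cell.
gen_vec = pd.Series(["[[1.0, 2.0], [3.0, nan]]", np.nan])
out = gen_vec.apply(parse_vec)
```

Whether missing cells should stay as NaN or be dropped first with `gen_vec.dropna()` depends on what the downstream code expects.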