
Is there a way to transform rows in a datatable into columns in Python?

For example, given a datatable like the one below:

A 2
B 3
C 5

I want to transform it to

A B C
2 3 5

and merge it with another datatable that looks like

A X Y Z
2 5 0 3

So that the end result will be

A X Y Z B C
2 5 0 3 3 5

I want to use Python datatable, not pandas DataFrames.

martineau
l a s
  • df.T to transpose rows to columns. Then you can do the merge (not sure what key you’ll merge on. I’ll assume the index) – chitown88 Jan 24 '22 at 20:58
  • At this point, the documentation doesn't indicate any obvious choice such as transpose. You have the choice to either use numpy or pandas (and transpose) or write a bit of code doing that yourself. – hrokr Jan 24 '22 at 21:56
  • I think it would be easier to help you if you added some example code with input and desired output. – BCT Jan 24 '22 at 22:00

1 Answer

As @hrokr commented earlier, the datatable module does not implement transposition yet.
Also, a table made up of different dtypes (string and int) cannot be transposed as is, so I assume all values are strings. As an alternative, I propose using numpy as an intermediate step for the transposition:

import datatable as dt

a = dt.Frame([["A", "B", "C"], ["2", "3", "5"]])
"""
   | C0     C1
   | str32  str32
-- + -----  -----
 0 | A      2
 1 | B      3
 2 | C      5
[3 rows x 2 columns]
"""
b = dt.Frame([["A", "2"], ["X", "5"], ["Y", "0"], ["Z", "3"]])
"""
   | C0     C1     C2     C3
   | str32  str32  str32  str32
-- + -----  -----  -----  -----
 0 | A      X      Y      Z
 1 | 2      5      0      3
[2 rows x 4 columns]
"""
# Convert each column of a to a numpy array and transpose it into a single row
a1 = a[0].to_numpy().T
a2 = a[1].to_numpy().T
# Stack the two rows to get the transposed version of a
c = dt.rbind(dt.Frame(a1), dt.Frame(a2))
"""
   | C0     C1     C2
   | str32  str32  str32
-- + -----  -----  -----
 0 | A      B      C
 1 | 2      3      5
[2 rows x 3 columns]
"""
d = dt.cbind(b,c)
"""
   | C0     C1     C2     C3     C4     C5     C6
   | str32  str32  str32  str32  str32  str32  str32
-- + -----  -----  -----  -----  -----  -----  -----
 0 | A      X      Y      Z      A      B      C
 1 | 2      5      0      3      2      3      5
[2 rows x 7 columns]
"""
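
If the first column of the long table can serve as column names, the numpy round trip above could perhaps be avoided. The following is only a sketch under my own assumptions, not a built-in transpose: `rows_to_wide` is a hypothetical helper, the values are kept as integers, and the input list has the shape that `Frame.to_list()` returns for a two-column frame. The datatable part is guarded so that the reshaping logic can be checked even where the library is not installed.

```python
def rows_to_wide(columns):
    """Turn [keys, values] column lists into {key: [value]}, i.e. a one-row wide layout."""
    keys, values = columns
    return {key: [value] for key, value in zip(keys, values)}


# Same shape as Frame.to_list() gives for the two-column frame in the question
long_columns = [["A", "B", "C"], [2, 3, 5]]
wide = rows_to_wide(long_columns)
print(wide)  # {'A': [2], 'B': [3], 'C': [5]}

try:
    import datatable as dt
except ImportError:  # the reshaping above does not need datatable
    dt = None

if dt is not None:
    wide_frame = dt.Frame(wide)                   # columns A, B, C with one row
    other = dt.Frame(A=[2], X=[5], Y=[0], Z=[3])  # the second table from the question
    wide_frame.key = "A"                          # the joined frame must be keyed
    merged = other[:, :, dt.join(wide_frame)]     # join on the shared column A
    print(merged)
```

Because the join matches on the key column `A`, that column should appear only once in the result, so there is nothing to deduplicate afterwards.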

I must say that removing the duplicate columns from the cbind result afterwards is really not straightforward.
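
One possible way to drop the duplicates, sketched under the assumption that row 0 of the combined frame holds the labels (as in `d` above): collect the positions of the first occurrence of each label and select only those columns. `first_occurrence_indices` is a hypothetical helper of mine, not part of datatable, and the datatable part is guarded in case the library is not installed.

```python
def first_occurrence_indices(labels):
    """Return the positions of labels that have not appeared earlier in the list."""
    seen = set()
    keep = []
    for position, label in enumerate(labels):
        if label not in seen:
            seen.add(label)
            keep.append(position)
    return keep


# Contents of the combined frame d from the answer: row 0 is labels, row 1 is values
labels = ["A", "X", "Y", "Z", "A", "B", "C"]
values = ["2", "5", "0", "3", "2", "3", "5"]
keep = first_occurrence_indices(labels)
print(keep)  # [0, 1, 2, 3, 5, 6]

try:
    import datatable as dt
except ImportError:  # the index computation above does not need datatable
    dt = None

if dt is not None:
    # Rebuild d column by column, then keep only the first occurrence of each label
    d = dt.Frame([[label, value] for label, value in zip(labels, values)])
    deduplicated = d[:, keep]
    print(deduplicated)
```

This only compares the labels in row 0; it does not check that the duplicated columns also hold the same values.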

Gabriel Cretin