3

Given two datatable Frame. How to combine (merge) them in one frame?

dt_f_A =

+--------+--------+--------+-----+--------+
| A_at_1 | A_at_2 | A_at_3 | ... | A_at_m |
+--------+--------+--------+-----+--------+
| v_1    |        |        |     |        |
+--------+--------+--------+-----+--------+
| ...    |        |        |     |        |
+--------+--------+--------+-----+--------+
| v_N    |        |        |     |        |
+--------+--------+--------+-----+--------+

dt_f_B =

+--------+--------+--------+-----+--------+
| B_at_1 | B_at_2 | B_at_3 | ... | B_at_k |
+--------+--------+--------+-----+--------+
| w_1    |        |        |     |        |
+--------+--------+--------+-----+--------+
| ...    |        |        |     |        |
+--------+--------+--------+-----+--------+
| w_N    |        |        |     |        |
+--------+--------+--------+-----+--------+

The expected result (dt_f_A concat(combine or merge) dt_f_B)

+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+
| A_at_1 | A_at_2 | A_at_3 | ... | A_at_m | B_at_1 | B_at_2 | B_at_3 | ... | B_at_k |
+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+
| v_1    |        |        |     |        | w_1    |        |        |     |        |
+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+
| ...    |        |        |     |        | ...    |        |        |     |        |
+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+
| v_N    |        |        |     |        | w_N    |        |        |     |        |
+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+

We consider three cases:

Case 1: a) The two frames have exactly the same numbers of rows, and b) unique attributes in the columns.

Case 2: The number of rows is different

Case 3: the attributes are not unique (there is a duplication)

@sammywemmy Thank you for the valuable comment.

ibra
  • 1,164
  • 1
  • 11
  • 26

1 Answers1

1

Case 1: a) The two frames have exactly the same numbers of rows, and b) unique attributes in the columns

1- use cbind : dt_f_A.cbind(dt_f_B)
or
2- use : dt_f_A[:,dt_f_B.names] = dt_f_B

Example :

import datatable as dt

dt_f_A = dt.Frame({"a":[1,2,3,4],"b":['a','b','c','d']})
dt_f_B = dt.Frame({"c":[1.1, 2.2, 3.3, 4.4], "d":['aa', 'bb', 'cc', 'dd']})

dt_f_A.cbind(dt_f_B)

#dt_f_A[:, dt_f_B.names] = dt_f_B # it's work fine also

print(dt_f_A)

Case 2: The number of rows is different

  • dt_f_A.cbind(dt_f_B) gives InvalidOperationError: Cannot cbind frame with X rows to a frame with Y rows. (X ≠ Y)
  • dt_f_A[:, dt_f_B.names] gives ValueError: Frame has X rows, and cannot be used in an expression where Y are expected. (X ≠ Y)

The solution : use dt_f_A.cbind(dt_f_B,force=True)

Example:

import datatable as dt

dt_f_A = dt.Frame({"a":[1, 2, 3, 4, 5,6], "b":['a', 'b', 'c', 'd', 'e','f']})
dt_f_B = dt.Frame({"c":[1.1, 2.2, 3.3, 4.4], "d":['aa', 'bb', 'cc', 'dd']})

dt_f_A.cbind(dt_f_B,force=True)

print(dt_f_A)

The missing value, then will be filled with NA

Case 3: the attributes are not unique (there is a duplication)

  • dt_f_A.cbind(dt_f_B) : It works and gives a warning. It changes the duplicated attribute to a unique attribute: atatableWarning: Duplicate column name found, and was assigned a unique name: 'a' -> 'a.0'

  • dt_f_A[:, dt_f_B.names] = dt_f_B : IT doesn't give any error. It eliminate the duplicated column in dt_f_A and keep the column in dt_f_B.

Example:

import datatable as dt

dt_f_A = dt.Frame({"a":[1,2,3,4],"b":['a','b','c','d']})
dt_f_B = dt.Frame({"a":[1.1, 2.2, 3.3, 4.4], "d":['aa', 'bb', 'cc', 'dd']})

dt_f_A.cbind(dt_f_B) # rename the duplicated columns
#dt_f_A[:, dt_f_B.names] = dt_f_B # keep only the duplicated columns in dt_f_B

print(dt_f_A)

@sammywemmy Thank you for your valuable comment :)

ibra
  • 1,164
  • 1
  • 11
  • 26
  • note that for this to work, they should have same number of rows, and unique values in the columns; the assumption here is that you are certain of the number of rows and the uniqueness of the values in each column (there are no repeats) – sammywemmy Aug 18 '20 at 23:23
  • @sammywemmy , thank you very much for your valuable comment. I changed the question and the answer as well to cover the three cases. Also, I thanked you in the main answer and question. Thank you again. – ibra Aug 20 '20 at 13:07