0

I have two tables that their shared columns do not exactly match (differences in capital character or existence of some characters like comma,space and ...). How can I merge these two tables based on their shared column (in R, Knime, Excel-power query or sql)?

Example Tables and Results

G5W
  • 36,531
  • 10
  • 47
  • 80
Ahmad s
  • 13
  • 3

1 Answers1

2

In your example Result table it's not clear where the row

gene1 | go3 | 14

comes from, because there's no entry for go3 in Table2. I'm assuming that's a mistake and you meant Table2 to include the row

go3 | 14

If that's correct, here's how to do this in KNIME:

KNIME workflow

The two Table Creator nodes just create the two tables with column names as shown in your example - replace these with your actual data sources. Cell Splitter splits column Goes using a comma as the delimiter. The Unpivoting node is configured like this:

Unpivoting config

and the Joiner like this:

Joiner config

All other settings were left as default. Add nodes to reorder and filter the columns in the Joiner output if you need to. Note that you'll see different Goes_Arr[n] columns depending on how many different values of Goes there are - the Enforce exclusion and Enforce inclusion settings make sure that Unpivoting handles this correctly.

This workflow should cope with whitespace between the commas, but I think you also mention differences in capital letters - if you need to handle these, pass each table through a Case Converter node to make them consistent.

Pivoting and unpivoting are hard to understand (IMHO - especially given the cryptic descriptions of their KNIME nodes) but very powerful. I recommend taking time to play around with these nodes to figure out how they work.

nekomatic
  • 5,988
  • 1
  • 20
  • 27