Using KNIME, I am trying to remove duplicate values within each row across a set of columns, using the GroupBy node. Can you tell me how to implement this, or whether another node can do it? I have divided my table into sets of columns: set 1 is Col1, Col2, Col3, Col4; set 2 is Col5, Col6, Col7, Col8; and so on, for 10 sets of 4 columns each. Now I want to check whether the same value appears more than once within any particular set. Say set 1 contains these values: Col1 has 4, Col2 has 4, Col3 has 4, Col4 has 4.

Then I will keep Col1 as 4, and the values in Col2, Col3, and Col4 will be 'null'.

Can you please tell me how to do this with the GroupBy node in KNIME?

I have tried other nodes such as Constant Value Column Filter, Math Formula, and Rule Engine, but nothing seems to work.

1 Answer

This can't be done in a GroupBy node. GroupBy can give you unique values, but you need logic that decides whether a value is a duplicate and, if so, replaces it with null or some other identifier. I advise using a Rule Engine node with the following syntax for the last column:

$column4$ MATCHES $column1$ OR $column4$ MATCHES $column2$ OR $column4$ MATCHES $column3$ => "null"
TRUE => $column4$

After that, add two more Rule Engine nodes with the corresponding syntax for column3 and column2. You don't need to do anything for column1, obviously.
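
To make the intended logic concrete outside KNIME, here is a minimal sketch in plain Python. It is not KNIME syntax and does not use any KNIME API; the function names `blank_repeats_in_set` and `blank_repeats_in_row` are hypothetical, and a row is assumed to be a flat list whose columns are grouped into consecutive sets of 4. Within each set it keeps the first occurrence of a value and replaces later repeats with `None`, which should give the same result as the three chained rules above.

```python
from typing import List, Optional, Sequence


def blank_repeats_in_set(values: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Keep the first occurrence of each value in one column set; blank later repeats."""
    seen = set()
    result: List[Optional[int]] = []
    for value in values:
        if value is not None and value in seen:
            # Same value already appeared in an earlier column of this set.
            result.append(None)
        else:
            result.append(value)
            if value is not None:
                seen.add(value)
    return result


def blank_repeats_in_row(row: Sequence[Optional[int]], set_size: int = 4) -> List[Optional[int]]:
    """Apply the per-set rule to every consecutive group of `set_size` columns (assumed layout)."""
    cleaned: List[Optional[int]] = []
    for start in range(0, len(row), set_size):
        cleaned.extend(blank_repeats_in_set(row[start:start + set_size]))
    return cleaned


# Example row with two sets: Col1..Col4 and Col5..Col8.
row = [4, 4, 4, 4, 1, 2, 1, 3]
print(blank_repeats_in_row(row))  # [4, None, None, None, 1, 2, None, 3]
```

Each set is handled independently, so a value that repeats across two different sets is left alone; only repeats inside the same set are blanked.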

ipazin
  • Thanks for the answer. Is there any way to reduce the number of Rule Engine nodes and achieve this in a single node? – ProgrammerL May 03 '19 at 07:06
  • If you want to use only a single node, you can do it with the [Column Expressions](https://hub.knime.com/knime/nodes/Column_Expressions*2_0ji_xeG-SudFu-) node. Multiple rules can be defined there. – ipazin May 03 '19 at 08:51
  • I have one more question. In our KNIME workflows we compare customer data in two databases, one Oracle and one Hive, and then report how much data matched and how much did not. Now we want to group customer IDs by customer location to see which location produces the most mismatches. Can you tell me which node I should use to get this kind of customized report? – ProgrammerL May 06 '19 at 09:40
  • https://stackoverflow.com/questions/56015025/knime-comparing-datasets/56015823#56015823 – ipazin May 07 '19 at 08:53