
I have two CSV files that are funnelled into a MergeContent processor, and I want them merged into one file. Both have the same columns. Suppose the first and second CSVs look like this:

First CSV:

id, name
12,John
11,Keels

Second CSV:

id, name
22,Kelly
25,Felder

My output should look like this:

id, name
12,John
11,Keels
22,Kelly
25,Felder

I have tried doing this with the MergeContent processor, but it changes the data into a different format, which I don't want. Both the input files and the output file must be .csv, and the output must have the same name as the input files. (The input files have the same name.)

Himsara Gallege

1 Answer

Use the MergeRecord processor and correlate the flow files on an attribute they share. For example, if both flow files have the same attribute, such as filename = test.csv, you can configure the MergeRecord processor as follows:

Record Reader                      CSVReader
Record Writer                      CSVRecordSetWriter
Merge Strategy                     Bin-Packing Algorithm
Correlation Attribute Name         filename
Attribute Strategy                 Keep Only Common Attributes
Minimum Number of Records          3

The important setting is Minimum Number of Records, which is the minimum number of rows to be merged. In this case it should be larger than 2, because each CSV has 2 rows and a single CSV must not satisfy the minimum on its own. The processor will then hold the first CSV until the other CSV arrives and the minimum is reached.
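
To illustrate what a record-oriented merge produces, here is a minimal Python sketch. It is not NiFi code; the function name `merge_csv_records` is made up for this example, and the header is written as `id,name` without a space. It reads each CSV as records and writes them back with a single header, which is the effect of pairing a CSV reader with a CSV writer:

```python
import csv
import io


def merge_csv_records(csv_texts):
    """Merge CSV documents that share the same columns into one CSV
    with a single header row (hypothetical helper, not a NiFi API)."""
    out = io.StringIO()
    writer = None
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if writer is None:
            # The first document decides the header of the merged output.
            writer = csv.DictWriter(
                out, fieldnames=reader.fieldnames, lineterminator="\n"
            )
            writer.writeheader()
        for row in reader:
            writer.writerow(row)
    return out.getvalue()


first = "id,name\n12,John\n11,Keels\n"
second = "id,name\n22,Kelly\n25,Felder\n"
merged = merge_csv_records([first, second])
print(merged)
```

Because the inputs are parsed as records rather than concatenated as raw text, the second header is not repeated in the output.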

Lamanus
  • I get an error saying filename with the same name already exists. When trying to use a `putfile` processor. – Himsara Gallege Nov 27 '19 at 10:06
  • If you input several files, then you have to specify the filename that should not be duplicated. That is not a problem of merging but your definition about the filename. You may specify the filename attribute with some time format such as `${filename:append(${now():format("yyyy-MM-dd_HH:mm:ss", "GMT")}):append('.csv')}`. – Lamanus Nov 27 '19 at 10:09
  • I have a csv file which I break into two flowfiles and process independently. I send the two files to the `mergeRecord` Processor. Therefore both of them have the same filename. I am confused as to why it says to have a file with the same name. As it should be one file at the end. – Himsara Gallege Nov 27 '19 at 10:14
  • Oh, I see. Why don't you choose the correlation attribute name as filename? And Since you split the record from a file, it is better to use the `defragment` strategy. – Lamanus Nov 27 '19 at 10:19
  • What exactly is `correlation Attribute Name` ? Also I get an error when I changed to `defragment` from `merge record` processor saying ` Could not merge bin with 1 flowfiles because the fragment.count attribute was not present on any of the flowfiles` – Himsara Gallege Nov 27 '19 at 10:31
  • How did you split your record? If it was not with a split processor, then you cannot use `defragment`. The correlation attribute is how the processor distinguishes between different flowfiles. So, if that attribute is the same, the processor will try to merge the records. – Lamanus Nov 27 '19 at 10:45
  • So I believe it would be fine if I assign the `correlation Attribute Name` as `filename` since its the same. I separated the two files using `QueryRecord` Processor. So I'll change it back to `Bin-Packing Algorithm`. But I still get the same filename error. – Himsara Gallege Nov 27 '19 at 10:50
  • I have tested with the settings, `correlation Attribute Name` as `filename` with the same filename and it is fine. The result also has the same filename. – Lamanus Nov 27 '19 at 10:53
  • I tried again, yet I get the same error. Just to clarify everything: the only thing you changed in the answer is using `filename` as the `Correlation Attribute Name`. Am I right? I haven't changed any configurations other than the ones in the answer. – Himsara Gallege Nov 27 '19 at 11:00
  • Yes, and I used test data with the header `id,name` (without the space), not `id, name`. I used two `GenerateFlowFile` processors with those two CSV texts and set the attribute `filename` to `test.csv`, so it has the same value for each CSV. The flow files go directly to the `MergeRecord` processor. – Lamanus Nov 27 '19 at 11:05
  • I retried. Is it possible for you to share your template as a GitHub link so I can download it and check why mine doesn't work? This is the link to two screenshots I have taken of my NiFi flow: https://drive.google.com/drive/folders/1J101Ry4JNGDeon_MS2OeVZ5_dong0WnD?usp=sharing – Himsara Gallege Nov 27 '19 at 11:07
  • https://www.dropbox.com/s/2nc9byoxaa3xvex/MergeRecord.xml?dl=0 – Lamanus Nov 27 '19 at 11:13
  • Nope doesn't work. Only id is being merged not the names..... – Himsara Gallege Nov 27 '19 at 11:24
  • That's because I selected only the id column, sorry. Add the name column to the `queryrecord` processor, such as `select id, name from FLOWFILE where id like '1%'`. Or you can download the template again for the fixed one. – Lamanus Nov 27 '19 at 11:37
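
The role of the correlation attribute and the minimum number of records discussed in this thread can be modelled roughly in Python. This is a simplified sketch, not NiFi's actual implementation: the dictionary layout of a flow file and the function name `bin_by_attribute` are assumptions made for illustration, and other settings that affect binning (such as a maximum bin age) are ignored.

```python
from collections import defaultdict


def bin_by_attribute(flowfiles, correlation_attribute, min_records):
    """Group flow files by the value of one attribute and return the
    merged records of every group that reached the minimum record count.
    (Hypothetical model; real processors apply more rules than this.)"""
    bins = defaultdict(list)
    for ff in flowfiles:
        key = ff["attributes"][correlation_attribute]
        bins[key].append(ff)

    ready = {}
    for key, members in bins.items():
        total = sum(len(ff["records"]) for ff in members)
        if total >= min_records:
            # Enough rows collected: flatten them into one merged list.
            ready[key] = [rec for ff in members for rec in ff["records"]]
    return ready


flowfiles = [
    {"attributes": {"filename": "test.csv"}, "records": [(12, "John"), (11, "Keels")]},
    {"attributes": {"filename": "test.csv"}, "records": [(22, "Kelly"), (25, "Felder")]},
    {"attributes": {"filename": "other.csv"}, "records": [(1, "Solo")]},
]
ready = bin_by_attribute(flowfiles, "filename", min_records=3)
print(ready)
```

In this model the two `test.csv` flow files land in the same bin and are merged once together they hold at least 3 rows, while the single `other.csv` flow file stays below the minimum and is not merged.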