How to use DataWeave to create a CSV output with headers when the input doesn't has all headers in the first record

Question

i have json array which have not fix length, for example

[
 {
  a: 1
 },
 {
  a: 1,
  b: 2
 },
 {
  a: 1,
  c: 3
 },
 {
  a: 1,
  b: 2,
  d: 4,
  f: 6
 }
]

I need to transform them into CSV file with header. Below script only return header of the first line, not all columns. May I know how to operate?

%dw 2.0
output application/csv separator = "    ", escape = "", header = true, encoding="UTF-8"
---
payload

i expect result like so

Please use text instead of images. See https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question for more details on the reason. The expected output is a CSV file which is pure text. You can paste it here directly. — aled, Jan 01 '22 at 12:52

aled · Answer 1 · 2022-01-01T15:11:35.650

My understanding of the problem is that your input doesn't has a first line row with all the headers for the output CSV. The headers need to be collected from every row.

I resolved the problem by calculating all the headers separately by getting all the headers from each record, then creating a row of headers elements manually at the beginning, and disabling automatic header generation. I don't make any claims of the performance. It may be bad for bigger payloads as this script basically iterates over each record twice. I encapsulated the logic in functions for easier reuse.

The main complexity is in recreating the output records with added null values for the missing columns and respecting the original order in function addMissingValue(). This is needed because this scenario is outside DataWeave built-in support for CSV.

%dw 2.0
output application/csv header=false
var collectHeaders=payload 
    map namesOf($) reduce ((item, accumulator=[]) -> (accumulator ++ item) distinctBy ((item, index) -> item) )

fun createHeaderLine(x)=[collectHeaders map {($): $} 
    reduce ((item, accumulator={}) -> accumulator ++ item)]

fun addMissingValue(o)=
    collectHeaders 
        map ( if (o[$]== null) {($): null} else {($): o[$]} ) 
        reduce ((item, accumulator={}) -> accumulator ++ item) 
 
---
createHeaderLine(payload) ++ (payload map addMissingValue($))

Input (formatted as proper JSON for clarity, if yours is Java or other format it will work the same):

[
    {
        "a": 1
    },
    {
        "a": 1,
        "b": 2
    },
    {
        "a": 1,
        "c": 3
    },
    {
        "a": 1,
        "b": 2,
        "d": 4,
        "f": 6
    }
]

Output:

a,b,c,d,f
1,,,,
1,2,,,
1,,3,,
1,2,,4,6

I noticed that your sample script uses a separator of several spaces. I didn't used it in my solution because it is not relevant. Feel free to add it if you need it. Note that CSV is a delimited text format, not a fixed length format.

score 0 · Accepted Answer · answered Jan 06 '22 at 08:12

Another solution is to use my csv-module (https://github.com/rbutenuth/csv-module). There you specify the headers in the configuration and don't take them from the input data.

Works even when you input data is missing one column in the whole dataset. And provides meta data for easier mapping with DataWeave.

How to use DataWeave to create a CSV output with headers when the input doesn't has all headers in the first record

2 Answers2