I'm having a little trouble formulating this question, but I'll try to explain. I know how to explode a single array column, but I have multiple array columns whose arrays line up with each other by index. In my dataframe, exploding each column in turn effectively performs a useless cross join, resulting in dozens of invalid rows. I'll start by showing the data.
This is some output from SparkNLP: some text plus four sets of features of that text. Each of the columns tr, lr, pr, and nr contains an array, and within a row these arrays line up with each other element by element.
+--+---------------------+---------------------+----------------------+--------------------+--------------------+
|ID| text| tr| lr| pr| nr|
+--+---------------------+---------------------+----------------------+--------------------+--------------------+
|10| thing: MacKay rolls|[thing, :, MacKay,...|[thing, :, MacKay, ...| [NN, :, NNP, NNS]| [O, O, I-PER, O]|
|11|thing: MacKay roll...|[thing, :, MacKay,...|[thing, :, MacKay, ...|[NN, :, NNP, NNS,...|[O, O, I-PER, O, ...|
|12| * I would like to...| [*, I, would, lik...| [*, I, would, lik...|[NN, PRP, MD, VB,...|[O, O, O, O, O, O...|
+--+---------------------+---------------------+----------------------+--------------------+--------------------+
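To make the cross-join problem concrete, here is a plain-Python sketch (not Spark code) using the four arrays from the ID 10 row. It assumes that exploding each column independently behaves like a Cartesian product of the arrays, which is what I appear to be seeing; the variable names `cross_rows` and `valid_rows` are only for illustration.

```python
from itertools import product

# Arrays for ID 10, written out in full (tr, lr, pr, nr).
tr = ["thing", ":", "MacKay", "rolls"]
lr = ["thing", ":", "MacKay", "roll"]
pr = ["NN", ":", "NNP", "NNS"]
nr = ["O", "O", "I-PER", "O"]

# Exploding every column independently pairs each element with every
# element of the other arrays (worst case: all four columns exploded).
cross_rows = list(product(tr, lr, pr, nr))

# Only the combinations whose elements share the same index are valid.
valid_rows = list(zip(tr, lr, pr, nr))

print(len(cross_rows))  # 256
print(len(valid_rows))  # 4
```

So for a row with four tokens, at most 4 of the exploded combinations are meaningful; the rest mix items from different positions.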
What I want is a new dataframe with the ID and text, plus the i-th item of every array together on a single row, as shown below for the dataframe above:
+--+---------------------+---------------------+----------------------+--------------------+--------------------+------+-------+---+-----+
|ID| text| tr| lr| pr| nr| token| lemma|pos| ner|
+--+---------------------+---------------------+----------------------+--------------------+--------------------+------+-------+---+-----+
|10| thing: MacKay rolls|[thing, :, MacKay,...|[thing, :, MacKay, ...| [NN, :, NNP, NNS]| [O, O, I-PER, O]| thing| thing| NN| O|
|10| thing: MacKay rolls|[thing, :, MacKay,...|[thing, :, MacKay, ...| [NN, :, NNP, NNS]| [O, O, I-PER, O]| :| :| :| O|
|10| thing: MacKay rolls|[thing, :, MacKay,...|[thing, :, MacKay, ...| [NN, :, NNP, NNS]| [O, O, I-PER, O]|MacKay| MacKay|NNP|I-PER|
|10| thing: MacKay rolls|[thing, :, MacKay,...|[thing, :, MacKay, ...| [NN, :, NNP, NNS]| [O, O, I-PER, O]| rolls| roll|NNS| O|
|11|thing: MacKay roll...|[thing, :, MacKay,...|[thing, :, MacKay, ...|[NN, :, NNP, NNS,...|[O, O, I-PER, O, ...| thing| thing| NN| O|
|11|thing: MacKay roll...|[thing, :, MacKay,...|[thing, :, MacKay, ...|[NN, :, NNP, NNS,...|[O, O, I-PER, O, ...| :| :| :| O|
|11|thing: MacKay roll...|[thing, :, MacKay,...|[thing, :, MacKay, ...|[NN, :, NNP, NNS,...|[O, O, I-PER, O, ...|MacKay| MacKay|NNP|I-PER|
|11|thing: MacKay roll...|[thing, :, MacKay,...|[thing, :, MacKay, ...|[NN, :, NNP, NNS,...|[O, O, I-PER, O, ...| roll| roll|NNS| O|
|11|...
...
|12| * I would like to...| [*, I, would, lik...| [*, I, would, lik...|[NN, PRP, MD, VB,...|[O, O, O, O, O, O...| *| *| NN| O|
|12| * I would like to...| [*, I, would, lik...| [*, I, would, lik...|[NN, PRP, MD, VB,...|[O, O, O, O, O, O...| I| I|PRP| O|
|12| * I would like to...| [*, I, would, lik...| [*, I, would, lik...|[NN, PRP, MD, VB,...|[O, O, O, O, O, O...| would| would| MD| O|
|12| * I would like to...| [*, I, would, lik...| [*, I, would, lik...|[NN, PRP, MD, VB,...|[O, O, O, O, O, O...| like| like| VB| O|
|12| * I would like to...| [*, I, would, lik...| [*, I, would, lik...|[NN, PRP, MD, VB,...|[O, O, O, O, O, O...| to| ...|...| O|
|12|...
...
+--+---------------------+---------------------+----------------------+--------------------+--------------------+------+-------+---+-----+
I don't need the tr through nr columns in the output, but I left them in for clarity.
Is there a way to accomplish this?
Additionally, is there a way to extract the array index at the same time and add it to the output row?
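In case it helps pin down the semantics, here is the per-row behaviour I'm after as a plain-Python sketch. This is not Spark code and not a proposed solution; `explode_aligned` and the `idx` field are hypothetical names I made up for illustration, and the sample row is ID 10 with its arrays written out in full.

```python
def explode_aligned(row,
                    array_cols=("tr", "lr", "pr", "nr"),
                    out_names=("token", "lemma", "pos", "ner")):
    """Yield one output row per array index, keeping ID and text.

    Hypothetical helper: it only models the desired result for one row.
    """
    arrays = [row[c] for c in array_cols]
    # zip(*arrays) walks the arrays in lockstep; it stops at the shortest
    # one, which is fine here because the arrays are assumed equal length.
    for idx, items in enumerate(zip(*arrays)):
        out = {"ID": row["ID"], "text": row["text"], "idx": idx}
        out.update(zip(out_names, items))
        yield out


row_10 = {
    "ID": 10,
    "text": "thing: MacKay rolls",
    "tr": ["thing", ":", "MacKay", "rolls"],
    "lr": ["thing", ":", "MacKay", "roll"],
    "pr": ["NN", ":", "NNP", "NNS"],
    "nr": ["O", "O", "I-PER", "O"],
}

rows = list(explode_aligned(row_10))
for r in rows:
    print(r)
```

This gives four rows for ID 10, one per token, each carrying its position in `idx`. I'm looking for the dataframe equivalent of this that avoids the cross join.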