2

I have a bunch of product data to clean prior to entry into a database that looks like this:

COL A COL B COL C... "N"
Option 1 A, B, C, D Option 1 attribute
Option 2 C, D, F Option 2 attribute
Option 3 D, J, Z Option 3 attribute

And I'd like for it to look like this with a unique row for every unique product option:

COL A COL B COL C... "N"
Option 1 A Option 1 attribute
Option 1 B Option 1 attribute
Option 1 C Option 1 attribute
Option 1 D Option 1 attribute
Option 2 C Option 2 attribute
Option 2 D Option 2 attribute
Option 2 F Option 2 attribute
Option 3 D Option 3 attribute
Option 3 J Option 3 attribute
Option 3 Z Option 3 attribute

I understand how I could do this with a python script, but I am already using OpenRefine, and I am hoping not to involve a whole new process to my data flow.

Is there an easy way to do this in OpenRefine? I am having a hard time finding a method or extensions for something like this.

Thanks!

EDIT

@magdmartin How can you fill down blank cells using delineated values from the first cell?

COL A COL B COL C... "N"
Option 1 A,B,C,D Option 1 attribute
Option 1 Option 1 attribute
Option 1 Option 1 attribute
Option 1 Option 1 attribute
Option 2 C,D,F Option 2 attribute
Option 2 Option 2 attribute
Option 2 Option 2 attribute
Option 3 D,J,Z Option 3 attribute
Option 3 Option 3 attribute
Option 3 Option 3 attribute

Turned into

COL A COL B COL C... "N"
Option 1 A Option 1 attribute
Option 1 B Option 1 attribute
Option 1 C Option 1 attribute
Option 1 D Option 1 attribute
Option 2 C Option 2 attribute
Option 2 D Option 2 attribute
Option 2 F Option 2 attribute
Option 3 D Option 3 attribute
Option 3 J Option 3 attribute
Option 3 Z Option 3 attribute

Thanks!

codeaboard
  • 63
  • 7

1 Answers1

2

I recorded a video here walking through each options describe below here: https://youtu.be/3194zXoJtqI

For this project, you will need to use two OpenRefine functions

If you have a lot of columns you can use the All > Transform to speed up the process with the following expression row.record.cells[columnName].value[0]. The trick here is to fill down Col A last so we can keep the record mode when filling down other column (see screenshot below)

enter image description here

magdmartin
  • 1,712
  • 3
  • 20
  • 43
  • Thank you, @magdmartin this is immensely useful! I have a follow up question as well. How can you fill down blank cells using delineated values from the first cell in a column? I'm adding an edit above using a similarly formatted example. – codeaboard Jun 01 '21 at 22:11
  • Can you post it as a separate question so others can answer it (and it will be more visible)? – magdmartin Jun 05 '21 at 13:34
  • Posted! https://stackoverflow.com/questions/67877720/openrefine-how-to-fill-down-blank-cells-using-delineated-values-from-the-first – codeaboard Jun 07 '21 at 19:36
  • Does it matter if you're in ROWS or RECORDS view for this process? – Jerry Sep 28 '22 at 19:47
  • 1
    @Jerry yes the records mode is important when you fill down – magdmartin Sep 29 '22 at 18:48