0

I'm attempting to import in open-refine a csv extracted from a NoSQL database (Cassandra) without headers and with different number of columns per record.

For instance, fields are comma separated and could look like below:

1 - userid:100456, type:specific, status:read, feedback:valid
2 - userid:100456, status:notread, message:"some random stuff here but with quotation marks", language:french

There's a maximum number of columns and there aren't cleansing required on their names.

How do I make up a big excel file I could mine using pivot table?

pnuts
  • 58,317
  • 11
  • 87
  • 139
benr
  • 1

1 Answers1

1

If you can get JSON instead, Refine will ingest it directly.

If that's not a possibility, I'd probably do something along the lines of:

  1. import as lines of text
  2. split into two columns containing row ID and fields
  3. split multi-valued cells on fields column using comma as a separatd
  4. split fields column into two columns using colon as a separate
  5. use key/value on these two columns to unfold into columns
Tom Morris
  • 10,490
  • 32
  • 53
  • Thanks a lot Tom, I'll try to get JSON instead OR to develop your workflow in refine. If 2nd option is successful, I'll let you know. – benr Oct 15 '13 at 15:01