I just started learning OpenRefine 20 minutes ago. I have a text file in which every chunk of data begins with a consistent header line ("JP"). The chunks do not all have the same number of lines. I want each chunk of the original data to end up on one row in OpenRefine. How can I do that?

Edit: Here is a sample. It's a fairly messy file, but I can count on the JP at the beginning of each distinct entry.

JP  
0034  
1-25-60  
01  
checked 1/92  

I am so happy to have taken these. The brown envelopes, blah blah. roll 1: Is a retirement event [EW]  
JP  
0035  
2-1-60  
01  
checked 1/92  

Bill therapy  

JP  
0036  
2-11-60  
01  
Checked 1/92  

Bill: there are many  

EW: The bills look good.  

I remember Babies used to look like this everyday, with the staff coming and going, all nice and professional.  
JP  
0037  
2-11-60  
01  

checked 1/92  
BLAHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH. blah blah blah blah bal… 
 oops>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  

again  

JP  
0039  
2-11-60  
01  
checked 1/92  

JP  
0041  
3-14-60  
Ettore Rizza
  • your question would be much clearer with an example – pintoch Sep 20 '17 at 07:29
  • +1 to pintoch - Can you post some example data? I'm not clear whether 'JP' is a separator between rows or between fields, and how that relates to the idea this is a 'line separated file' – Owen Stephens Sep 20 '17 at 09:10
  • I added a sample in the original question. It is a messy file that I need to process. I just want a row for all the lines between the "JP"s. I want a column for each line. Some rows will have more columns than others. That's okay in this context. – john patterson Sep 20 '17 at 16:39

1 Answer

Here is one possible solution.

1 Open your text in OpenRefine by choosing the "Line-based text files" option, and uncheck the "Store blank rows" checkbox;

2 In the single column of your project, use a text filter to isolate the rows containing "JP";

3 Create a new column based on this filtered column and move it to the beginning;

4 Delete the "JP" values in the original column (Transform -> null);

5 Use "Join multi-valued cells" on the original column, specifying a space as the separator.

These steps are much easier to follow in a screencast:

[screencast of the steps above]
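For readers who want to check the expected result outside OpenRefine, the grouping that the steps above produce can be sketched in Python. This is only an illustration, not part of the OpenRefine workflow: the function name `group_records` is made up for this example, and it assumes a record starts at any line that is exactly "JP" once surrounding whitespace is stripped (the text filter in step 2 matches any row containing "JP", which is looser).

```python
def group_records(lines, header="JP"):
    """Group lines into records; a new record starts at each header line.

    Hypothetical helper for illustration only. Blank lines are dropped,
    mirroring the unchecked "store blank rows" option in step 1.
    """
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line == header:
            records.append([])  # header line opens a new record
        elif records:
            records[-1].append(line)  # attach the line to the current record
        # lines that appear before the first header are ignored
    return records


# A shortened piece of the sample from the question.
sample = """JP
0034
1-25-60
01
checked 1/92

JP
0035
2-1-60
01
checked 1/92

Bill therapy
""".splitlines()

rows = group_records(sample)
for row in rows:
    print(row)
```

Each inner list corresponds to one row, and records may have different lengths, which matches the "some rows will have more columns than others" requirement in the question.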

Ettore Rizza
  • Wow, thanks! Question: At step 4, join multi-valued cell jams everything into one cell. How can I retain the cells, but still have them on the row they belong to, like you have it? – john patterson Sep 21 '17 at 13:32
  • I did it by adding "END" to each cell in the second column. Then, after joining the cells I split them into columns using "END" as the divider. Thanks! – john patterson Sep 21 '17 at 14:37
  • You're welcome. Do not forget to accept the answer if it looks right, so we can close the topic. – Ettore Rizza Sep 21 '17 at 14:42
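The "END" marker trick described in the comments can be illustrated with a small Python sketch. It is only a model of the idea, not OpenRefine code: the cell values come from the sample in the question, and it assumes the marker text never occurs inside the data itself (a cell containing a word such as "WEEKEND" would be split in the wrong place, so a rarer marker may be safer).

```python
SENTINEL = "END"  # marker from the comment; assumed not to occur in the data

# Cells belonging to one record, taken from the sample in the question.
cells = ["0036", "2-11-60", "01", "Checked 1/92"]

# Append the marker to every cell, then join with a space (as in step 5).
joined = " ".join(cell + SENTINEL for cell in cells)

# Split on the marker to recover one value per column, dropping empty pieces.
columns = [part.strip() for part in joined.split(SENTINEL) if part.strip()]

print(joined)
print(columns)
```

Splitting on the marker rather than on the space keeps values such as "Checked 1/92" intact, since those contain spaces of their own.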