Parsing text in Ruby

Question

I'm working on a script for importing component information for SketchUp. A very helpful individual on their help page assisted me in creating one that works on an "edited" line-by-line text file. Now I'm ready to take it to the next level: importing directly from the original file created by FreePCB.

The portion of the file I wish to use is below: "sample_1.txt"

[parts]

part: C1
  ref_text: 1270000 127000 0 -7620000 1270000 1
  package: "CAP-AX-10X18-7X"
  value: "4.7pF" 1270000 127000 0 1270000 1270000 1
  shape: "CAP-AX-10X18-7"
  pos: 10160000 10160000 0 0 0

part: IC1
  ref_text: 1270000 177800 270 2540000 2286000 1
  package: "DIP-8-3X"
  value: "JRC 4558" 1270000 177800 270 10668000 508000 0
  shape: "DIP-8-3"
  pos: 2540000 27940000 0 90 0

part: R1
  ref_text: 1270000 127000 0 3380000 -600000 1
  package: "RES-CF-1/4W-4X"
  value: "470" 1270000 127000 0 2180000 -2900000 0
  shape: "RES-CF-1/4W-4"
  pos: 15240000 20320000 0 270 0

The word [parts], in brackets, is just a section heading. The information I wish to extract is the reference designator, shape, position, and rotation. I already have code to do this from a reformatted text file, using IO.readlines(file).each{ |line| data = line.split(" ");.

My current method uses a text file re-formatted as follows: "sample_2.txt"

C1 CAP-AX-10X18-7 10160000 10160000 0 0 0
IC1 DIP-8-3 2540000 27940000 0 90 0
R1 RES-CF-1/4W-4 15240000 20320000 0 270 0

I then use an array to extract data[0], data[1], data[2], data[3], and data[5]. Plus an additional step, to append ".skp" to the end of the package name, to allow the script to insert components with the same name as the package.

I would like to extract the information from the 1st example without having to re-format the file, as is the case with the 2nd example. That is, I know how to pull information from a single string split by spaces. How do I do it when the text for one array appears on more than one line?

Thanks in advance for any help ;-)

EDIT: Below is the full code to parse "sample_2.txt", that was re-formatted prior to running the script.

    # import.rb - extracts component info from text file

    # Launch file browser
    file=UI.openpanel "Open Text File", "c:\\", "*.txt"

    # Do for each line, what appears in braces {}
    IO.readlines(file).each{ |line| data = line.split(" ");

    # Append second element in array "data[1]", with SketchUp file extension
    data[1] += ".skp"

    # Search for component with same name as data[1], and insert in component browser
    component_path = Sketchup.find_support_file data[1] ,"Components"
    component_def = Sketchup.active_model.definitions.load component_path

    # Create transformation from "origin" to point "location", convert data[] to float
    location = [data[2].to_f, data[3].to_f, 0]
    translation = Geom::Transformation.new location

    # Convert rotation "data[5]" to radians, and into float
    angle = data[5].to_f*Math::PI/180.to_f
    rotation = Geom::Transformation.rotation [0,0,0], [0,0,1], angle

    # Insert an instance of component in model, and apply transformation
    instance = Sketchup.active_model.entities.add_instance component_def, translation*rotation

    # Rename component 
    instance.name=data[0]

    # Ending brace for "IO.readlines(file).each{"
    }

Results in the following output, from running "import.rb" to open "sample_2.txt".

    C1 CAP-AX-10X18-7 10160000 10160000 0
    IC1 DIP-8-3 2540000 27940000 90
    R1 RES-CF-1/4W-4 15240000 20320000 270

I am trying to get the same results from the un-edited original file "sample_1.txt", without the extra step of removing information from the file, with notepad "sample_2.txt". The keywords, followed by a colon (part, shape, pos), only appear in this part of the document, and nowhere else, but... the document is rather lengthy, and I need the script to ignore all that appears before and after, the [parts] section.
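One way to ignore everything outside the [parts] section is to walk the file line by line and keep only the lines that follow that heading. The sketch below assumes that every section starts with a bracketed heading on its own line; the `[options]` and `[nets]` sections and their contents are made up for illustration, and `parts_section` is a hypothetical helper name.

```ruby
# Hypothetical excerpt: only the [parts] block mirrors the sample in the
# question; the other section names and lines are invented.
text = <<EOS
[options]
units: NM

[parts]

part: C1
  shape: "CAP-AX-10X18-7"
  pos: 10160000 10160000 0 0 0

[nets]
net: "GND" 2 0 0 0
EOS

# Return only the lines between the [parts] heading and the next heading.
def parts_section(text)
  kept = []
  inside = false
  text.each_line do |line|
    if line =~ /^\s*\[(.+?)\]\s*$/
      # Any bracketed heading switches collection on or off.
      inside = ($1 == "parts")
    elsif inside
      kept << line
    end
  end
  kept.join
end

section = parts_section(text)
puts section
```

The returned string can then be handed to whatever parsing step follows, so that keywords appearing elsewhere in a long file cannot interfere.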

sawa · Accepted Answer · 2011-05-14T02:12:58.217

6

Your question is not clear, but this:

text.scan(/^\s+shape: "(.*?)"\s+pos: (\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)

will give you:

[["CAP-AX-10X18-7", "10160000", "10160000", "0", "0", "0"],
 ["DIP-8-3", "2540000", "27940000", "0", "90", "0"],
 ["RES-CF-1/4W-4", "15240000", "20320000", "0", "270", "0"]]
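As a minimal sketch of how that result might be consumed: `text` is not defined in the snippet above, so here it is filled from a heredoc copied from the question (in a real script it would more likely come from something like `File.read(path)`, with `path` chosen by the user). The block-variable names are my own labels; only the positions used in the question's script are given meaningful names.

```ruby
# Sample text copied from the question (first two parts only).
text = <<EOS
[parts]

part: C1
  ref_text: 1270000 127000 0 -7620000 1270000 1
  package: "CAP-AX-10X18-7X"
  value: "4.7pF" 1270000 127000 0 1270000 1270000 1
  shape: "CAP-AX-10X18-7"
  pos: 10160000 10160000 0 0 0

part: IC1
  ref_text: 1270000 177800 270 2540000 2286000 1
  package: "DIP-8-3X"
  value: "JRC 4558" 1270000 177800 270 10668000 508000 0
  shape: "DIP-8-3"
  pos: 2540000 27940000 0 90 0
EOS

pattern = /^\s+shape: "(.*?)"\s+pos: (\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/

# scan returns one array of captures per match; every capture is a String.
rows = text.scan(pattern)

# Destructure each row; the third and fifth numbers are not used here.
rows.each do |shape, x, y, _third, angle, _fifth|
  puts "#{shape}: x=#{x.to_i} y=#{y.to_i} angle=#{angle.to_i}"
end
```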

Added after change in the question

This:

text.scan(/^\s*part:\s*(.*?)$.*?\s+shape:\s*"(.*?)"\s+pos:\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/m)

will give you

[["C1", "CAP-AX-10X18-7", "10160000", "10160000", "0", "0", "0"],
 ["IC1", "DIP-8-3", "2540000", "27940000", "0", "90", "0"],
 ["R1", "RES-CF-1/4W-4", "15240000", "20320000", "0", "270", "0"]]
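Each row now has the same layout as the `data` array in the question's script (name, shape, x, y, an unused value, rotation, another unused value), so the existing indices should carry over. The following is a sketch under that assumption; the hash keys are hypothetical names, and the SketchUp calls are left out so the example runs on its own.

```ruby
# One part copied from the question's sample.
text = <<EOS
part: R1
  ref_text: 1270000 127000 0 3380000 -600000 1
  package: "RES-CF-1/4W-4X"
  value: "470" 1270000 127000 0 2180000 -2900000 0
  shape: "RES-CF-1/4W-4"
  pos: 15240000 20320000 0 270 0
EOS

pattern = /^\s*part:\s*(.*?)$.*?\s+shape:\s*"(.*?)"\s+pos:\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/m

# Convert each row of strings into the values the import script needs.
placements = text.scan(pattern).map do |data|
  {
    :name  => data[0],                          # reference designator
    :file  => data[1] + ".skp",                 # component file name
    :x     => data[2].to_f,
    :y     => data[3].to_f,
    :angle => data[5].to_f * Math::PI / 180     # degrees to radians
  }
end

placements.each { |p| puts p.inspect }
```

Inside SketchUp, each hash could then feed the `Sketchup.find_support_file` and `Geom::Transformation` calls from the question in place of the `data[...]` lookups.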

Added a second time, after another change in the question

This:

text.scan(/^\s*part:\s*(.*?)$.*?\s+shape:\s*"(.*?)"\s+pos:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)/m)

will let you capture numbers even if they are negative.
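A small check of the difference, as a sketch: the `pos:` line is the one quoted in the comments, while the part name `U7` and the shape paired with it are made up for the example.

```ruby
# Hypothetical part; only the pos: values come from the comment thread.
text = <<EOS
part: U7
  shape: "DIP-8-3"
  pos: 243586000 -14986000 0 180 0
EOS

unsigned = /^\s*part:\s*(.*?)$.*?\s+shape:\s*"(.*?)"\s+pos:\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/m
signed   = /^\s*part:\s*(.*?)$.*?\s+shape:\s*"(.*?)"\s+pos:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)/m

# Without -? the minus sign cannot be matched, so this part is not found.
without_sign = text.scan(unsigned)

# With -? inside the parentheses the sign is part of the captured string.
with_sign = text.scan(signed)

puts without_sign.inspect
puts with_sign.inspect

# to_i keeps the sign when converting the captured string.
y = with_sign[0][3].to_i
puts y
```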

edited May 14 '11 at 02:12

answered May 13 '11 at 05:41

sawa


@sawa That looks simple enough... to clarify, I need to manipulate the data following "part:", "shape:", and "pos:", as a single string, regardless if the other elements are present, i.e. "package:", "value:", "ref_text:". – tahwos May 13 '11 at 12:59
Like this maybe...? `text.scan(/^part: "(.*?)"\s+shape: "(.*?)"\s+pos: (\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)` – tahwos May 13 '11 at 14:08
I don't see how `part:` is captured. You have in the text `C1`, `IC1`, `R1`, and then your expected result has `C1`, `C2`, `C3`. You should be careful about the accuracy of the examples you give in the question. – sawa May 13 '11 at 17:33
Corrected - I had 2 windows open, trying different code, and copied from the wrong one. It was essentially the same though, I had just manipulated the text file a bit, to see the results. Thanks for catching that. As far as how part is captured, that's why I re-posted "like this?" Your example starts with `^` `\s+shape: "(.*?)"`... beginning of a line or string, followed by space+shape: (any char except newline, 0 or more) - is there any reason I can't expand it backwards, to pick up "part:" as well? – tahwos May 13 '11 at 19:57
OK - `(/^part: (.*)/)` works by itself, but not with the rest of the code included, which also works fine - thank you! – tahwos May 13 '11 at 20:37
@sawa - that worked, I just need to test it on a larger file. – tahwos May 13 '11 at 21:14
@sawa - testing it on a larger file worked with the regex running solo, with one minor hiccup, negative `pos:` values cause the code to skip to the next `pos:`, that returns all positive values, and apparently all data in between, is returned for the previous `shape:`. As I said, minor... not a big deal to make sure all is positive, prior to running the script. Testing it in the full script now... – tahwos May 13 '11 at 22:02
@sawa - The output was still the same - negative values were skipped, til the next positive result. Both examples capture the matches well though. – tahwos May 13 '11 at 23:55
@tahwos What do you mean by negative? – sawa May 13 '11 at 23:57
`pos: 243586000 -14986000 0 180 0` The second entry, causes the whole element to be ignored, and it picks back up at the next match, with all positive entries, for that part of the expression. Might just be the way I'm testing it - still writing the code for a full test. I can't just copy and paste it into the original code, and make it work. – tahwos May 14 '11 at 00:36
@tahwos I have to say that the examples that you gave really do not reflect what you are asking. You fixed the part values, and now it turns out you hadn't shown examples with negative values for the numbers. You should be careful in making examples. I updated my answer again. Notice that someone (not me) had voted for closing your question. That is because it is not written with care. You can't ask for an answer whose quality greatly exceeds the quality of your question. – sawa May 14 '11 at 00:50
As I stated in a previous reply, your examples were satisfactory in the part of collecting data, and that the "previously unknown" results concerning negative numbers, was an acceptable conflict (I can just move the origin of the entire project, to ensure everything is positive to begin with). However, your last example converted the negative values to positive - which is not an acceptable conflict. – tahwos May 14 '11 at 01:35
@sawa - I scrolled up and down the page and don't see anything, that indicates votes on closing this question. Maybe I'm looking in the wrong spot, but anyhow - your second rendition of the code that includes matching the `part:` attribute, is what I have been working around to integrate regex into my original code. As it stands, my current state of progress would be an entirely new question anyway, and you have been incredibly helpful in getting me this far. Thank You! ;-) – tahwos May 14 '11 at 01:47
btw... I posted the top 3 parts to the question vs. the whole file, which contains 239 parts, and over 17,000 lines of information total (to be polite). So yes, again, I'm very pleased with the results of your second iteration of the expression - you were able to present a solution in one line of code, that read through the whole document, and extracted only the information that I needed. – tahwos May 14 '11 at 01:58
@tahwos There was a mistake in my answer as you noted. I should have put the `-?` inside the parentheses. I fixed it above. Now, it should work correctly. – sawa May 14 '11 at 02:15
@sawa - Yes, that did the trick, all numbers were captured as they appear in the original file. Thanks again - tahwos – tahwos May 14 '11 at 02:24

score 0 · Answer 2 · answered May 13 '11 at 04:17


Not sure exactly what you're asking, but hopefully this helps you get what you're looking for.

parts_text = <<EOS
[parts]

part: C1
  ref_text: 1270000 127000 0 -7620000 1270000 1
  package: "CAP-AX-10X18-7X"
  value: "4.7pF" 1270000 127000 0 1270000 1270000 1
  shape: "CAP-AX-10X18-7"
  pos: 10160000 10160000 0 0 0

part: IC1
  ref_text: 1270000 177800 270 2540000 2286000 1
  package: "DIP-8-3X"
  value: "JRC 4558" 1270000 177800 270 10668000 508000 0
  shape: "DIP-8-3"
  pos: 2540000 27940000 0 90 0

part: R1
  ref_text: 1270000 127000 0 3380000 -600000 1
  package: "RES-CF-1/4W-4X"
  value: "470" 1270000 127000 0 2180000 -2900000 0
  shape: "RES-CF-1/4W-4"
  pos: 15240000 20320000 0 270 0
EOS

parts = parts_text.split(/\n\n/)
split_parts = parts.each.map { |p| p.split(/\n/) }
split_parts.each do |part|
  stripped = part.each.collect { |p| p.strip }
  stripped.each do |line|
    p line.split(" ")
  end
end

This could be done much more efficiently with regular expressions, but I opted for methods that you might already be familiar with.

answered May 13 '11 at 04:17

ezkl


I'm not afraid of regular expressions, in fact, I think they're easier to read. – tahwos May 13 '11 at 12:44
Note to self, "do not hit enter while replying". In your example, if some of the elements from the original text file are missing, i.e. package:, value:, would the whole array be corrupt? – tahwos May 13 '11 at 12:47
No, the elements are irrelevant. The delimiters are the newlines (/\n/). There are 2 line breaks between each "group". Within each "group" there is a single line break between each "line". The inner most block strips white space from the beginning and end of each line and splits the line on whitespace. – ezkl May 13 '11 at 14:27
I guess that's why it didn't make sense - they need to be relevant, because if the other elements don't have values, they don't exist in the text file. i.e. The elements of interest are the only thing "static" and fool proof, for someone else to use the script - part:, shape:, and pos: are always there - ref_text:, package:, and value: may not be. In a perfect document, all entities would exist, and work under the simplest of terms. From what I see in your example, if some are missing (most likely the middle), then their relative position, wouldn't match, from group to group. – tahwos May 13 '11 at 20:17
The code above, gave this error... test_other.rb:27:in `each': no block given (LocalJumpError) – tahwos May 21 '11 at 14:34
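The LocalJumpError reported above is consistent with an older Ruby in which `each` called without a block raises instead of returning an enumerator (I believe this changed in 1.8.7, but treat that as an assumption). Dropping the redundant `.each` before `map` and `collect` should avoid it. The sketch below also keys each line by its field name rather than by its position in the group, so groups that lack `ref_text:`, `package:`, or `value:` still line up; the second part in the sample has those lines removed on purpose to show this.

```ruby
# R1 deliberately omits the optional ref_text/package/value lines.
parts_text = <<EOS
[parts]

part: C1
  ref_text: 1270000 127000 0 -7620000 1270000 1
  package: "CAP-AX-10X18-7X"
  value: "4.7pF" 1270000 127000 0 1270000 1270000 1
  shape: "CAP-AX-10X18-7"
  pos: 10160000 10160000 0 0 0

part: R1
  shape: "RES-CF-1/4W-4"
  pos: 15240000 20320000 0 270 0
EOS

# Split into blank-line-separated groups, then store every "key: rest"
# line of a group in a hash under its key.
records = parts_text.split(/\n\n+/).map do |group|
  fields = {}
  group.split(/\n/).each do |line|
    key, rest = line.strip.split(/:\s*/, 2)
    fields[key] = rest if rest   # lines without a colon are skipped
  end
  fields
end

# Keep only groups that carry all three required fields.
parts = records.select { |f| f["part"] && f["shape"] && f["pos"] }

parts.each do |f|
  name  = f["part"]
  shape = f["shape"].delete('"')          # strip the surrounding quotes
  x, y, _, angle = f["pos"].split(" ")
  puts "#{name} #{shape} #{x} #{y} #{angle}"
end
```

Looking fields up by name trades a little extra code for not having to care which optional lines are present.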
