18

Let's say I had the string

"[1,2,[3,4,[5,6]],7]"

How would I parse that into the array

[1,2,[3,4,[5,6]],7]

?

Nesting structures and patterns are completely arbitrary in my use case.

My current ad-hoc solution involves adding a space after every comma and using YAML.load, but I'd like a cleaner one if possible.

(One that does not require external libraries if possible)

Justin L.

4 Answers

44

That particular example parses correctly with JSON:

s = "[1,2,[3,4,[5,6]],7]"
#=> "[1,2,[3,4,[5,6]],7]"
require 'json'
#=> true
JSON.parse s
#=> [1, 2, [3, 4, [5, 6]], 7]

If that doesn't work, you can try running the string through eval, but you must make sure that no actual Ruby code is passed in, because eval can be exploited as an injection vulnerability.
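Before reaching for eval, it may help to know exactly when JSON "doesn't work". The following is a minimal sketch, assuming a hypothetical helper name `parse_json_array` (not part of the JSON library), that turns a parse failure or a non-array result into a single `ArgumentError`:

```ruby
require 'json'

# Hypothetical helper: parse text as JSON and insist on an Array result.
def parse_json_array(text)
  result = JSON.parse(text)
  raise ArgumentError, "expected a JSON array" unless result.is_a?(Array)
  result
rescue JSON::ParserError => e
  # Raised for anything that is not valid JSON, e.g. single-quoted strings.
  raise ArgumentError, "not valid JSON: #{e.message}"
end

p parse_json_array("[1,2,[3,4,[5,6]],7]") #=> [1, 2, [3, 4, [5, 6]], 7]
```

Input such as `"['hello', 2]"` raises `ArgumentError` here, because single-quoted strings are not valid JSON.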

Edit: Here is a simple recursive, regex-based parser. It does no validation, is untested, and is not meant for production use:

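# Recursively collects integers and bracketed sub-arrays from s.
def my_scan s
  res = []
  # match[1] is the (\d+) capture; match[3] is the text inside [ ... ]
  s.scan(/((\d+)|(\[(.+)\]))/) do |match|
    if match[1]
      res << match[1].to_i        # plain integer
    elsif match[3]
      res << my_scan(match[3])    # nested array: recurse on its contents
    end
  end
  res
end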

s = "[1,2,[3,4,[5,6]],7]"
p my_scan(s).first #=> [1, 2, [3, 4, [5, 6]], 7]
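
One caveat with the regex-based parser above: the greedy `.+` spans from the first `[` to the last `]`, so sibling arrays such as `"[[1],[2]]"` get merged. As a hedged alternative that still uses only the standard library, here is a minimal recursive-descent sketch built on `StringScanner`. The method names (`parse_nested_array`, `parse_array`, `parse_value`) are made up for this illustration, and it only handles integers and nested arrays:

```ruby
require 'strscan'

# Hypothetical entry point: parses strings like "[1,2,[3,4]]" into nested arrays.
def parse_nested_array(text)
  scanner = StringScanner.new(text)
  result = parse_array(scanner)
  scanner.skip(/\s*/)
  raise ArgumentError, "unexpected trailing input at #{scanner.pos}" unless scanner.eos?
  result
end

# Parses one bracketed, comma-separated list starting at the scanner position.
def parse_array(scanner)
  scanner.skip(/\s*/)
  raise ArgumentError, "expected '[' at #{scanner.pos}" unless scanner.scan(/\[/)
  items = []
  scanner.skip(/\s*/)
  return items if scanner.scan(/\]/) # empty array
  loop do
    items << parse_value(scanner)
    scanner.skip(/\s*/)
    break if scanner.scan(/\]/)
    raise ArgumentError, "expected ',' or ']' at #{scanner.pos}" unless scanner.scan(/,/)
  end
  items
end

# Parses either a nested array or a (possibly negative) integer.
def parse_value(scanner)
  scanner.skip(/\s*/)
  if scanner.check(/\[/)
    parse_array(scanner)
  elsif (digits = scanner.scan(/-?\d+/))
    digits.to_i
  else
    raise ArgumentError, "expected integer or '[' at #{scanner.pos}"
  end
end

p parse_nested_array("[1,2,[3,4,[5,6]],7]") #=> [1, 2, [3, 4, [5, 6]], 7]
p parse_nested_array("[[1],[2]]")           #=> [[1], [2]]
```

Because each `[` is matched with its own closing `]` by the recursion, sibling and empty arrays keep their structure, and malformed input raises `ArgumentError` instead of being silently skipped.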
Mladen Jablanović
  • I'd like to use this, but I can't quite get json to run properly on my computer, and in any case it wouldn't be much cleaner than the yaml solution. Is there a way to manually code this parsing? – Justin L. Dec 18 '10 at 07:48
  • Not sure what you mean by "not clean", as it takes one method call to parse it. You could of course either write a simple regex-based parser of your own, or use dedicated tools such as http://treetop.rubyforge.org/ but neither of those is as simple as `JSON.parse` IMHO. – Mladen Jablanović Dec 18 '10 at 08:16
  • Oh, and `JSON` is part of the Ruby standard library, at least in 1.9.x. – Mladen Jablanović Dec 18 '10 at 09:22
  • If you have a multi-type array like `s = "['hello', 2, 'test', 5.0]"`, JSON will fail to parse with a generic error `unexpected token at ...`. However, YAML does work as shown in @Arup's [answer](http://stackoverflow.com/a/17271822/1986871): `YAML.load(s) => ["hello", 2, "test", 5.0]`. – Chris Cirefice Mar 10 '16 at 05:11
  • @ChrisCirefice: That's because single quoted strings are not valid JSON. – Mladen Jablanović Mar 10 '16 at 08:37
  • @MladenJablanović Well that makes sense; I guess for my use case YAML was better solely for that reason! – Chris Cirefice Mar 10 '16 at 15:15
18

The same can be done using YAML from the Ruby standard library, as below:

require 'yaml'
s = "[1,2,[3,4,[5,6]],7]"
YAML.load(s)
# => [1, 2, [3, 4, [5, 6]], 7]
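
If the string comes from an untrusted source, `YAML.safe_load` may be the more cautious call, since it refuses to deserialize arbitrary Ruby objects. The sketch below is only an illustration with made-up variable names. It also shows two behaviours worth knowing: single-quoted strings and floats are accepted, and a bare `nil` is read as the string "nil", whereas `null` becomes Ruby's `nil`.

```ruby
require 'yaml'

# Nested integers parse the same way as with YAML.load.
nested = YAML.safe_load("[1,2,[3,4,[5,6]],7]")
p nested    #=> [1, 2, [3, 4, [5, 6]], 7]

# Mixed types, including single-quoted strings, are valid YAML flow sequences.
mixed = YAML.safe_load("['hello', 2, 'test', 5.0]")
p mixed     #=> ["hello", 2, "test", 5.0]

# YAML spells nil as null, so a literal "nil" stays a string.
with_nils = YAML.safe_load("[1, nil, null]")
p with_nils #=> [1, "nil", nil]
```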
Arup Rakshit
  • +1; YAML successfully loads multi-type arrays, e.g. `"['hello', 2, 'test', 5.0]"`, where JSON fails to parse. – Chris Cirefice Mar 10 '16 at 05:12
  • This has the advantage that it does not throw an error if there is a nil element, but it outputs the nil as 'nil', so you still need to convert that to nil. – Obromios Jul 16 '17 at 05:53
6

"Obviously" the best solution is to write your own parser. [ If you like writing parsers, have never done it before and want to learn something new, or want control over the exact grammar ]

require 'parslet'

class Parser < Parslet::Parser
  rule(:space)       { str(' ') }
  rule(:space?)      { space.repeat(0) }
  rule(:openbrace_)  { str('[').as(:op) >> space? }
  rule(:closebrace_) { str(']').as(:cl) >> space? }
  rule(:comma_)      { str(',') >> space?  }
  rule(:integer)     { match['0-9'].repeat(1).as(:int) }
  rule(:value)       { (array | integer) >> space? }
  rule(:list)        { value >> ( comma_ >> value ).repeat(0) }
  rule(:array)       { (openbrace_ >> list.maybe.as(:list) >> closebrace_ )}
  rule(:nest)        { space? >> array.maybe }
  root(:nest)
end

class Arr
  def initialize(args)
    @val = args
  end
  def val
    @val.map{|v| v.is_a?(Arr) ? v.val : v}
  end
end


class MyTransform < Parslet::Transform
  rule(:int => simple(:x))      { Integer(x) }
  rule(:op => '[', :cl => ']')  { Arr.new([]) }
  rule(:op => '[', :list => simple(:x), :cl => ']')   {  Arr.new([x]) }
  rule(:op => '[', :list => sequence(:x), :cl => ']')   { Arr.new(x) }
end

def parse(s)
  MyTransform.new.apply(Parser.new.parse(s)).val
end

parse " [   1  ,   2  ,  [  3  ,  4  ,  [  5   ,  6  , [ ]]   ]  ,  7  ]  "

Parslet transforms match a single value as `simple`, but if that value is itself an array you soon get arrays of arrays, and then you have to start using `subtree`. Returning objects, however, is fine: each object represents a single value when the layer above is transformed, so `sequence` matches as expected.

Couple the trouble of returning bare arrays with the fact that Array([x]) and Array(x) give you the same thing, and you get very confusing results.

To avoid this I made a helper class called Arr which represents an array of items. I could then dictate what I pass into it. Then I can get the parser to keep all the brackets even if you have the example that @MateuszFryc called out :) (thanks @MateuszFryc)

Nigel Thorne
  • I would say that it isn't necessarily the *obviously best* solution, as it depends on the input and its format and how it is generated. However, it is one of the most flexible. Also, a full working parslet example is a rare treat so +1 to you! – Mark Thomas Jul 13 '15 at 01:51
  • how about [[[1],[2,3]]] ? – Mateusz Fryc Jun 20 '19 at 11:17
  • @MateuszFryc - puts (parse "[[[1],[2,3]]]").inspect => [[1], [2, 3]] - seems to work to me :)... oh I see ...we lost the outer brackets? – Nigel Thorne Jun 24 '19 at 05:55
  • @NigelThorne - thanks for the update, but it seems there is still a discrepancy in your parser/transformer. Take the array "[]", for example: it produces [nil]. It looks like the rule `rule(:op => '[', :cl => ']') { Arr.new([]) }` is never matched, contrary to what you expected. The input is matched by `rule(:op => '[', :list => simple(:x), :cl => ']') { Arr.new([x]) }` instead, hence the problem. – Mateusz Fryc Jun 25 '19 at 07:45
  • You could simply add compact to the mentioned rule: `rule(:op => '[', :list => simple(:x), :cl => ']') { Arr.new([x].compact) }`, and completely delete the one you think is used, `rule(:op => '[', :cl => ']') { Arr.new([]) }`. – Mateusz Fryc Jun 25 '19 at 07:54
  • BTW, perhaps you could take a look at my problem with parsing JSON-like structure, where I struggle with nested arrays too ;) https://stackoverflow.com/questions/56749529/how-to-transform-nested-arrays-string-in-json-like-string-to-structured-object-u I will have to rethink your approach anyway as it looks that it could be a key to my problems ;) – Mateusz Fryc Jun 25 '19 at 08:12
2

Use eval

array = eval("[1,2,[3,4,[5,6]],7]")
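
Because eval runs whatever it is given, one way to reduce the risk is to reject any input containing characters other than digits, brackets, commas, and spaces before evaluating it. This is a sketch under that assumption. The constant `SAFE_ARRAY_LITERAL` and the method `eval_integer_array` are hypothetical names, and `JSON.parse` remains the simpler choice when it is available.

```ruby
# Hypothetical guard: only digits, brackets, commas, and spaces are allowed,
# so no method call or identifier can reach eval.
SAFE_ARRAY_LITERAL = /\A[\d\[\], ]*\z/

def eval_integer_array(text)
  raise ArgumentError, "unexpected characters in input" unless text.match?(SAFE_ARRAY_LITERAL)
  result = eval(text)
  raise ArgumentError, "input is not an array literal" unless result.is_a?(Array)
  result
rescue SyntaxError => e
  # eval raises SyntaxError (not a StandardError) for malformed literals.
  raise ArgumentError, "malformed array literal: #{e.message}"
end

p eval_integer_array("[1,2,[3,4,[5,6]],7]") #=> [1, 2, [3, 4, [5, 6]], 7]
```

Note that Ruby's literal rules still apply: a number with a leading zero such as `010` is read as octal, so this guard limits what can be executed but does not make eval a general-purpose parser.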
jatin
  • This isn't a part of my application that I feel safe leaving vulnerable to injections, sorry. – Justin L. Dec 18 '10 at 07:57
  • @Justin L., A "clean room" + "sandbox" will protect you from the evils of eval: http://stackoverflow.com/questions/2045324/executing-user-supplied-ruby-code-on-a-web-server/2046076#2046076 . About all that is left to protect against is code that runs a long time; Timeout can take care of that. – Wayne Conrad Dec 18 '10 at 23:19
  • Please add a note of the security risks to your answer so you don't get downvotes. Some suggestion as to how to mitigate the risks would also be valuable. – Nigel Thorne Jul 13 '15 at 03:29
  • I agree that a security warning needs to be added, which is why I downvoted this otherwise fine solution. – gorn Dec 07 '16 at 15:05
  • bad practice :/ – Darlan Dieterich May 19 '23 at 21:12