I'm trying to write a regular expression to parse the values out of UnrealScript serialized objects. Part of that involves lines like this:

(X=32.69,Y='123.321',Z="A string with commas, just to complicate things!",W=Class'Some.Class')

The resultant capture should be:

[
    {
        'X':32.69,
        'Y':'123.321',
        'Z':'A string with commas, just to complicate things!',
        'W':'Class\'Some.Class\''
    }
]

What I want is to be able to distinguish between the key (e.g. X) and the value (e.g. Class\'Some.Class\').

Here is a pattern I've tried so far, just to capture a simple set of values (it doesn't try to handle commas inside values for now):

Pattern

\(((\S?)=(.+),?)+\)

Data set

(X=32,Y=3253,Z=12.21)

Result

https://regex101.com/r/gT9uU3/1

I'm still a novice with regular expressions and any help would be appreciated!

Thanks in advance.

Colin Basnett
  • Can you elaborate on this: `What I want is to be able to distinguish between the key (eg. X) and the value (eg. Class'Some.Class').` – james jelo4kul Sep 25 '15 at 23:40
  • Indeed, the description is vague. Do you want to turn all ' inside values into \'? Do you want to turn all " surrounding values into '? Do you have an arbitrary number of values? What engine/programming language? If you're sure about how your data will look and you're not after something advanced, then perhaps you can achieve this with regex. If you want to escape an arbitrary number of quotation marks etc., then this task isn't well suited for regex (at least not a single one). – Pillowcase Sep 26 '15 at 00:21

1 Answer

You can try this regex to associate key and value pairs:

(?!^\()([^=,]+)=([^\0]+?)(?=,[^,]+=|\)$)

Regex live here.
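
As a rough sketch of how the pattern could be applied from code: the question does not name a language, so Python is an assumption here, and `parse_pairs` is a hypothetical helper name. The values are returned as raw text, quotes included, and no type conversion is attempted.

```python
import re

# The pattern from this answer, unchanged. re.MULTILINE is deliberately not
# set, so ^ and $ anchor to the start and end of the whole input string.
PAIR_RE = re.compile(r"(?!^\()([^=,]+)=([^\0]+?)(?=,[^,]+=|\)$)")


def parse_pairs(line):
    """Map each key to its raw value text (quotes are kept as-is)."""
    return {m.group(1): m.group(2) for m in PAIR_RE.finditer(line)}


# The sample line from the question.
line = """(X=32.69,Y='123.321',Z="A string with commas, just to complicate things!",W=Class'Some.Class')"""
pairs = parse_pairs(line)

for key, value in pairs.items():
    print(key, "->", value)
```

Group 1 holds the key and group 2 holds the value, so the comma inside the quoted string stays part of the value of `Z`.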

Explaining:

(?!^\()         # do not start a match on the opening '(' of the line

([^=,]+)        # the key: everything after the previous comma
=               # up to the next '=' character

([^\0]+?)       # the value: any character except NUL '[^\0]',
                  # one or more of them '+',
                  # but as few as possible '?' (lazy): stop at the first
                  # position where the lookahead below succeeds

(?=             # the value ends just before either:

    ,[^,]+=     # a comma ',' followed by the next key '[^,]+='
                  # important: without requiring the key here,
                  # the value would end at the first comma, which may or
                  # may not be a delimiter comma

    |\)$        # OR '|': the last value, which has no key after it,
                  # so it must end at the ')' that closes
                  # the input '$'

)               # end of the lookahead
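
The breakdown above can also be kept inside the pattern itself. The sketch below assumes Python and its `re.VERBOSE` flag (the question does not name an engine), and `find_pairs` is a hypothetical helper. It also shows one case the lookahead cannot tell apart: a quoted value that itself contains a comma followed by something that looks like `key=`.

```python
import re

# The answer's pattern written with re.VERBOSE: whitespace and '#' comments
# inside the pattern are ignored by the regex engine.
PAIR_RE = re.compile(r"""
    (?!^\()          # refuse to start a match on the opening '('
    ([^=,]+)         # group 1: the key, everything up to '='
    =
    ([^\0]+?)        # group 2: the value, as few characters as possible
    (?=              # ...ending just before either
        ,[^,]+=      #   a comma followed by another key and '='
      | \)$          #   or the closing ')' at the end of the input
    )
""", re.VERBOSE)


def find_pairs(line):
    """Return (key, value) tuples in the order they appear."""
    return [(m.group(1), m.group(2)) for m in PAIR_RE.finditer(line)]


# The simple data set from the question.
simple = find_pairs("(X=32,Y=3253,Z=12.21)")

# Known limitation: the quoted value contains ', y=', which looks like
# a delimiter comma plus a key, so the value is cut at that comma.
split_value = find_pairs('(A="x, y=z",B=1)')

print(simple)
print(split_value)
```

If the data can contain `=` inside quoted strings, a quote-aware pattern or a small hand-written tokenizer is probably a safer choice than this lookahead.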

Hope it helps.

Sorry for my English; feel free to edit my explanation, or let me know in the comments. =)