
I'm stuck (again) because of patterns, so let's see if a little help gets me unstuck. Say I have a string, returned by a function, that contains the following:

 My Script
ScriptID:RL_SimpleTest
Version:0.0.1
ScriptType:MenuScript
AnotherKey:AnotherValue
And, maybe, some more text...

I want to parse it line by line and, if a line contains a ":", store the text to the left of it in one variable (k) and the text to the right in another (v). For the second line, for example, k would contain "ScriptID" and v would contain "RL_SimpleTest" (the first line should simply be ignored), and so on.

Well, I've started with something like this:

function RL_Test:StringToKeyValue(str, sep1, sep2)
    sep1 = sep1 or "\n"
    sep2 = sep2 or ":"
    local t = {}
    for line in string.gmatch(str, "([^" .. sep1 .. "]+)") do
        print(line)
        for k in string.gmatch(line, "([^" .. sep2 .. "]+)") do --Here is where I'm lost trying to get the key/value pair separately and at the same time...
            --t[k] = v
            print(k)
        end
    end
    return t
end

My hope was that, once I had isolated a line containing data in the key:value form I want to extract, I could do something like for k, v in string.gmatch(line, "([^" .. sep2 .. "]+)") and get both pieces of data at once. Of course it doesn't work, and even though I have a feeling it's trivial, I don't even know where to start, as always for lack of understanding of patterns.

Well, I hope I've at least explained it clearly. Thanks in advance for any help.

Rai
  • My advice is to read the manual on [`string.gmatch`](https://www.lua.org/manual/5.4/manual.html#pdf-string.gmatch) and [`string.gsub`](https://www.lua.org/manual/5.4/manual.html#pdf-string.gsub). It contains many examples to help you understand patterns. – shingo Feb 08 '23 at 05:00
  • @shingo Yeah, thanks! The problem (not only with patterns, but especially with them) is that I read about all this every time I have to deal with it, but after some days or weeks without practice it's as if I've forgotten everything... It's very frustrating, though I guess I'm not alone in this? But of course I'll take the advice, hoping that at least something is retained each time, even if only little by little. – Rai Feb 08 '23 at 14:35

2 Answers

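local t = {}
-- s is the input string; appending '\n' ensures the last line is terminated
for line in (s..'\n'):gmatch("(.-)\r?\n") do
  -- capture the text before the colon (a) and the text after it (b)
  for a, b in line:gmatch("([^:]+):([^:\n\r]+)") do
    t[a] = b
  end
end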

The pattern is quite simple. Match one or more characters that are not a colon, followed by a colon, followed by one or more characters that are neither a colon nor a line break. Put what you want in captures and you're done.
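
To see the two captures in action, here is a minimal sketch that runs the same loop on a string modelled on the one in the question. The sample text and the variable names s, k and v are assumptions for illustration.

```lua
-- Sample input modelled on the question's string (assumed for illustration).
local s = "My Script\nScriptID:RL_SimpleTest\nVersion:0.0.1\n"
       .. "ScriptType:MenuScript\nAnd, maybe, some more text..."

local t = {}
-- Appending "\n" terminates the last line so "(.-)\r?\n" yields it as well.
for line in (s .. "\n"):gmatch("(.-)\r?\n") do
  -- Each match yields two captures: the text before and after the colon.
  for k, v in line:gmatch("([^:]+):([^:\n\r]+)") do
    t[k] = v
  end
end

-- Lines without a colon produce no match and are skipped.
print(t.ScriptID, t.Version, t.ScriptType)
```

With this input, t.ScriptID is "RL_SimpleTest" and t.Version is "0.0.1", while "My Script" never becomes a key.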

Piglet
  • This is incredible, thanks so much! The added problem for me (apart from not having enough pattern skills) was that, since I couldn't get `for k, v in s:gmatch(whatever) do` to work, I kind of assumed it wasn't a valid form for getting more than one returned value, so I didn't know how to continue my trial and error... I feel I can learn a lot from that little piece of code (e.g. I also didn't know one could use gmatch without () like in your 1st loop), and that's why I couldn't be more grateful. P.S. Oh, just for the record, there's a typo in the 2nd loop that'd make it parse "s" instead of "line". – Rai Feb 08 '23 at 14:07

I assume every line is of the format k:v, containing exactly one colon, or containing no colon (no k/v pair).

Then you can simply first match nonempty lines using [^\n]+ (assuming UNIX LF line endings), then match each line using ^([^:]+):([^:]+)$. Breakdown of the second pattern:

  • ^ and $ are anchors. They force the pattern to match the entire line.
  • ([^:]+) matches & captures one or more non-colon characters.

This leaves you with:

function RL_Test:StringToKeyValue(str)
    local t = {}
    for line in str:gmatch"[^\n]+" do
        local k, v = line:match"^([^:]+):([^:]+)$"
        if k then -- line is k:v pair?
            t[k] = v
        end
    end
    return t
end

If you want to support Windows CRLF line endings, use `for line in (str..'\n'):gmatch'(.-)\r?\n' do` (as in Piglet's answer, with `str` in place of `s`) for matching the lines instead.

This answer differs from Piglet's answer in that it uses match instead of gmatch for matching the k/v pairs, allowing exactly one k/v pair with exactly one colon per line, whereas Piglet's code may extract multiple k/v pairs per line.
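
The difference can be seen on a line that holds more than one colon. The following is a small sketch; the sample line "a:1:b:2" and the table name pairs_found are made up for illustration.

```lua
local line = "a:1:b:2"  -- hypothetical line holding more than one colon

-- Anchored match: the whole line must be exactly one key:value pair,
-- so a line with three colons does not match at all.
local k, v = line:match("^([^:]+):([^:]+)$")
print(k, v)  -- nil nil

-- Unanchored gmatch: every key:value occurrence in the line is yielded.
local pairs_found = {}
for key, value in line:gmatch("([^:]+):([^:\n\r]+)") do
  pairs_found[key] = value
end
print(pairs_found.a, pairs_found.b)  -- 1 2
```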

Luatic
  • And yet another incredible answer, I wonder how I'm going to decide which one to mark as the correct one... Anyway, the function works equally well here whether I use the UNIX LF or the Windows CRLF version. If I paste my string into my code editor it says it's "Windows CR LF", which I assume is compatible with both methods? BTW, if I want to make the separator customizable, it seems I'm forced to use `match` with parentheses, like this: `local k, v = line:match("^([^:]+)" .. sep .. "([^:]+)$")`; otherwise, for some reason, it's as if the pattern gets _truncated_ and everything stops working. Good point (MORE) – Rai Feb 08 '23 at 16:34
  • (CONT'D) about the possibility of extending it to allow more than one key:value pair per line. I may well end up needing that at some point, so it's good to know I could (I hope) easily add such functionality by taking your input into account. Well, thank you so much too! It's thanks to all your help that I've been able to continue working on it, since this part has turned out to be very important for the entire project. – Rai Feb 08 '23 at 16:37
  • @Ramon0 To parameterize in terms of the separator, you also have to replace the `:` in the character classes with your separator. Also note that you should escape the separator by adding a `%` before it within the pattern string. – Luatic Feb 08 '23 at 17:45
  • Yeah, sorry, I had not changed those yet because it was quick testing and they were going to be the same `:` anyway, but once changed everything works the same (with or without escaping the separator) as long as I use the parentheses after `match` like this: `local k, v = line:match("^([^" .. sep .."]+)" .. sep .. "([^" .. sep .. "]+)$")`. If I use this instead: `local k, v = line:match"^([^" .. sep .."]+)" .. sep .. "([^" .. sep .. "]+)$"` I get an error whether I try to escape it or not, though I may not be fully getting the escaping part... But I think this is minor, so no worries about it! – Rai Feb 08 '23 at 18:50
  • @Ramon0: Yeah, that's because `line:match"^([^" .. sep .."]+)" .. sep .. "([^" .. sep .. "]+)$"` parses as `(line:match"^([^" .. sep .."]+)") .. sep .. "([^" .. sep .. "]+)$"` (which will attempt to concatenate the first result of `match`, which may be `nil`, with `sep`) whereas you want it to parse as `line:match("^([^" .. sep .."]+)" .. sep .. "([^" .. sep .. "]+)$")`, for which you need the parentheses. – Luatic Feb 09 '23 at 08:41
  • Ahhh... I think now I get it, thanks! I'm still kind of amazed one can use these functions in a way I'd never seen before (or I may not have paid due attention to it), but your clarifications, plus pasting them into my editor and comparing with care, are proving to be the key. Well, so finally I'm using this version with some small additions for now and it's working excellently (although @Piglet's one also worked perfectly), so I think I'll mark this one as the answer, though, as I already said, I couldn't feel more thankful to both... Greetings, sirs! – Rai Feb 09 '23 at 13:34
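
The separator parameterization discussed in the comments above could be sketched roughly as follows. The names escape_pattern and string_to_key_value are hypothetical, and the sketch assumes a single-character separator, since the separator is also placed inside a character class.

```lua
-- Hypothetical helper: prefix every non-alphanumeric character with "%"
-- so that it is treated literally inside a Lua pattern.
local function escape_pattern(s)
  return (s:gsub("%W", "%%%0"))
end

-- Sketch of a parser with a configurable single-character separator.
local function string_to_key_value(str, sep)
  sep = escape_pattern(sep or ":")
  -- Building the pattern in a variable first sidesteps the precedence issue:
  -- line:match"a" .. b parses as (line:match"a") .. b.
  local pattern = "^([^" .. sep .. "]+)" .. sep .. "([^" .. sep .. "]+)$"
  local t = {}
  -- "(.-)\r?\n" accepts both LF and CRLF line endings.
  for line in (str .. "\n"):gmatch("(.-)\r?\n") do
    local k, v = line:match(pattern)
    if k then  -- only lines that are exactly one key/value pair
      t[k] = v
    end
  end
  return t
end

local t = string_to_key_value("ScriptID=RL_SimpleTest\r\nVersion=0.0.1\r\nNo pair here", "=")
print(t.ScriptID, t.Version)
```

Escaping matters for separators such as "." or "%", which have a special meaning in patterns; for ":" the escaped and unescaped forms behave the same, which matches what was observed in the comments.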