
I'm writing a MUSHclient plugin in Lua. MUSHclient bundles a PCRE module that lets me compile regular expressions with the rex.new function. I'm not sure whether I need it for what I'm trying to do; I suspect I might, although I would rather avoid it.

Basically, I'd like to split a string into a table using the separators "," or " and ". However, in certain cases these 'separators' appear inside an item that I want to keep un-split (e.g. Felix, the Cat). Here is what I have so far:

false_separators = {"Felix, the Cat", "orange and tan cat", "black and white cat"}
separators = rex.new(" ?(.+?)(?:,| and )")
local sample_text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."
matches = {}
-- call the function once per match; t holds the captured substrings
separators:gmatch(sample_text, function (m, t)
    for k, v in pairs(t) do
        print(v)
        table.insert(matches, v)
    end
end)

This will output:

a black
white cat
a tabby cat
a giant cat
Felix
the Cat
an orange

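There are two problems with this. First, the last item is dropped, because the pattern only matches text that is followed by a separator and the final item is followed by a period. Second, I have not figured out how to make use of my false_separators table. My desired output is: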

a black and white cat
a tabby cat
a giant cat
Felix, the Cat
an orange and tan cat
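One possible way to get this output without rex is a single left-to-right scan that steps over any protected phrase before looking for a separator. This is only a sketch in plain Lua 5.1: `split_protected` is a hypothetical helper name, and it assumes the separators are exactly ", " and " and " and that a trailing period should be dropped.

```lua
-- Hypothetical helper (not part of MUSHclient): split `text` at ", " or " and ",
-- but never inside one of the phrases listed in `protected`.
function split_protected(text, protected)
    local delimiters = { ", ", " and " }
    local items = {}
    local start, pos = 1, 1

    -- Returns the length of the first candidate that begins at `pos`, or nil.
    local function match_at(candidates)
        for _, c in ipairs(candidates) do
            if text:sub(pos, pos + #c - 1) == c then
                return #c
            end
        end
    end

    while pos <= #text do
        local skip = match_at(protected)
        if skip then
            pos = pos + skip                 -- step over the phrase untouched
        else
            local dlen = match_at(delimiters)
            if dlen then
                items[#items + 1] = text:sub(start, pos - 1)
                pos = pos + dlen
                start = pos
            else
                pos = pos + 1
            end
        end
    end

    -- Whatever follows the last delimiter is the final item; drop a trailing period.
    local last = text:sub(start):gsub("%.$", "")
    if last ~= "" then
        items[#items + 1] = last
    end
    return items
end

local phrases = { "Felix, the Cat", "orange and tan cat", "black and white cat" }
local text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."
for _, item in ipairs(split_protected(text, phrases)) do
    print(item)
end
-- a black and white cat
-- a tabby cat
-- a giant cat
-- Felix, the Cat
-- an orange and tan cat
```

Because the phrases are compared with plain string equality, punctuation inside them needs no escaping, and the final item is kept since the scan does not require a trailing separator.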

I could do it with a lot of gsub calls, but that seems inelegant and possibly exploitable or slow:

false_separators = {"Felix, the Cat", "orange and tan cat", "black and white cat"}
local sample_text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."

function split_cats(text, false_sep)
    for _, phrase in ipairs(false_sep) do
        -- mask the spaces in each protected phrase with underscores;
        -- the extra parentheses drop gsub's second return value (the match count)
        text = text:gsub(phrase, (phrase:gsub(" ", "_")))
    end
    -- turn " and " into ", ", then turn every ", " into ";".
    -- Masked phrases contain no spaces, so ";" is now the only true delimiter
    text = text:gsub(" and ", ", "):gsub(", ", ";")
    local m = utils.split(text, ";") or {} -- split at semicolons
    for i, v in ipairs(m) do
        m[i] = v:gsub("_", " ") -- restore the spaces
    end
    return m
end

table.foreach(split_cats(sample_text, false_separators), print)

Output:

1 a black and white cat
2 a tabby cat
3 a giant cat
4 Felix, the Cat
5 an orange and tan cat.
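One concrete risk in the gsub approach is that each phrase is passed to gsub as a Lua pattern, so a phrase containing magic characters such as "-", "." or "(" may not match itself literally. A minimal sketch of one way around that, assuming plain Lua 5.1 string functions; `escape_pattern` and `protect_phrases` are hypothetical helper names:

```lua
-- Hypothetical helper: prefix every punctuation character with "%"
-- so the phrase is matched literally when used as a Lua pattern.
function escape_pattern(s)
    return (s:gsub("%p", "%%%0"))
end

-- Hypothetical helper: replace the spaces inside each protected phrase with "_".
function protect_phrases(text, phrases)
    for _, phrase in ipairs(phrases) do
        local masked = phrase:gsub(" ", "_")
        -- A function replacement is inserted verbatim, so a "%" inside
        -- the phrase cannot be misread as a capture reference.
        text = text:gsub(escape_pattern(phrase), function() return masked end)
    end
    return text
end

-- Unescaped, "-" acts as a lazy quantifier, so the phrase does not match itself:
print(("tom-cat"):find("tom-cat"))                  -- nil
print(("tom-cat"):find(escape_pattern("tom-cat")))  -- 1  7
```

The underscore placeholder is still an assumption about the input: if an item can contain a real underscore, a different marker character would be needed.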
Eli Bell
  • Why don't you use a different delimiter to separate the string? eg. `.` or `;` would make this task trivial. – greatwolf Feb 08 '15 at 10:08
  • Like this? (see edit) – Eli Bell Feb 08 '15 at 16:15
  • No, I mean changing the actual input to something like this: `"a black and white cat; a tabby cat; a giant cat; Felix, the Cat; an orange and tan cat."` – greatwolf Feb 09 '15 at 00:59
  • Well, the input is a substring coming from a mud, so I have no control over that. I could split the string at each comma, then take the last item and split that at the first ' and '. The grammar the mud uses would make this work for any cases like "black and white cat". Unfortunately, it doesn't work for "Felix, the Cat" though. @greatwolf – Eli Bell Feb 09 '15 at 03:54
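
The approach described in the last comment (split at every comma, then split the final piece at its first " and ") can be sketched as follows. This assumes plain Lua 5.1 and a separator of ", " with the space included; `split_plain` and `split_list` are hypothetical helper names.

```lua
-- Hypothetical helper: split `text` at every literal occurrence of `sep`.
function split_plain(text, sep)
    local parts, start = {}, 1
    while true do
        local s, e = text:find(sep, start, true)   -- true = plain find, no patterns
        if not s then break end
        parts[#parts + 1] = text:sub(start, s - 1)
        start = e + 1
    end
    parts[#parts + 1] = text:sub(start)
    return parts
end

-- Split at ", ", then split only the last piece at its first " and ".
function split_list(text)
    local items = split_plain(text, ", ")
    local last = table.remove(items)
    local s, e = last:find(" and ", 1, true)
    if s then
        items[#items + 1] = last:sub(1, s - 1)
        items[#items + 1] = last:sub(e + 1)
    else
        items[#items + 1] = last
    end
    return items
end

-- Three items: "a black and white cat", "a tabby cat", "an orange and tan cat"
local ok = split_list("a black and white cat, a tabby cat and an orange and tan cat")

-- Six items: "Felix" and "the Cat" end up as separate entries
local broken = split_list("a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat.")
```

As the comment notes, the comma inside "Felix, the Cat" is indistinguishable from a separator here, so names like that would still need a list of protected phrases.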

0 Answers