I'm working on a Markov chain text generator in Ruby that takes a body ("corpus") of text and generates new text based on it. The problem I need to solve is writing a Regexp that returns an array of strings, each containing the number of words I specify. In other words, I want to grab a user-specified number of words at a time, repeatedly, across the whole string.
Going off another application I've seen, I'm using something like /(([.,?"();\-!':—^\w]+ ){#{depth}})/ where #{depth} interpolates how many words I want at a time. With a depth of 2, this is supposed to grab two words at a time while allowing a subset of special characters, and that's the piece that's getting me. So the full question is this: how can I dynamically specify the number of words (separated by whitespace) I want while also allowing a range of special characters within those words?
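To make the interpolation concrete, here is a minimal sketch of how the pattern might be built from a depth value. The helper name build_match_regex and the shortened input string are hypothetical, used only for illustration; the pattern itself is the one quoted above.

```ruby
# Hypothetical helper (not part of the real application): returns the
# pattern from the question with the repetition count interpolated.
def build_match_regex(depth)
  /(([.,?"();\-!':—^\w]+ ){#{depth}})/
end

# Shortened sample input, assumed for illustration.
input = "Within weeks they planned a meeting. She sent him poetry along with her itinerary,"

regex = build_match_regex(3)

# The interpolated count ends up in the pattern source as {3}.
puts regex.source

# String#[] with a Regexp returns the text of the first full match.
first_chunk = input[regex]
puts first_chunk.inspect
```

This only shows that the depth reaches the pattern as intended; it does not change how the matches are returned when the pattern is used with scan.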
Here's what I have currently:
# Regex
@match_regex = /(([.,?"();\-!':—^\w]+ ){2})/
s = input.scan(@match_regex).to_a
puts s.inspect
# Input
Within weeks they planned a meeting. She sent him poetry along with her itinerary,
having worked in a business meeting to excuse the opportunity. He prepared flowers
and a banner of welcome on his hearth.
# Output - seems to be grabbing last word again for some reason
[["Within weeks ", "weeks "], ["they planned ", "planned "], ["a meeting. ", "meeting. "],
["She sent ", "sent "], ["him poetry ", "poetry "], ["along with ", "with "],
["her itinerary, ", "itinerary, "], ["having worked ", "worked "], ["in a ", "a "],
["business meeting ", "meeting "], ["to excuse ", "excuse "],
["the opportunity. ", "opportunity. "], ["He prepared ", "prepared "], ["flowers and ", "and "],
["a banner ", "banner "], ["of welcome ", "welcome "], ["on his ", "his "]]
# Desired output. I'm not picky about trailing spaces, as I can always trim them
["Within weeks", "they planned", "a meeting.", "She sent", "him poetry", "along with",
"her itinerary,", "having worked", "in a", "business meeting", "to excuse", "the opportunity.",
"He prepared", "flowers and", "a banner", "of welcome", "on his"]
Any help would be greatly appreciated. Thanks!