
I'm doing this for fun (or as 4chan says "for teh lolz"), and if I learn something along the way, all the better. I took an AI course almost two years ago and really enjoyed it, but I've managed to forget everything, so this is a way to refresh it.

Anyway, I want to be able to generate text given a set of inputs. Basically, this will read forum posts (or maybe tweets on Twitter) and then generate a comment based on what it has learned.

Now, the simplest way would be to use a Markov chain text generator, but I want something a bit more complex than that, since a Markov chain basically learns only word order (which word is most likely to appear after word x, given the input text). I'm trying to see if there's something I can do to make it a little smarter.
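For reference, the word-order-only Markov chain described above fits in a few lines of Python. This is a minimal bigram version (one word of context). `build_chain` and `generate` are illustrative names and do not come from any library.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)  # duplicates encode how often a pair occurs
    return chain


def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from `start`, picking each next word in proportion
    to how often it followed the current word in the training text."""
    out = [start]
    for _ in range(max_words - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: this word never had a successor
            break
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat on the rug"
    chain = build_chain(corpus)
    print(generate(chain, "the", rng=random.Random(0)))
```

The only thing the model learns is "which word tends to follow which". That is exactly the limitation the question is trying to get past.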

For example, I want it to do something like this:

  • Learn from a large selection of posts on a message board, but don't weight them too heavily
  • For each post:
    • Learn from the other comments on that post and weight these inputs more heavily
    • Generate a comment and post it
    • See how other users reacted to your post. If the reaction was good, weight it positively so you make more posts similar to it; if it was negative, do the opposite.
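One possible way to express the weighting steps in the list above is to give each learned transition a real-valued weight instead of a plain count. The sketch below is a hypothetical extension of a bigram Markov chain, not an established algorithm. The `weight` passed to `learn`, the `reward` passed to `feedback`, and the `floor` value are all assumptions. Turning real user reactions (votes, replies) into a reward number is left open.

```python
import random
from collections import defaultdict


class WeightedChain:
    """Bigram Markov chain whose transitions carry real-valued weights
    (a hypothetical design for illustration)."""

    def __init__(self):
        # weights[w1][w2] = accumulated weight of seeing w2 right after w1
        self.weights = defaultdict(lambda: defaultdict(float))

    def learn(self, text, weight=1.0):
        """Add every bigram in `text`, scaled by `weight` (should be > 0)."""
        words = text.split()
        for a, b in zip(words, words[1:]):
            self.weights[a][b] += weight

    def feedback(self, sentence, reward, floor=0.01):
        """Nudge the transitions used in `sentence` up (reward > 0) or
        down (reward < 0), never letting a weight drop below `floor`."""
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            if b in self.weights.get(a, {}):
                self.weights[a][b] = max(floor, self.weights[a][b] + reward)

    def generate(self, start, max_words=20, rng=random):
        """Sample each next word in proportion to its current weight."""
        out = [start]
        for _ in range(max_words - 1):
            nxt = self.weights.get(out[-1])
            if not nxt:  # dead end
                break
            words, w = zip(*nxt.items())
            out.append(rng.choices(words, weights=w)[0])
        return " ".join(out)


if __name__ == "__main__":
    chain = WeightedChain()
    # Whole message board: low weight. Current thread: higher weight.
    chain.learn("i think the new keyboard is great", weight=0.2)
    chain.learn("the new keyboard is too loud", weight=1.0)
    post = chain.generate("the", rng=random.Random(0))
    print(post)
    chain.feedback(post, reward=0.5)  # e.g. the post was upvoted
```

The floor keeps a heavily penalized transition from disappearing completely, so the chain can still recover if a later post using it is well received.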

It's the weighting and learning-from-mistakes part that I'm not sure how to implement. I thought about artificial neural networks (mainly because I remember enjoying that chapter), but as far as I can tell they're mainly used to classify things (i.e., given a finite set of choices [x1...xn], which x is this input?), not really to generate anything.

I'm not even sure whether this is possible, or, if it is, what I should go about learning or figuring out. What algorithm is best suited for this?

To those worried that I will use this as a bot to spam or provide bad answers on SO: I promise that I will not use it to give (bad) advice or to spam for profit. I definitely will not post its nonsensical thoughts on SO. I plan to use it for my own amusement.

Thanks!

encee
  • As a blatant self-plug, I did make a Markov-chain-based "spam" generator on StackApps. It certainly isn't "smart" in any sense. I'm still waiting to see if anyone is using it to post questions/answers here. :) [Flack Overstow](http://stackapps.com/questions/306/flack-overstow-generate-spam-from-trilogy-posts) – Mark Rushakoff May 28 '10 at 02:47
  • These guys: http://pdos.csail.mit.edu/scigen/ have a great text generator for computer science articles, and their code is publicly available. – Amichai May 28 '10 at 03:26
  • That scigen thing looks like a good start. That said, Markov chains seem to be remarkably funny: "If I'm getting a new keyboard - Why it's still not cool to admit you're a used TV from Liberty City?" – encee May 28 '10 at 04:20
  • I'm going to go out on a limb here and say the last thing the internet needs is more noise on its forums. I'm not too concerned about you using it to post to SO, as the user would get downvoted or banned almost immediately. – Cerin May 28 '10 at 12:38
  • @chris: I'm using this more as a self-educational tool, as I really like AI topics but never had much of a chance to use them outside of one semester's class. I'll only ever post it on the forums I visit a lot (where everyone knows me), and even there simply as a "hey, check this out" kind of thing. – encee May 28 '10 at 17:36
  • How'd this project work out for you? – Michael Paulukonis Feb 22 '12 at 16:29
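SCIgen, linked in the comments above, is described by its authors as generating papers from a hand-written context-free grammar. Here is a toy version of that idea in Python, using a made-up grammar that is not SCIgen's. It produces grammatical but meaningless sentences and does no learning at all. Its use here is only to contrast it with the Markov-chain approach.

```python
import random

# A toy, hypothetical grammar: each non-terminal maps to a list of
# alternative expansions; anything not in the dict is a literal word.
GRAMMAR = {
    "SENTENCE": [["NP", "VP"]],
    "NP": [["the", "NOUN"], ["a", "ADJ", "NOUN"]],
    "VP": [["VERB", "NP"], ["VERB"]],
    "NOUN": [["keyboard"], ["compiler"], ["forum"]],
    "ADJ": [["new"], ["used"]],
    "VERB": [["breaks"], ["admires"], ["crashes"]],
}


def expand(symbol, grammar=GRAMMAR, rng=random):
    """Recursively replace non-terminals with a randomly chosen expansion."""
    if symbol not in grammar:
        return [symbol]  # terminal: emit the word itself
    words = []
    for part in rng.choice(grammar[symbol]):
        words.extend(expand(part, grammar, rng))
    return words


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(" ".join(expand("SENTENCE", rng=rng)))
```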

1 Answer


I was thinking about something like this, too. I think using a grammatical analyzer together with a Markov chain generator could be a significant improvement. The Markov chain could then be trained on phrases (e.g., the verb "drive" often appears with the object "car") and produce grammatically correct sentences.
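This suggestion might look roughly like the following sketch. It assumes a parser has already turned each post into (subject, verb, object) triples. The triples are hand-written stand-ins for real parser output, since the answer does not name a specific analyzer. `train`, `pick`, and `generate` are hypothetical names. The verb is chosen given the subject, and the object given the verb, so this is a Markov-style model over grammatical roles rather than over adjacent words. Here the fixed "subject verb the object" frame is what keeps the output grammatical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical parser output: (subject, verb, object) triples.
TRIPLES = [
    ("I", "drive", "car"),
    ("you", "drive", "car"),
    ("you", "drive", "truck"),
    ("I", "read", "book"),
    ("we", "read", "forum"),
]


def train(triples):
    """Count which verbs each subject uses and which objects each verb takes."""
    subject_verbs = defaultdict(Counter)
    verb_objects = defaultdict(Counter)
    for subj, verb, obj in triples:
        subject_verbs[subj][verb] += 1
        verb_objects[verb][obj] += 1
    return subject_verbs, verb_objects


def pick(counter, rng):
    """Sample a key in proportion to its count."""
    items = list(counter)
    return rng.choices(items, weights=[counter[k] for k in items])[0]


def generate(subject, subject_verbs, verb_objects, rng=random):
    """Emit 'subject verb the object', keeping pairs like drive/car together."""
    if subject not in subject_verbs:
        raise KeyError(f"unseen subject: {subject}")
    verb = pick(subject_verbs[subject], rng)
    obj = pick(verb_objects[verb], rng)
    return f"{subject} {verb} the {obj}"


if __name__ == "__main__":
    sv, vo = train(TRIPLES)
    print(generate("you", sv, vo, rng=random.Random(0)))
```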

ziggystar
  • This is a good idea, and it will hopefully produce more grammatically correct sentences, which have a better chance of working. But I was looking to train the algorithm so that, based on the training data, it is more likely to produce sentences that make sense. One idea was that the Markov chain produces a sentence, I decide whether it's positive or negative, and based on that it re-weights the training data. The issue is that it will then tend to produce the exact same sentences most of the time. I don't want the exact same sentences, just the same structure or meaning. – encee May 28 '10 at 17:38
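One way to reward structure rather than exact wording is to attach the feedback weight to a sentence template (a sequence of part-of-speech tags) and fill the slots randomly each time. This is an assumption sketched for illustration, not something proposed in the thread. It replaces the Markov chain with template filling. `LEXICON`, `TEMPLATES`, and `TemplateGenerator` are hypothetical. A real version would learn the tags and templates from the training posts with a part-of-speech tagger.

```python
import random

# Hypothetical tiny lexicon: words grouped by part-of-speech tag.
LEXICON = {
    "DET": ["the", "a"],
    "NOUN": ["keyboard", "forum", "compiler", "bot"],
    "VERB": ["breaks", "likes", "ignores"],
    "ADJ": ["new", "used", "noisy"],
}

# Hypothetical sentence structures, as tag sequences.
TEMPLATES = [
    ("DET", "NOUN", "VERB", "DET", "NOUN"),
    ("DET", "ADJ", "NOUN", "VERB"),
    ("DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"),
]


class TemplateGenerator:
    def __init__(self, templates, lexicon, floor=0.05):
        self.lexicon = lexicon
        self.floor = floor
        self.weights = {t: 1.0 for t in templates}  # no initial preference

    def generate(self, rng=random):
        """Pick a structure by weight, then fill each slot with a random
        word, so a rewarded structure still yields varied sentences."""
        templates = list(self.weights)
        template = rng.choices(
            templates, weights=[self.weights[t] for t in templates])[0]
        words = [rng.choice(self.lexicon[tag]) for tag in template]
        return template, " ".join(words)

    def feedback(self, template, reward):
        """Adjust the weight of the structure, not of the exact wording."""
        self.weights[template] = max(self.floor, self.weights[template] + reward)


if __name__ == "__main__":
    gen = TemplateGenerator(TEMPLATES, LEXICON)
    rng = random.Random(0)
    template, text = gen.generate(rng)
    print(text)
    gen.feedback(template, reward=1.0)  # e.g. readers liked it
```

Because the words are re-drawn on every call, a highly rewarded template keeps producing different sentences. The `floor` keeps penalized templates from disappearing entirely.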