I host a few Source game servers and run a plugin that dumps player chat to a MySQL database. I have quite a bit of chat history and was looking for something interesting to do with it. I'd like to build a system that allows members of my community to determine what is and isn't 'acceptable'.
My thought is that it'd work something like this: Somehow, I allow my community members to view chat logs (without identifying who said what) and they mark the logs as 'acceptable' or 'unacceptable'. I'd have to figure out if it will just show a block of text from a time frame, or just a specific user in a certain time frame, or just individual lines (could be good...could also mean the user completely missed the context of the chat).
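To make the "block of text from a time frame" option concrete, here is a rough Python sketch of how lines could be bucketed into fixed windows before being shown to a grader. It assumes each chat row has a Unix timestamp in seconds, which I haven't confirmed the plugin stores; `group_into_windows` and `window_seconds` are names made up for illustration.

```python
from collections import defaultdict


def group_into_windows(lines, window_seconds=300):
    """Group (unix_timestamp, text) pairs into fixed-size time windows.

    Assumes timestamps are integer seconds; the column itself is hypothetical.
    Player names are left out on purpose so graders can't see who said what.
    """
    windows = defaultdict(list)
    for timestamp, text in lines:
        # Integer division puts every line from the same 5-minute span
        # into the same bucket.
        bucket = timestamp // window_seconds
        windows[bucket].append(text)
    # Return the non-empty windows in chronological order.
    return [windows[bucket] for bucket in sorted(windows)]


sample = [(0, "gg"), (10, "nice shot"), (299, "rush B"), (300, "lol"), (900, "brb")]
blocks = group_into_windows(sample)
```

Here `blocks` holds three lists: the first three lines together, then one line each for the two later windows. Filtering `lines` down to a single player before calling the function would give the "one user in a time frame" variant.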
This would work somewhat like the captcha system, where multiple users would end up grading the same series of chat logs. From there, I'd get values for groups of words. The theory is that it'd create a threshold where certain things are acceptable and others are not. After a set amount of my existing logs have been graded, I'd have a meaningful way of determining if a message met the standards my community has defined.
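As a sketch of the multiple-graders idea, something like the following could turn individual votes into a label per message. The `min_votes` and `threshold` values are placeholders I picked for illustration, not numbers I've settled on, and `label_messages` is a hypothetical helper.

```python
from collections import defaultdict


def label_messages(votes, min_votes=3, threshold=0.7):
    """Turn (message_id, is_acceptable) votes into one label per message.

    A message stays 'undecided' until at least min_votes graders have seen it
    and at least `threshold` of them agree. Both cutoffs are assumptions.
    """
    tally = defaultdict(lambda: [0, 0])  # message_id -> [acceptable, total]
    for message_id, is_acceptable in votes:
        if is_acceptable:
            tally[message_id][0] += 1
        tally[message_id][1] += 1

    labels = {}
    for message_id, (acceptable, total) in tally.items():
        if total < min_votes:
            labels[message_id] = "undecided"
            continue
        share = acceptable / total
        if share >= threshold:
            labels[message_id] = "acceptable"
        elif share <= 1 - threshold:
            labels[message_id] = "unacceptable"
        else:
            labels[message_id] = "undecided"
    return labels


votes = [
    (1, True), (1, True), (1, True),
    (2, False), (2, False), (2, False), (2, True),
    (3, True),
    (4, True), (4, False), (4, True), (4, False),
]
labels = label_messages(votes)
```

With these sample votes, message 1 comes out acceptable, message 2 unacceptable, and messages 3 and 4 stay undecided (too few votes and an even split, respectively). Only the decided messages would feed into whatever word-level scoring comes later.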
My questions are these:
- What would you recommend I show the users who are grading the logs? Should I show them a set of X chat lines? Should I show all chat lines in 5-minute intervals? Should I narrow either of these windows down by only showing one user's messages within it? Or should the users grade each line individually? I am planning on placing a limit on how many lines/groups a specific community member can grade per day.
- What would be an appropriate way to design the database storing all of this data? Currently, each individual chat line is stored as its own row in MySQL. Each has a unique ID as well as the full text of the chat message sent in game. I've also got the player name and the server it was received from, but I don't see those as necessary.
- I'd like to create this in such a way that it becomes self-sufficient / adaptive to the community and what they consider acceptable. Over time, more lines would be graded and added to the thresholds/calculations to determine if a message is 'good'/'bad'. If anyone has built something like this, can you point out pitfalls I should avoid while building this?
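For the database question, this is the rough shape I'm picturing, sketched in Python with an in-memory SQLite database standing in for MySQL so it can be run anywhere. All table and column names (`chat_lines`, `grades`, `grader_id`, and so on) are made up for illustration, and the MySQL DDL would differ slightly.

```python
import sqlite3

# One row per chat line (what I already have), plus one row per grade.
SCHEMA = """
CREATE TABLE chat_lines (
    id INTEGER PRIMARY KEY,
    message TEXT NOT NULL
);
CREATE TABLE grades (
    id INTEGER PRIMARY KEY,
    chat_line_id INTEGER NOT NULL REFERENCES chat_lines(id),
    grader_id INTEGER NOT NULL,
    is_acceptable INTEGER NOT NULL CHECK (is_acceptable IN (0, 1)),
    graded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (chat_line_id, grader_id)  -- one grade per member per line
);
"""


def create_db():
    """Create the hypothetical schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def grades_today(conn, grader_id):
    """Count grades this member submitted today (UTC), for the daily limit."""
    row = conn.execute(
        "SELECT COUNT(*) FROM grades "
        "WHERE grader_id = ? AND date(graded_at) = date('now')",
        (grader_id,),
    ).fetchone()
    return row[0]


conn = create_db()
conn.execute("INSERT INTO chat_lines (id, message) VALUES (1, 'gg wp')")
conn.execute(
    "INSERT INTO grades (chat_line_id, grader_id, is_acceptable) VALUES (1, 42, 1)"
)
```

Keeping grades in their own table means the existing chat table stays untouched, and the unique constraint stops one member from grading the same line twice. The per-day cap would then be a check against `grades_today` before serving the next line.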