Profanity API
I have built a basic profanity API that echoes 1 if it finds profanity in a message and 0 if the message is okay. I've run into some silly problems, though.
For example, if the word hell is on my swear list, it also flags words like hello as profanity.
Each word is on its own line in a txt file, in this format:
badword
badword
badword
lolanotherbadword
naughtyword
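What I'd like instead is whole-word matching. Here is a minimal sketch of what I have in mind, assuming a regex with word boundaries is an acceptable approach. `containsBadWord` is a hypothetical helper (it is not in my code yet), and the inline array stands in for the contents of the txt file:

```php
<?php
// Hypothetical helper: true if any listed word appears as a whole word.
function containsBadWord(string $message, array $words): bool
{
    if ($words === []) {
        return false;
    }
    // Escape each word so it is treated literally inside the pattern.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);
    // \b anchors the match at word boundaries; the i flag ignores case.
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';
    return preg_match($pattern, $message) === 1;
}

// Stand-in for file('curse-list.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES).
$words = ['hell', 'naughtyword'];

var_dump(containsBadWord('hello there', $words)); // bool(false)
var_dump(containsBadWord('go to hell!', $words)); // bool(true)
```

With this, hello is no longer flagged, but the trade-off is that a listed word embedded in a longer word is no longer caught either.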
LeetSpeak
1 4l50 w4n7 70 1mpl3m3n7 50m3 50r7 0f l337 func710n, 50 7h47 1 d0n'7 h4v3 70 l157 3v3ry p0551bl3 v4r14710n 0f 7h3 w0rd. (I also want to implement some sort of leet function, so that I don't have to list every possible variation of the word.)
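Something along these lines is what I'm picturing: a rough sketch, not a finished solution. `normalizeLeet` is a hypothetical helper, and the digit-to-letter map only covers the substitutions used in the sentence above (0, 1, 3, 4, 5, 7); a real list would presumably need more:

```php
<?php
// Hypothetical helper: lower-case the message and undo common leet digits.
function normalizeLeet(string $message): string
{
    // Substitutions taken from the leet sentence above; extend as needed.
    $digits  = ['0', '1', '3', '4', '5', '7'];
    $letters = ['o', 'i', 'e', 'a', 's', 't'];
    return str_replace($digits, $letters, strtolower($message));
}

echo normalizeLeet('1 4l50 w4n7 70 1mpl3m3n7 50m3 50r7 0f l337 func710n'), "\n";
// i also want to implement some sort of leet function
```

The normalized copy would only be used for matching against the word list, since it also rewrites genuine numbers in a message.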
Bypassing the Chat Filter
Whether you access the API from
api.domain.tld/chat/profanity.php?access_token=whatever&filter_string=whatever
or
api.domain.tld/chat/profanity/access_token/filter_string
the same problem occurs. If someone puts an & or ? at the start of their message, it bypasses the filter and the API echoes 0. Checking the logs, I've noticed that messages beginning with & or ? are logged as blank, so I'm guessing the character breaks the URL and filter_string ends up empty.
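I tried to reproduce the & case outside the API. This is only a sketch: `filterStringFrom` is a hypothetical helper that uses parse_str to mimic how PHP fills $_GET, and it assumes the caller is not percent-encoding the message:

```php
<?php
// Hypothetical helper: parse a raw query string the way PHP fills $_GET
// and return the filter_string value ('' if it is missing).
function filterStringFrom(string $query): string
{
    parse_str($query, $params);
    return $params['filter_string'] ?? '';
}

$message = '&hell';

// Unencoded: the leading & ends the filter_string parameter immediately.
var_dump(filterStringFrom('access_token=whatever&filter_string=' . $message));
// string(0) ""

// Percent-encoded by the caller: the message arrives intact.
var_dump(filterStringFrom('access_token=whatever&filter_string=' . rawurlencode($message)));
// string(5) "&hell"
```

I suspect something similar happens with ? in the rewritten URL, where it would start the query string early, but I haven't confirmed that.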
Spacing
People think they are clever by typing h e l l, with one or more spaces between the letters, and so on. A smarter chat filter would likely be able to catch this sort of thing.
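One idea I've toyed with is collapsing runs of single characters before checking the message. A minimal sketch, assuming only whitespace is used as the separator; `collapseSpacedLetters` is a hypothetical helper:

```php
<?php
// Hypothetical helper: join runs of single characters separated by whitespace,
// e.g. "h e l l" becomes "hell". Multi-letter words are left alone.
function collapseSpacedLetters(string $message): string
{
    // Remove whitespace that sits between two one-character "words".
    // Fall back to the original message if the regex engine reports an error.
    return preg_replace('/(?<=\b\w)\s+(?=\w\b)/', '', $message) ?? $message;
}

echo collapseSpacedLetters('go to h e l l now'), "\n"; // go to hell now
```

It would also merge innocent single letters that happen to be adjacent (for example "a b" becomes "ab"), so it may produce false positives.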
Data Storage and Retrieval
I've also been wondering whether a txt file is really a sensible storage and retrieval mechanism. Right now I've only got 400 words, but the list will keep growing and I expect lookups to get slow. Which is better: an inline PHP array, a txt file, or a database?
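Whichever storage I end up with, I assume the list will sit in memory while a request is handled, so the lookup itself might matter as much as where the words live. Here is a sketch of a keyed lookup I'm considering; I have not benchmarked it. `hasListedWord` and the inline `$list` are hypothetical stand-ins:

```php
<?php
// Stand-in for the word list, however it is loaded (txt file, PHP array, or database).
$list = ['badword', 'naughtyword', 'hell'];

// Flip the list so each word is an array key; isset() on a key is a hash lookup.
$lookup = array_flip(array_map('strtolower', $list));

// Hypothetical helper: split the message into words and check each against the set.
function hasListedWord(string $message, array $lookup): bool
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($message), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        if (isset($lookup[$token])) {
            return true;
        }
    }
    return false;
}

var_dump(hasListedWord('What the HELL?', $lookup)); // bool(true)
var_dump(hasListedWord('hello world', $lookup));    // bool(false)
```

This only handles single-word entries; multi-word phrases on the list would need a different check.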
The Code
<?php
require('conn.php'); // expected to define $conn (the mysqli connection)

$date = gmdate('Y-m-d');
$time = gmdate('H:i:s'); // 24-hour clock so log times are unambiguous

// Fall back to null / '' so a missing parameter does not raise a notice.
$access_token  = $_GET['access_token'] ?? null;
$filter_string = $_GET['filter_string'] ?? '';

// Returns true if any listed word occurs anywhere in $string
// (case-insensitive substring match, so "hell" also matches "hello").
function wordsExist($string, $words)
{
    foreach ($words as $word) {
        if (stripos($string, $word) !== false) {
            return true;
        }
    }
    return false;
}

if (isset($access_token)) {
    // Token lookup plus usage counters; the token is concatenated in as-is.
    $sql  = "SELECT * FROM api WHERE access_token='" . $access_token . "'";
    $sql2 = "UPDATE api SET calls = calls + 1 WHERE access_token='" . $access_token . "'";
    $sql3 = "UPDATE api SET last_query = CURRENT_TIMESTAMP WHERE access_token='" . $access_token . "'";
    $sql4 = "UPDATE api SET profanity_api_calls = profanity_api_calls + 1 WHERE access_token='" . $access_token . "'";
    $sql5 = "UPDATE api SET last_profanity_query = CURRENT_TIMESTAMP WHERE access_token='" . $access_token . "'";
    $sql6 = "UPDATE api SET profanity_detected = profanity_detected + 1 WHERE access_token='" . $access_token . "'";
    $result  = mysqli_query($conn, $sql);
    $result2 = mysqli_query($conn, $sql2);
    $result3 = mysqli_query($conn, $sql3);
    $result4 = mysqli_query($conn, $sql4);
    $result5 = mysqli_query($conn, $sql5);
    // Only answer when the token matched a row in the api table.
    if (mysqli_num_rows($result) >= 1) {
        if (wordsExist($filter_string, file('curse-list.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES))) {
            $result6 = mysqli_query($conn, $sql6);
            file_put_contents('logs/profanity/' . $date . '-log.txt', "1 [$time] $filter_string\n", FILE_APPEND);
            echo '1';
        } else {
            file_put_contents('logs/profanity/' . $date . '-log.txt', "0 [$time] $filter_string\n", FILE_APPEND);
            echo '0';
        }
    }
}

// mysqli_close() needs the connection; mysqli_kill() also needs a thread id, so it is dropped.
mysqli_close($conn);
?>
My .htaccess
RewriteEngine On
RewriteRule ^profanity/(.*)/(.*)$ profanity.php?access_token=$1&filter_string=$2
RewriteRule ^advertising/(.*)/(.*)$ advertising.php?access_token=$1&filter_string=$2
Escaping User Input
As it stands, how secure is the code above? If it's vulnerable, could I have specific examples of how an attacker could abuse it?
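To make the question concrete, here is the kind of thing I'm worried about. The first helper just repeats the string concatenation from my code so the resulting SQL can be printed; the second is my guess at a safer shape using a mysqli prepared statement. Both `buildTokenQuery` and `tokenExists` are hypothetical names, and `tokenExists` is untested:

```php
<?php
// Same concatenation as in my code, wrapped in a hypothetical helper for illustration.
function buildTokenQuery(string $accessToken): string
{
    return "SELECT * FROM api WHERE access_token='" . $accessToken . "'";
}

// A crafted "token" that closes the quote and adds an always-true condition.
echo buildTokenQuery("' OR '1'='1"), "\n";
// SELECT * FROM api WHERE access_token='' OR '1'='1'

// Hypothetical alternative: a prepared statement keeps the token out of the SQL text.
function tokenExists(mysqli $conn, string $accessToken): bool
{
    $stmt = $conn->prepare('SELECT 1 FROM api WHERE access_token = ?');
    $stmt->bind_param('s', $accessToken);
    $stmt->execute();
    $stmt->store_result();
    return $stmt->num_rows > 0;
}
```

If I'm reading the first query right, that token would match every row in the api table, so anyone could use the API without a valid access token. Is that correct, and are there other holes like it?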