As this was about learning, and doing it in MYSQL only (and not about implementing a PHP better way to do it, and boy are there way better ways to do it), I present something to learn a few mysql techniques.
About the only remotely-interesting line to me is the set @sql1
line. Perhaps several large paragraphs could be written about it. For now, I just present it.
The Schema
-- drop table badGood;
create table badGood
( -- maps bad words to good
id int auto_increment primary key,
bad varchar(100) not null,
good varchar(100) not null,
dtAdded datetime not null
);
-- truncate table badGood;
insert badGood(bad,good,dtAdded) values ('god','gosh',now()),('rumpus','rear section',now());
-- drop table posts;
create table posts
( postId int auto_increment primary key,
orig varchar(1000) not null,
cleanified varchar(1000) not null,
dtAdded datetime not null, -- when it was inserted into system, ready for cleaning
dtCleaned datetime null, -- when it was cleaned
isViewable int not null -- or bool, whatever. 0=No (not ready yet), 1=Yes (clean)
);
-- truncate table posts;
-- drop table xxx;
create table xxx
( -- this table will contain one row for every word passed to stored proc,
-- ordered by word sequence left to right in sentence
-- order by meaning column "id" (auto_inc). Note, there is no guarantee, in fact expect it not to happen,
-- that for any given postId, that the id's will be consecutive, but they will be in order
--
-- Reason being, multiple concurrent access of posts coming index
--
-- Decided against making this a temp table inside stored proc, but it was considered
id int auto_increment primary key,
postId int not null, -- avoid FK for now due to speed
word varchar(50) not null, -- word as presented by poster guy
word2 varchar(50) null, -- a more rated-G version of the word that is substituted
isDirty int not null, -- or bool, whatever. 0=clean, 1=naughty
key(postId)
);
-- truncate table xxx;
The Stored Procedure
DROP PROCEDURE IF EXISTS cleanAndInsert;
delimiter $$
CREATE PROCEDURE cleanAndInsert
( suspectTxt varchar(255) # this text is suspect. Might contain non G-rated words
# other parameters too probably
)
BEGIN
declare insertedId int; -- this will house the PK value of the postId
insert posts(orig,cleanified,dtAdded,dtCleaned,isViewable) values (suspectTxt,'',now(),null,0); -- insert the passed string
set @insertedId:=LAST_INSERT_ID(); # now we have the PK id just inserted
-- the concat routine below is VERY FRAGILE to write, so as the sql string is slowly tweaked into perfection, with one working at that moment
-- I rem it out and create a new version under it, so the slightest error does not set me back 10 minutes (at least)
-- SET @sql1 = CONCAT("INSERT INTO xxx (word) VALUES ('",REPLACE((SELECT GROUP_CONCAT(orig) AS colx FROM posts where id=1), " ", "',null,0),('"),"');");
-- SET @sql1 = CONCAT("INSERT INTO xxx (postId,word) VALUES (",@insertedId,",'",REPLACE((SELECT GROUP_CONCAT(orig) AS colx posts where postId=@insertedId), " ", "',null,0),('"),"',null,0);");
SET @sql1 = CONCAT("INSERT INTO xxx (postId,word,word2,isDirty) VALUES (",@insertedId,",'",REPLACE((SELECT GROUP_CONCAT(orig) as colx FROM posts where postId=@insertedId), " ", "',null,0),(¿^?fish╔&®,'"),"',null,0);");
-- select @sql1; -- debugging purposes, rem'd out
-- Ideally @insertedId is inserted in the SET @sql1 line a few above, and NOT with the fish hard-coded bizareness, but it was too fragile
-- and time consuming. So this is an ugly hack and nothing to be proud of. So fixing it is a "TO DO"
set @sql2=replace(@sql1,'¿^?fish╔&®',@insertedId); -- This is the insert statement to run to blast out the words
-- select @sql2; -- debugging purposes, rem'd out.
PREPARE stmt FROM @sql2; -- you now have a prepared stmt string to execute (which inserts words into table xxx)
EXECUTE stmt;
-- now the clean word section
update xxx x
join badGood bg
on bg.bad=x.word
set x.isDirty=1,x.word2=bg.good
where postId=@insertedId;
-- I know, this is lame, but it allows us to use word2 simply as the final word and simplify our UPDATE posts after this block
update xxx
set word2=word
where postId=@insertedId and isDirty=0;
-- now the update section, to save the much cleaner string out to the posts table
update posts
set cleanified=
( select group_concat(word2 ORDER BY id SEPARATOR ' ') as xyz
from xxx where postId=@insertedId
), isViewable=1, dtCleaned=now()
where postId=@insertedId;
-- one could do a "delete from xxx where postId=@insertedId" if they wanted to. I kept it for debugging. Others delete the rows
select @insertedId as id; -- useful for calling routine, telling it the PK value
END
$$
Test (call the stored procedure)
in PHP, you would just call it with a normal query, starting with the $sql beginning with "call ..."
call cleanAndInsert('I type acceptable sentences'); -- returns 1 row, id is 1
call cleanAndInsert('Stan you can kiss my rumpus'); -- returns 1 row, id is 2
-- note this is very easy to trick, such as a naughty word not surrounded by whitespace, or broken out with spaces like "r u m p u s"
The Results
select * from posts order by postId desc;
+--------+-----------------------------+-----------------------------------+---------------------+---------------------+------------+
| postId | orig | cleanified | dtAdded | dtCleaned | isViewable |
+--------+-----------------------------+-----------------------------------+---------------------+---------------------+------------+
| 2 | Stan you can kiss my rumpus | Stan you can kiss my rear section | 2015-09-22 11:08:29 | 2015-09-22 11:08:29 | 1 |
| 1 | I type acceptable sentences | I type acceptable sentences | 2015-09-22 11:08:23 | 2015-09-22 11:08:23 | 1 |
+--------+-----------------------------+-----------------------------------+---------------------+---------------------+------------+
Wrapping up
It was for learning. Take it for what it is.