0

First off, please note that this is not answered in the question here: MySQL update table based on another tables value or indeed any other Stack question or answer that I could find!

I converted a forum on a website and now just need to repair the internal URL links on a few hundred of the posts.

Posts are found in the column post_content within the table xyz_posts

ID     post_content  

1467  This is great https://example.com/index.php?topic=1234 I really like it
1468  Hello world
1469  Take a look https://example.com/index.php?topic=5678.0

You can see the URLs are mostly buried in the post text. Note that the example above can sometimes be written with topic=1234.0 at the end, although it is actually stored in the database as a value of 1234. I don't want to rewrite the url and accidentally keep the .0

Here's an example of how I need that table to look:

ID     post_content  

1467  This is great https://example.com/finished-page/ I really like it
1468  Hello world
1469  Take a look https://example.com/another-page/

So, the table xyz_converter maps the old topic number to the new post ID like this:

meta_key             meta_value     value_id  

_bbp_old_topic_id    1234           15675
_bbp_old_reply_id    1234           17439

Caveat here, the number 1234 also exists in this table for forum replies which we don't want, but the meta_key and value_id are different for those as shown. This SQL query works to get to the right one:

SELECT * FROM `xyz_converter` WHERE meta_key LIKE '_bbp_old_topic_id' AND `meta_value` LIKE '1234'

Also in table xyz_posts we map the value_id mentioned above to the post's URL suffix like this:

ID        post_name

15675     finished-page 
15676     another-page 

How do I construct a SQL query that will detect the meta_value in one table and then replace it with the correct final URL as mentioned?

PROCESS SUMMARY

  1. Detect the URL in xyz_posts post_content
  2. Extract the topic number from the URL (eg 1234). If it's 1234.0 then take just 1234
  3. Convert it to a post number in found in xyz_converter value_id (eg 15675) ensuring it is found alongside _bbp_old_topic_id
  4. Take the suffix of the URL in xyz_posts post_name (eg 'finished-page')
  5. Rewrite the original URL to include the suffix (not forgetting the trailing slashes).

I'm using MySQL 5.7.29 and PHPMyAdmin. I'm fairly new to SQL queries and a Regex noob, but willing to learn more!

Sara44
  • 422
  • 1
  • 9
  • 25
  • 1
    mysql 5.7 has not many regexp fuctions, but you can make a backup to mysql 8 and make there your changes. import the table in5.7 aand then update the original table. for the rest 9o don't understand what you are looking for. – nbk Feb 15 '20 at 17:28
  • @Sara44 I anwered your question but It's not clear what you want.You want to update `xyz_converter` or `xyz_posts`.? – George Pant Feb 15 '20 at 17:45
  • @GeorgePant See my reply to your answer please. I restructured the question. – Sara44 Feb 17 '20 at 14:42
  • @Sara44 I restructured my answer too :p – George Pant Feb 17 '20 at 14:43
  • Some things are not practical in SQL. Do your task in PHP or whatever your app language is; it will be much easier there. – Rick James Feb 18 '20 at 19:39

1 Answers1

1

A solution that uses only MySQL commands and works in MySQL 5.7 could be like this.

If .0 is present only in the urls you could use a query like this to remove all occurences of .0 from post_content in xyz_posts.

UPDATE `xyz_posts` SET post_content=REPLACE(post_content,'.0 ',' ') WHERE post_content LIKE '%topic=%.0 %';

Then you can use

CREATE TABLE temp_tbl
SELECT CONCAT('index.php?topic=',c.meta_value) as `find_value` ,p.post_name as `replace_value`
FROM `xyz_converter` c
INNER JOIN `xyz_posts` p ON c.value_id=p.id AND `meta_key`='_bbp_old_topic_id'
ORDER BY meta_value DESC;

UPDATE `xyz_posts` p 
INNER JOIN  `temp_tbl` t ON p.post_content LIKE CONCAT('%',t.find_value,'%')
SET p.`post_content`=REPLACE(p.`post_content`,t.find_value,t.replace_value);

The first command will create a temporary table where the first column will be the value that you want to find and replace like index.php?topic=1234 and the second column will be the value you want to be replace with like finished-page

The second command will replace posts_content in xyz_posts taking the first column from temp_tbl and replacing it with the second column.

Below an sql fiddle where you can see the solution in action

http://sqlfiddle.com/#!9/7bce8c

Of course you should first create a copy of your database and try these commands to ensure everything works fine before trying it in your production database.

George Pant
  • 2,079
  • 1
  • 9
  • 14
  • Thanks for the reply @George Pant but I need to replace the URL string that is found in the column post_content within the table xyz_posts, not change any other table data. The post doesn't pull in the original URLs from a table, they are user generated internal links. – Sara44 Feb 16 '20 at 11:54
  • Edited to clarify things further. Unfortunately not all the data is in xyz_posts. We have to query xyz_converter to get the correct mapping before coming back to get the postname. – Sara44 Feb 16 '20 at 20:21
  • I managed to get rid of all the .0 references. I tried the first part of the temp table code (up to and including the first INNER JOIN line) just to see if it would create the table and it said "MySQL returned an empty result set (i.e. zero rows)". My syntax seems good, any ideas? I'll stick with this, thanks so much for the help so far. Sometimes need a few days to get back to you but I will! Incidentally, I need the trailing slash on the final URL. – Sara44 Feb 18 '20 at 20:25
  • @Sara44 `CREATE TEMPORARY TABLE` just creates a temporary table that exists only for that session. So every other query you do must be in the same session (in phpmyadmin you have to copy paste all the commands in the same window). Alternatively you can ommit the keyword TEMPORARY and just create a normal table which you can then DROP after you are done. – George Pant Feb 18 '20 at 21:12
  • I eventually managed to run this but with >8000 rows in post_content it took an hour to complete, crashing the server on the way. There were some malformed URLs too which I need to investigate. Is there a more efficient way where I can replace x amount of lines at a time? – Sara44 Feb 19 '20 at 20:22
  • @Sara44 Try adding "WHERE p.id>0 AND p.id<=100" to the end of the query That should change the first 100 lines only Then you can change the numbers.Also maybe the temporary table is the problem Have you tried using a normal table? – George Pant Feb 19 '20 at 23:13
  • I am using a normal table in which data looks fine. My problem is that I get https://example.com/index.php?topic=7949 changed to https://example.com/race/49 when the new table explicitly says the correct URL which is nothing like that. I don't know where this 'race' is coming from. Some URLs change correctly, some don't change at all. It may help to see that https://example.com/index.php?topic=7962 becomes https://example.com/race/62 . Does the =79 mean anything or need escaping better? – Sara44 Mar 17 '20 at 16:04
  • I read that 'race' condition might mean I need to lock the table somehow in the SQL query. For info there's only me using the table currently. – Sara44 Mar 17 '20 at 16:15
  • @Sara44 I updated the answer The problem is that example.com/index.php?topic=7962 and example.com/index.php?topic=79 both probably exist.And the query replaced index.php?topic=79 with 'race' (probably the title of the post with this id) before replacing index.php?topic=7962.The solution would be to first update the urls with the biggest id so just change the order in the table used for replacement by adding `ORDER BY meta_value DESC`. – George Pant Mar 17 '20 at 17:43
  • Ah, that makes sense and yes, the topic name was 'race' for number 79. I tried this query on just one post with multiple topics with 79 in them and it still happened though. I first ordered my table as you suggested with a single line SQL query (incidentally it's ORDER BY find_value DESC), then ran the update. Could it be that the code is running multiple threads too quickly or should it wait for the last one to finish before writing the next? – Sara44 Mar 17 '20 at 20:09
  • @Sara44 I used `ORDER by meta_value DESC` because in my queries the temporary's table find_value is created from meta_value.In any case you should first create an ordered table (mine is temp_tbl ) and run the update statement in that table In your ordered table the find_value index.php?topic=7949 must appear in order before index.php?topic=79. When your ordered table is created and ready you should then run the UPDATE statement in that table like `UPDATE xyz_posts p INNER JOIN temp_tbl ...` – George Pant Mar 17 '20 at 20:24
  • Yes, I am doing that and can see the temp_tbl is correctly sorted. It's just messing up on the posts with multiple lines where, for example, that 79 is overwriting 7949 first. It's like the ORDER BY doesn't matter during the second query where we are doing the INNER JOIN? – Sara44 Mar 17 '20 at 21:12
  • @Sara44 If the table is correctly sorted (big numbers first) then no wrong substitue should take place But unfortunately only the first found link will be replaced (all occurences).So you have to run the `UPDATE` query multiple consecutive times.The first time you run it only articles with one type of link will be corrected.Second time all articles with no more than two different type of links will be corrected etc.I dont know if it is possible to replace all links with a single query like this .I ll try to work it out when I have more time... – George Pant Mar 18 '20 at 01:05
  • Thanks. I have started to run this and it works but with thousands of posts it can't be a manual thing. If I scan 1000 posts at a time it takes 90 seconds and replaces one URL :) Looks like I have 7675 URL rewrites to do in total too! If I could find the posts where there are URL rewrites to do first then just run the queries against those it may help. – Sara44 Mar 19 '20 at 19:32
  • I have accepted your solution George. You really went above and beyond with the excellent help and code. It got me there and taught me a lot along the way. Thank you so much, top stuff! – Sara44 Mar 28 '20 at 10:42
  • @Sara44 Happy I could help you :) – George Pant Mar 28 '20 at 11:31