I recently migrated old data from a news site into a new platform built with Ruby on Rails. The migration was mostly seamless, except for one thing that I have not been able to figure out.

Inside the body of the articles there are 'a tags'. Their URLs do not have an http or https on them; they start with www.

<a href="www.exampleurl.com">www.exampleurl.com</a>

I know that if I were to prepend 'http://' to these they would go to the correct place. As they are now, however, they generate this in the view:

localhost:3000/articles/www.exampleurl.com

It would not be practical to go back through 150,000 articles to check the body and see if they have an 'a tag' without the 'http://'. Is there a way to change this functionality in Rails so that when we display the body using

article.body.html_safe

it displays the body with the absolute url in the middle of it rather than the relative url? If one exists of course.

Aaron Wortham
  • Are you certain that Rails is converting those `href`s to `localhost:3000/articles/www.exampleurl.com`? By "certain" I mean that you've looked at the generated HTML text and verified that you really end up with `` – mu is too short Oct 03 '16 at 20:16
  • @muistooshort I don't think rails is rendering that. That must be the browser resolving the relative url. – Alexandre Angelim Oct 03 '16 at 20:19
  • I don't think you'll find something out of the box to deal with this. Consider sanitizing your posts in the background. You can parse that content with Nokogiri and fix it up when needed. I'm assuming new content won't have this problem anymore. – Alexandre Angelim Oct 03 '16 at 20:22
  • I inspected the html and the generated html is an 'a tag' pointed at the correct destination. However, when you hover over it, the url at the bottom is a relative url. So you are probably right: it is the browser, rather than Rails, that resolves it as a relative url. – Aaron Wortham Oct 03 '16 at 20:32
  • @AlexandreAngelim Sometimes I know the answer to a question when I ask it, debugging is a necessary skill ;) – mu is too short Oct 03 '16 at 21:01
  • Now that you know where the problem is, I'd recommend whipping up a data cleaning tool using Nokogiri and Addressable to clean up your existing 150k documents and fix up any new data before it gets to the database. – mu is too short Oct 03 '16 at 21:05
  • @muistooshort I should have picked that up. ;) - The good thing is that Aaron still went ahead and checked. – Alexandre Angelim Oct 03 '16 at 21:07
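
To make the clean-up suggested in the comments concrete, here is a minimal sketch of the idea. It is not the Nokogiri/Addressable tool the commenters describe: it uses only Ruby's standard library and a regular expression, which is an assumption made to keep the example self-contained, and a real HTML parser would be more robust against unusual markup. The helper name `absolutize_www_links` and the default `http` scheme are hypothetical choices for illustration.

```ruby
# Matches an href attribute, capturing the "href=" prefix, the quote
# character, and the attribute value. Only quoted values are handled.
HREF_PATTERN = /(\bhref\s*=\s*)(["'])(.*?)\2/i

# Hypothetical helper: prepends a scheme to href values that start with
# "www." and leaves every other href untouched.
def absolutize_www_links(html, scheme: "http")
  html.gsub(HREF_PATTERN) do
    prefix = Regexp.last_match(1)
    quote  = Regexp.last_match(2)
    url    = Regexp.last_match(3)
    url    = "#{scheme}://#{url}" if url.match?(/\Awww\./i)
    "#{prefix}#{quote}#{url}#{quote}"
  end
end

body = '<a href="www.exampleurl.com">www.exampleurl.com</a>'
puts absolutize_www_links(body)
# prints: <a href="http://www.exampleurl.com">www.exampleurl.com</a>
```

The same helper could in principle be applied once to each stored body (fixing the data, as the comments recommend) or wrapped around `article.body` before calling `html_safe` in the view. Fixing the stored data once is likely the cheaper option for 150,000 articles, since it avoids rewriting the body on every render.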

0 Answers