0

I have a MySQL MyISAM table that stores a URL in a url field. The URL is always just the scheme, subdomain, and host ( eg. http://www.site.com or https://site2.edu ).

I need to store the Domain, the Subdomain, and the Scheme in their own fields so I can index them and search them. It's too slow for my system to do a LIKE query on millions of rows.

  • Domain: site.com, site2.com
  • Subdomain: www, ''
  • Scheme: http, https

How do I create a MySQL Trigger to do this? It will need to extract the Domain, Subdomain, and Scheme from the URL field every time it is updated and then store them in their respective fields ( eg. domain, subdomain, scheme ) These fields will only be written to by the MySQL trigger.

I realize parsing URLs is often non-trivial, so I'm mostly concerned with setting up a trigger to do something like this. I can make adjustments and tradeoffs on the quality of URL parsing in my application.

T. Brian Jones
  • 13,002
  • 25
  • 78
  • 117
  • First of all, "*domain*" is not well-defined. There's no easy way to determine where the "domain" ends and the subdomain/host begins (not least because they are semantic distinctions made by the operator of the domain in question and are otherwise indistinguishable to the rest of the world)... the most common approach is to use Mozilla's [Public Suffix List](https://wiki.mozilla.org/Public_Suffix_List) to determine the level at which the public domain registration has been made, then assume anything beneath that is private host/subdomain. But it's not straightforward. – eggyal Jul 15 '13 at 19:15
  • If you are using Amazon's RDS, know that setting up triggers is fairly complicated ( http://stackoverflow.com/questions/8919907/can-i-create-trigger-in-an-rds-db ). It may be easier to put this behavior in your code. – T. Brian Jones Jul 16 '13 at 21:31

1 Answers1

1
DELIMITER //

DROP TRIGGER IF EXISTS url_trigger_before_insert //

CREATE TRIGGER url_trigger_before_insert
 BEFORE INSERT ON url_table
 FOR EACH ROW 
 BEGIN
     DECLARE no_proto CHAR;
     SET NEW.scheme = SUBSTRING_INDEX(NEW.url, ':', 1);

     SET @no_proto = REPLACE(NEW.url,CONCAT(NEW.scheme,'://'),'');

     SET @no_proto = SUBSTRING_INDEX(@no_proto, '/', 1); 

     SET NEW.subdomain = SUBSTRING_INDEX(@no_proto, '.', 1);
     SET NEW.host = SUBSTRING_INDEX(@no_proto, '.', -2);

 END //

DROP TRIGGER IF EXISTS url_trigger_before_update //

CREATE TRIGGER url_trigger_before_update
 BEFORE UPDATE ON url_table
 FOR EACH ROW 
 BEGIN
     DECLARE no_proto CHAR;
     SET NEW.scheme = SUBSTRING_INDEX(NEW.url, ':', 1);

     SET @no_proto = REPLACE(NEW.url,CONCAT(NEW.scheme,'://'),'');

     SET @no_proto = SUBSTRING_INDEX(@no_proto, '/', 1); 

     SET NEW.subdomain = SUBSTRING_INDEX(@no_proto, '.', 1);
     SET NEW.host = SUBSTRING_INDEX(@no_proto, '.', -2);

 END //


DELIMITER ;
Drew
  • 6,311
  • 4
  • 44
  • 44