
I have a text file (links.txt) with thousands of links in it.

I want to sort all the links using the following code

<?php
    function get_domain($url)
    {
        $pieces = parse_url($url);
        $domain = isset($pieces['host']) ? $pieces['host'] : $pieces['path'];
        if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
            return $regs['domain'];
        }
        return false;
    }

    print get_domain("http://mail.somedomain.co.uk"); // outputs 'somedomain.co.uk'
?>

How do I read the file, run each link through this function, sort the results, and save them back to the file?

Update

In my file (domains.txt) there are about 10,000 links, and I want to filter them down to bare domains with the above code.

for example:

Before:

http://www.example.com/about
www.example.net/index.php
http://subdomain.example.org/
http://www.example.co/page-1
http://www.example.co.uk

After:

example.com
example.net
example.org
example.co
example.co.uk

1 Answer

In theory it's as simple as:

// FILE_IGNORE_NEW_LINES strips the trailing newline from each line
$file = file('domains.txt', FILE_IGNORE_NEW_LINES);
for ($x = 0; $x < count($file); $x++) {
    $file[$x] = get_domain($file[$x]);
}
sort($file);
// Join with newlines before writing; passing the array directly would
// write the domains back to back with no separator
file_put_contents('domains.txt', implode(PHP_EOL, $file));
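
The expected output in the question also suggests each domain should appear only once and lines that get_domain() rejects should be dropped. A small variation on the above (just a sketch, assuming the same get_domain() function and domains.txt file) handles that with array_filter() and array_unique():

$lines = file('domains.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// Reduce every line to its bare domain
$domains = array_map('get_domain', $lines);
// array_filter() drops entries where get_domain() returned false,
// array_unique() removes duplicate domains
$domains = array_unique(array_filter($domains));
sort($domains);
file_put_contents('domains.txt', implode(PHP_EOL, $domains));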

But, depending on the size of your domains file, this may be slow and/or use a lot of memory, and could even crash. You don't mention whether this is a one-off or something that will happen often, but if it is an issue then other solutions would include:

  • Saving into a database, as suggested by @Karlo Kokkak (one example on SO here)
  • Using the command line, if you have access. In that case you'd probably be better off skipping PHP altogether and using command-line tools

Note: if you do go for the PHP above, you may need to look into increasing PHP's time limit in that script.
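
If the file really is large, a line-by-line version keeps memory use down because it never holds the raw URLs or the duplicates, only the unique domains (again just a sketch, assuming the question's get_domain() function; the output name domains_sorted.txt is only an example):

// Remove PHP's execution time limit for this script
set_time_limit(0);

$in = fopen('domains.txt', 'r');
$seen = [];
// Read one line at a time instead of loading the whole file with file()
while (($line = fgets($in)) !== false) {
    $domain = get_domain(trim($line));
    if ($domain !== false) {
        $seen[$domain] = true; // array keys act as a cheap de-duplicator
    }
}
fclose($in);

$domains = array_keys($seen);
sort($domains);
file_put_contents('domains_sorted.txt', implode(PHP_EOL, $domains));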
