0

I have trouble with getting this script to accept e.g. https://youtu.be/HPP0yB-_blA, https://www.youtube.com/watch?v=HPP0yB-_blA works though. The first example just leads to invalid command name "".

# URL title parse script for Eggdrop.
#
# Based on https://github.com/teeli/urltitle by teel.
#
# Version log:
#   0.11    Minor site specific tweaks.
#   0.1     First version.
#
# Usage:
#   .chanset #channelname +urltitle  ;# Enable script.

namespace eval urltitle {
    # Configuration variables.
    set delay 1  ;# Minimum number of seconds to wait between uses.
    set length 5  ;# Minimum character length of URL to trigger usage.
    set timeout 5000  ;# Geturl timeout in milliseconds (1/1000ths of a second).

    # Internal variables.
    set ignoredSites {apina.biz}  ;# Sites to ignore when parsing URLs.
    set last 1  ;# Stores time of last usage.
    set scriptVersion 0.11  ;# Script version number.

    # Binds/Hooks.
    bind pubm - "*://*" urltitle::handler
    setudef flag urltitle  ;# Channel flag to enable script.

    # Required packages.
    package require http
    package require tdom
    package require tls

    proc socket {args} {
        set opts [lrange $args 0 end-2]
        set host [lindex $args end-1]
        set port [lindex $args end]

        ::tls::socket -autoservername true {*}$opts $host $port
    }

    proc handler {nick host user chan text} {
        set time [clock seconds]
        variable delay
        variable ignoredSites
        variable last
        variable length

        if {[channel get $chan urltitle] && ($time - $delay) > $last} {
            foreach word [split $text] {
                if {[string length $word] >= $length && [regexp {^(f|ht)tp(s|)://} $word] && \
                    ![regexp {://([^/:]*:([^/]*@|\d+(/|$))|.*/\.)} $word]} {
                    foreach site $ignoredSites {
                        if {![string match *$site* $word]} {
                            set last $time

                            # Enable HTTPS support.
                            ::http::register https 443 [list urltitle::socket]
                            set title [urltitle::parse $word]

                            # Disable HTTPS support.
                            ::http::unregister https

                            # Sends text to the server, like 'putserv', but it uses a different queue intended for sending messages to channels or people.
                            puthelp "PRIVMSG $chan :$title"

                            break
                        }
                    }
                }
            }
        }
        return 1
    }

    proc parse {url} {
        set title ""
        variable timeout

        if {[info exists url] && [string length $url]} {
            if {[catch {set http [::http::geturl $url -timeout $timeout]} results]} {
                putlog "Connection to $url failed"
            } else {
                if {[::http::status $http] == "ok" } {
                    set data [::http::data $http]

                    if {[catch {set doc [dom parse -html -simple $data]} results]} {
                        # Remove HTML comments.
                        regsub -all {<!--.*?-->} $data {} data

                        # Remove everything except <head></head> content.
                        regexp -nocase {<head>.*?</head>} $data match
                        #regsub -nocase {.*?<head>} $data {} data
                        #regsub -nocase {</head>.*?} $data {} data

                        regexp -nocase {<title>(.*?)</title>} $data match title
                        #set title [regsub -all -nocase {\s+} $title " "]
                        set title [string trim $title]
                    } else {
                        set root [$doc documentElement]
                        set title [string trim [[$root selectNodes {//head/title[1]/text()}] data]]
                        $doc delete
                    }
                } else {
                    putlog "Connection to $url failed"
                }
                http::cleanup $http
            }
        }
        return $title
    }
    putlog "URL title parser v$scriptVersion"
}

Does anyone why this happens? I think the problem is set title [urltitle::parse $word] but I can't get it right.

1 Answers1

1

The problem is formally in code you've not shown, urltitle::parse, since your patterns correctly match both URLs. A good way to determine if that is actually true is to just try running little bits of code in an interactive shell.

I'm guessing that the actual problem is that the youtu.be URL generates an HTTP redirect to the other URL (or one very much like it); Tcl's http library doesn't process redirects for you — it'd be a higher-level layer on top (and if this is the source to the urltitle code then I can see that it isn't doing it) — and the result causes something to choke in a nasty way.

If you're just wanting to support these youtu.be urls, you can the rewrite yourself with regsub immediately before passing the URL into urltitle::parse:

    ...
    regsub {^https?//youtu\.be/([^?/]*)$} $word {https://www.youtube.com/watch?\1} word
    set title [urltitle::parse $word]
    ...

That regsub is carefully guarded so it won't transform anything it shouldn't, but this approach isn't scalable; you can't introduce your own rewrite rule for every website out there! Instead, it needs to handle the various redirects correctly for you. That's an actual bug in the urltitle code.

Donal Fellows
  • 133,037
  • 18
  • 149
  • 215
  • This looks promising, I'll check it out tomorrow. – Antti Keränen Jun 16 '17 at 19:02
  • Yes, you're correct about introducing rewrite rules for every website. I started researching TCL HTTP redirection and found some code originally made by you at http://wiki.tcl.tk/11831. It's working great now. Thanks for the help! – Antti Keränen Jun 17 '17 at 06:56