How to get the title of a web page using JavaScript, without getting blocked by CORS policies?

Question

I'm developing a plugin for Obsidian that expands shortened URLs (e.g. bit.ly, t.co, etc.) to their more descriptive long version in Markdown. To create a proper Markdown link [title](web link), I need to read the title of the web page.

However, I keep running into a CORS error that prevents me from fetching the page and reading its title. I've looked at CORS proxy solutions, but the free ones appear to be short-lived, unsafe, or intended for demo purposes only.

My code is open source and available here: https://github.com/odebroqueville/obsidian-url-expander

The code specifically fetching the web page title is:

// Helper function to get the title of a web page.
// Note: in a browser context this only succeeds if the target server
// sends CORS headers that allow the request.
export async function getTitle(url: string): Promise<string> {
    try {
        // A plain GET needs no Content-Type header; 'text/html' is not a
        // CORS-safelisted value, so sending it would force a preflight.
        const response = await fetch(url, { method: 'GET', mode: 'cors' });
        if (!response.ok) {
            console.error(`Failed to retrieve title: HTTP ${response.status}`);
            return '';
        }
        const html = await response.text();
        // [\s\S] also matches newlines, so a multi-line <title> is captured.
        const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
        return match ? match[1].trim() : '';
    } catch (err) {
        console.error(`Failed to retrieve title with error: ${err}`);
        return '';
    }
}

I tried several different proxies, but they were either deprecated and unsafe or required a paid plan.

Your goal is to write a small API that accepts a URL, opens it, reads the title, and returns that information to the client. Instead of looking for a generic solution like `cors-anywhere`, it might be simpler to just write this small service yourself. - Evert, Oct 26 '22 at 20:06
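A minimal sketch of the kind of service this comment describes, assuming Node 18+ (for the built-in `fetch`) and the standard `node:http` module. The `/title?url=...` route, the JSON shape, and the names `extractTitle` and `createTitleServer` are illustrative assumptions, not part of the plugin or of any existing library.

```typescript
import { createServer, IncomingMessage, ServerResponse } from "node:http";

// Pull the text of the first <title> element out of an HTML string.
// [\s\S] is used so that a title spanning several lines still matches.
export function extractTitle(html: string): string {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].replace(/\s+/g, " ").trim() : "";
}

// Hypothetical route: GET /title?url=<encoded URL> -> { url, title }
async function handleRequest(req: IncomingMessage, res: ServerResponse): Promise<void> {
  const query = new URL(req.url ?? "/", "http://localhost").searchParams;
  const target = query.get("url");

  // Let any origin read the response; restrict this for real deployments.
  res.setHeader("Access-Control-Allow-Origin", "*");
  res.setHeader("Content-Type", "application/json");

  if (!target || !/^https?:\/\//i.test(target)) {
    res.statusCode = 400;
    res.end(JSON.stringify({ error: "Expected ?url=<http(s) URL>" }));
    return;
  }

  try {
    // A server-side fetch is not subject to the browser's CORS checks,
    // and it follows the short link's redirects to the final page.
    const upstream = await fetch(target, { redirect: "follow" });
    const html = await upstream.text();
    res.end(JSON.stringify({ url: upstream.url, title: extractTitle(html) }));
  } catch (err) {
    res.statusCode = 502;
    res.end(JSON.stringify({ error: String(err) }));
  }
}

// Returns the server without starting it, so the caller picks the port.
export function createTitleServer() {
  return createServer((req, res) => {
    void handleRequest(req, res);
  });
}
```

Starting it could look like `createTitleServer().listen(3000)`, after which a request to `http://localhost:3000/title?url=https%3A%2F%2Fexample.com` should return the expanded URL and the page title as JSON. A service that fetches arbitrary URLs on request should not be exposed publicly without restrictions such as rate limiting and blocking of internal addresses.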

score -1 · Answer 1 · answered Oct 26 '22 at 19:54

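I suppose you're trying to read arbitrary web pages by fetching them from the browser. This won't work, because the responding server is outside your control and decides whether to send the CORS headers that would allow the request. You could instead do this server-side, or you could try a hidden <iframe> that loads the page and then use JavaScript to read the loaded page's title, although the same-origin policy normally blocks scripts from reading a cross-origin frame's document, so the server-side route is the more dependable one.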
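On the client, the server-side route could be consumed roughly as follows. This is only a sketch under stated assumptions: a self-hosted service is assumed to expose `GET /title?url=...` and return JSON of the form `{ url, title }`; that endpoint and the names `getTitleViaService`, `toMarkdownLink`, and `serviceBase` are hypothetical.

```typescript
// Shape of the JSON the hypothetical title service is assumed to return.
interface TitleResponse {
  url: string;
  title: string;
}

// Minimal fetch-like signature so a stub can be injected when testing.
type FetchLike = (input: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Ask the (assumed) service for the title instead of fetching the page
// directly, so the browser never makes the cross-origin page request.
export async function getTitleViaService(
  shortUrl: string,
  serviceBase: string, // e.g. "http://localhost:3000" (assumption)
  fetchImpl: FetchLike = (input) => fetch(input),
): Promise<string> {
  const endpoint = `${serviceBase}/title?url=${encodeURIComponent(shortUrl)}`;
  try {
    const response = await fetchImpl(endpoint);
    if (!response.ok) return "";
    const data = (await response.json()) as Partial<TitleResponse>;
    return typeof data.title === "string" ? data.title : "";
  } catch (err) {
    console.error(`Failed to retrieve title with error: ${err}`);
    return "";
  }
}

// Build the Markdown link, falling back to the URL when no title was found.
export function toMarkdownLink(title: string, url: string): string {
  return `[${title || url}](${url})`;
}
```

The service itself must answer with an `Access-Control-Allow-Origin` header that permits the plugin's origin; since that server is under your control, this is a configuration choice rather than an obstacle.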
