cheerio sometimes returns empty string

Question

I'm scraping Genius.com for lyrics. I've googled but can't find a reason why my code isn't working. I am scraping the text from the div on a Genius.com page (e.g., https://genius.com/Britney-spears-baby-one-more-time-lyrics).

Viewing the page source, the div appears to exist and to be populated with text in the source itself, not by JavaScript or otherwise (if it were, wouldn't cheerio work zero percent of the time in this context?). When I run my code, it works 50% of the time; other times it returns an empty string.

I saw this, but it seems like a hacky solution, and I don't really see why my async/await isn't working for the full response from phin...

Here's the code in question:

const scraperRouter = require('express').Router()
const p = require('phin')
const cheerio = require('cheerio')

scraperRouter.get('/', async (req, res) => {
    const url = req.header('geniusUrl')

    const _res = await p(url)

    try {
        let $ = cheerio.load(_res.body)
        const lyrics = $('.lyrics').text()

        res.send(lyrics)
    }
    catch (e) {
        console.log(e)
        res.json(e)
    }
})

Any advice appreciated. Thanks.

I don't see any element with the class `lyrics`. Try partially matching the varying class they are using like this: `$('[class^=Lyrics__Container]').text()` (It will match when the class attribute starts with this string) — blex, Jul 08 '21 at 21:18
Sometimes this happens when sites are A/B testing. They might redirect you to one of a couple DOMs. There might also be regional differences. I recommend trying to access it from a couple different IPs, browsers, regions, etc to try to figure out if there's a pattern. If you can narrow it down to a couple of different DOMs, then you can conditionally try both. — ggorlen, Jul 08 '21 at 22:17
Thanks; there are two different DOMs being served. Both of your responses fixed the problem for me. Thanks a ton. — kcrwf72, Jul 08 '21 at 22:36

score 1 · Answer 1 · answered Jan 01 '23 at 02:43

Converting my comment to an answer after OP confirmed it as the solution:

Sometimes this happens when sites are A/B testing. They might redirect you to one of a couple DOMs. There might also be regional differences. I recommend trying to access it from a couple different IPs, browsers, regions, etc to try to figure out if there's a pattern. If you can narrow it down to a couple of different DOMs, then you can conditionally try both.
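As a rough sketch of "conditionally try both", you can run each candidate selector in turn and keep the first one that yields text. The two selectors here are the ones from the question and from blex's comment. The helper name `extractLyrics` and the constant `LYRICS_SELECTORS` are made up for illustration, and the selectors may need adjusting if the site changes its markup again.

```javascript
// Selectors for the two page layouts discussed in the comments:
// '.lyrics' is the selector from the question, and
// '[class^=Lyrics__Container]' is the partial class match suggested by blex.
const LYRICS_SELECTORS = ['.lyrics', '[class^=Lyrics__Container]']

// Hypothetical helper: `$` is the function returned by cheerio.load(html).
// Returns the trimmed text of the first selector that produces non-empty
// text, or an empty string if none of them match.
function extractLyrics($, selectors = LYRICS_SELECTORS) {
  for (const selector of selectors) {
    const text = $(selector).text().trim()
    if (text) return text
  }
  return ''
}

// Usage inside the route handler from the question:
//   const $ = cheerio.load(_res.body)
//   const lyrics = extractLyrics($)
```

An empty return value then means neither known layout was served, which is worth logging so you notice if a third variant appears.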

This can also occur due to rate limiting.
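To tell rate limiting apart from a layout change, it may help to inspect the response before handing it to cheerio. The sketch below assumes the site signals rate limiting with the standard HTTP 429 status, which I haven't verified for this site; `checkUpstreamResponse` is a hypothetical helper, and `statusCode` and `body` are the fields of those names on the phin response.

```javascript
// Hypothetical helper: classify an upstream response before parsing it.
// Assumes rate limiting shows up as HTTP 429 (Too Many Requests); a site
// could also signal it differently, e.g. with a 200 and a challenge page.
function checkUpstreamResponse(statusCode, body) {
  if (statusCode === 429) return 'rate-limited'
  if (statusCode < 200 || statusCode >= 300) return 'http-error'
  if (!body || body.length === 0) return 'empty-body'
  return 'ok'
}

// Usage inside the route handler from the question:
//   const _res = await p(url)
//   const status = checkUpstreamResponse(_res.statusCode, _res.body)
//   if (status !== 'ok') return res.status(502).send(status)
```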
