
I have a Rust project that uses the "reqwest" crate and has the function below:

use reqwest::blocking::get;

fn elem_html_vec(url: &String, element: &str, vec_elem: &mut Vec<String>) {
    let response: String = get(url)
        .expect("Could not load URL.")
        .text()
        .expect("Could not load text");
    let document: scraper::Html = scraper::Html::parse_document(&response);
    let parsed_elements: scraper::Selector = scraper::Selector::parse(element).unwrap();
    for elem in document.select(&parsed_elements) {
        vec_elem.push(elem.inner_html());
    }
}

But when I call the function with a `url` argument that points to a page which redirects to another page, such as "https://twitter.com/Lions", I get the error message below:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: reqwest::Error { kind: Redirect, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("twitter.com")), port: None, path: "/Lions", query: None, fragment: None }, source: TooManyRedirects }', src/main.rs:79:37
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I want it to get the data of the final page it redirects to. How do I do that?

Saad

2 Answers


https://twitter.com/Lions redirects to https://twitter.com/i/flow/login?redirect_after_login=%2FLions and also sets a cookie with the header `set-cookie: guest_id=...`. If the cookie is not sent with the redirected request, the server redirects to the same location again, giving you an infinite loop; reqwest gives up once it hits its redirect limit and reports `TooManyRedirects`.
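To make the loop concrete, here is a toy model in plain Rust (no reqwest, no network). The server behavior is an assumption based on the description above: `/Lions` redirects to the login flow and sets a `guest_id` cookie, and the login flow redirects to itself unless that cookie comes back. The names `Reply`, `server` and `fetch` are hypothetical.

```rust
// Toy model of the redirect loop (not reqwest code). Assumed server rules:
// /Lions -> redirect to LOGIN and set a guest_id cookie;
// LOGIN without the cookie -> redirect to LOGIN again;
// LOGIN with the cookie -> serve a page.

/// What the simulated server sends back for one request.
enum Reply {
    Redirect { location: &'static str, set_cookie: Option<&'static str> },
    Page(&'static str),
}

const LOGIN: &str = "/i/flow/login?redirect_after_login=%2FLions";

/// Simulated server: decides the reply from the path and any cookie sent.
fn server(path: &str, cookie: Option<&str>) -> Reply {
    match (path, cookie) {
        ("/Lions", _) => Reply::Redirect { location: LOGIN, set_cookie: Some("guest_id=...") },
        (p, Some(_)) if p == LOGIN => Reply::Page("login page HTML"),
        (p, None) if p == LOGIN => Reply::Redirect { location: LOGIN, set_cookie: Some("guest_id=...") },
        _ => Reply::Page("404"),
    }
}

/// Follows redirects like an HTTP client, optionally keeping cookies.
/// Makes at most `max_redirects + 1` requests before giving up.
fn fetch(start: &str, use_cookies: bool, max_redirects: usize) -> Result<&'static str, String> {
    let mut path: &str = start;
    let mut jar: Option<&str> = None; // the single stored cookie, if any
    for _ in 0..=max_redirects {
        let sent = if use_cookies { jar } else { None };
        match server(path, sent) {
            Reply::Page(body) => return Ok(body),
            Reply::Redirect { location, set_cookie } => {
                if use_cookies && set_cookie.is_some() {
                    jar = set_cookie; // remember the cookie for the next request
                }
                path = location;
            }
        }
    }
    Err("too many redirects".to_string())
}
```

With cookies, `fetch("/Lions", true, 10)` reaches the page after one redirect; without them, `fetch("/Lions", false, 10)` keeps bouncing to the same URL until the limit is hit, which has the same shape as reqwest's `TooManyRedirects` error.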

You need to configure reqwest to store and resend cookies, for example with the `cookie_store` method. To do this, replace the `reqwest::get` call with a `Client` created using a `ClientBuilder`.

Jonathan Giddy
  • I changed the first line of the function to "let client = Client::builder().build().unwrap(); let response = client.get(url).send().unwrap().text().unwrap();", but I could not find in the docs how to configure reqwest to use cookies. All the docs mention using `Jar`, which my project does not have. – Saad Jul 18 '23 at 19:21
  • Use `let client = Client::builder().cookie_store(true).build().unwrap()`. You will also need to set the feature `"cookies"` for the `reqwest` crate in `Cargo.toml`. – Jonathan Giddy Jul 19 '23 at 20:56

I found a solution using a custom redirect policy from reqwest's `redirect` module. Below is the code.

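use reqwest::blocking::Client;
use reqwest::redirect;

fn elem_html_vec(url: &String, element: &str, vec_elem: &mut Vec<String>) {
    // Follow redirects until more than 5 URLs have been visited, then stop and
    // return the last response instead of failing with TooManyRedirects.
    let redirect_rules: redirect::Policy =
        reqwest::redirect::Policy::custom(|attempt: redirect::Attempt<'_>| {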
            if attempt.previous().len() > 5 {
                attempt.stop()
            } else {
                attempt.follow()
            }
        });
    let client: Client = Client::builder().redirect(redirect_rules).build().unwrap();
    let response: String = client.get(url).send().unwrap().text().unwrap();
    let document: scraper::Html = scraper::Html::parse_document(&response);
    let parsed_elements: scraper::Selector = scraper::Selector::parse(element).unwrap();
    for elem in document.select(&parsed_elements) {
        vec_elem.push(elem.inner_html());
    }
}
Saad