
I've been trying to move from postgres to tokio_postgres, but I'm struggling with the async parts.

use scraper::Html;
use std::sync::Arc;
use tokio::sync::Mutex;
use tokio::task;

struct Url {}
impl Url {
    fn scrapped_home(&self, symbol: String) -> Html {
        let url = format!(
            "https://finance.yahoo.com/quote/{}?p={}&.tsrc=fin-srch", symbol, symbol
        );
        
        let response = reqwest::blocking::get(url).unwrap().text().unwrap();

        scraper::Html::parse_document(&response)
    }
}

#[derive(Clone)]
struct StockData {
    symbol: String,
}

#[tokio::main]
async fn main() {
    let stock_data = StockData { symbol: "".to_string() };
    let url = Url {};
    
    let mut uri_test: Arc<Mutex<Html>> = Arc::new(Mutex::from(url.scrapped_home(stock_data.clone().symbol)));
    let mut uri_test_closure = Arc::clone(&uri_test);

    let uri = task::spawn_blocking(|| {
        uri_test_closure.lock()
    });
}

Without putting a mutex on

url.scrapped_home(stock_data.clone().symbol),

I would get the error that a runtime cannot be dropped in a context where blocking is not allowed, so I put it inside spawn_blocking. Then I got the error that Cell cannot be shared between threads safely. This, from what I could gather, is because Cell isn't Sync. I then wrapped it in a Mutex. That, however, still throws 'Cell cannot be shared between threads safely'.

Now, is that because it contains a reference to a Cell and therefore isn't memory-safe? If so, would I need to implement Sync for Html? And how?

Html is from the scraper crate.

UPDATE:

Sorry, here's the error.

error: future cannot be sent between threads safely
   --> src/database/queries.rs:141:40
    |
141 |           let uri = task::spawn_blocking(|| {
    |  ________________________________________^
142 | |             uri_test_closure.lock()
143 | |         });
    | |_________^ future is not `Send`
    |
    = help: within `tendril::tendril::NonAtomic`, the trait `Sync` is not implemented for `Cell<usize>`
note: required by a bound in `spawn_blocking`
   --> /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/task/blocking.rs:195:12
    |
195 |         R: Send + 'static,
    |            ^^^^ required by this bound in `spawn_blocking`

UPDATE:

Adding Cargo.toml as requested:

[package]
name = "reprod"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
reqwest = { version = "0.11", features = ["json", "blocking"] }
tokio = { version = "1", features = ["full"] }
tokio-postgres = "0"
scraper = "0.12.0"

UPDATE: Added original sync code:

fn main() {
    let stock_data = StockData { symbol: "".to_string() };
    let url = Url {};
    
    url.scrapped_home(stock_data.clone().symbol);
}

UPDATE: Thanks to Kevin, I was able to get it to work. As he pointed out, Html is neither Send nor Sync. This part of the Rust lang doc helped me understand how message passing works.

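use std::sync::mpsc;
use std::thread;

pub fn scrapped_home(&self, symbol: String) -> Html {
    let (tx, rx) = mpsc::channel();

    let url = format!(
        "https://finance.yahoo.com/quote/{}?p={}&.tsrc=fin-srch", symbol, symbol
    );

    // Run the blocking request on its own thread; only the String is sent back.
    thread::spawn(move || {
        let val = reqwest::blocking::get(url).unwrap().text().unwrap();
        tx.send(val).unwrap();
    });

    // Html is built on the calling thread, so it never crosses a thread boundary.
    scraper::Html::parse_document(&rx.recv().unwrap())
}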
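
The same idea can be tried without any network access or external crates. The following is only a minimal sketch under stated assumptions: `Doc`, `fetch` and `load` are hypothetical names that do not appear in the code above. `Doc` stands in for a single-threaded type such as `Html` (it holds an `Rc`, so it is neither `Send` nor `Sync`), and `fetch` stands in for the blocking HTTP request.

```rust
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a single-threaded type such as `Html`:
// the `Rc` inside makes it neither `Send` nor `Sync`.
struct Doc {
    body: Rc<String>,
}

impl Doc {
    // Hypothetical "parser": trims the text and stores it.
    fn parse(text: &str) -> Doc {
        Doc { body: Rc::new(text.trim().to_string()) }
    }
}

// Hypothetical stand-in for the blocking HTTP request.
fn fetch(symbol: &str) -> String {
    format!("  <html>{}</html>  ", symbol)
}

fn load(symbol: String) -> Doc {
    let (tx, rx) = mpsc::channel();

    // Only the owned `String` crosses the thread boundary.
    thread::spawn(move || {
        tx.send(fetch(&symbol)).unwrap();
    });

    // The non-`Send` value is created on the thread that uses it.
    Doc::parse(&rx.recv().unwrap())
}
```

Calling `load("ABC".to_string())` returns a `Doc` whose `body` was produced on the worker thread as plain text and wrapped on the caller's thread. Moving `Doc::parse` into the spawned closure and sending the `Doc` through the channel would be rejected by the compiler, because `Doc` is not `Send`.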

Afterwards I had some sort of epiphany and got it to work with tokio as well, without message passing:

pub async fn scrapped_home(&self, symbol: String) -> Html {
    let url = format!(
        "https://finance.yahoo.com/quote/{}?p={}&.tsrc=fin-srch", symbol, symbol
    );

    let response = task::spawn_blocking(move || {
        reqwest::blocking::get(url).unwrap().text().unwrap()
    }).await.unwrap();

    scraper::Html::parse_document(&response)
}

I hope that this might help someone.

a-gradina
  • Please post the original code that gave your error, along with the actual error message. You are very close to an [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) here, and I think people will be able to help best if you give the full context, rather than your summary of it please. – Kevin Anderson Oct 26 '22 at 19:06
  • Mhm, this is the original code. The actual error is: future is not `Send`. within `tendril::tendril::NonAtomic`, the trait `Sync` is not implemented for `Cell` – a-gradina Oct 26 '22 at 19:24
  • What do you mean by "some error"? The actual text of the error would be helpful. – Peter Hall Oct 26 '22 at 19:25
  • What is `Html`? Please provide **reproducible** code, including Cargo.toml and all code necessary to trigger the error. – Chayim Friedman Oct 26 '22 at 19:35
  • That is helpful no doubt, but I meant "before you wrapped it". I think your problem is that `Html` is not `Send`, and the rest is you trying to "fix" that by wrapping it in a mutex and such, and digging deeper. I'm with @ChayimFriedman that we need reproducible code, not the fragment. – Kevin Anderson Oct 26 '22 at 19:37
  • @a-gradina They didn't mean *the link to your full project*, I think they meant a [MRE]. – Finomnis Oct 26 '22 at 20:04
  • @Finomnis Thanks! I changed the code above to include the Cargo.toml file as well as a reproducible example. – a-gradina Oct 27 '22 at 02:37
  • I'm glad you got your code working @a-gradina. As we isolated, it's the point of creation of the `Html` object where you can no longer move it. Before that with text, you can absolutely move a `String` between threads, but not the `Html` one. – Kevin Anderson Oct 28 '22 at 12:55

1 Answer

This illustrates it a bit more clearly now: you're trying to return a tokio::sync::MutexGuard across a thread boundary. When you call this:

    let mut uri_test: Arc<Mutex<Html>> = Arc::new(Mutex::from(url.scrapped_home(stock_data.clone().symbol)));
    let mut uri_test_closure = Arc::clone(&uri_test);

    let uri = task::spawn_blocking(|| {
        uri_test_closure.lock()
    });

The uri_test_closure.lock() call (tokio::sync::Mutex::lock()) doesn't have a semicolon, which means it's returning the object that's the result of the call. But you can't return a MutexGuard across a thread boundary.

I suggest you read up on the linked lock() call, as well as blocking_lock() and such there.

I'm not certain of the point of your call to task::spawn_blocking here. If you're trying to illustrate a use case for something, that's not coming across.
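
As a rough illustration of keeping the guard inside the closure, here is a minimal sketch. It assumes `std::sync::Mutex` and `std::thread::spawn` as stand-ins for `tokio::sync::Mutex` and `task::spawn_blocking`, and the function name `read_on_worker` is hypothetical. The closure takes the lock, copies out an owned value, and returns that value rather than the guard. It only compiles because `String` is `Send`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Lock inside the worker thread and hand back an owned copy of the data,
// so that no guard ever has to leave the closure.
fn read_on_worker(shared: Arc<Mutex<String>>) -> String {
    let handle = thread::spawn(move || {
        let guard = shared.lock().unwrap();
        // Return an owned `String`; the guard is dropped when the closure ends.
        String::clone(&guard)
    });
    handle.join().unwrap()
}
```

The caller keeps its own `Arc` and passes a clone, for example `read_on_worker(Arc::clone(&shared))`, so the data stays available on the original thread afterwards.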

Edit:

The problem is deeper. Html is both !Send and !Sync, which means you can't even wrap it up in an Arc<Mutex<Html>> or Arc<Mutex<Option<Html>>> or whatever. You need to get the data from another thread in another way, and not as that "whole" object. See this post on the Rust users forum for more detailed information. But whatever you're wrapping must be Send, and that struct is explicitly not.

So if a type is Send and !Sync, you can wrap it in a Mutex and an Arc. But if it's !Send, you're hooped, and need to use message passing or other synchronization mechanisms.
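
To make the `Send` and `!Sync` case concrete, here is a small sketch using only the standard library. It assumes `std::sync::Mutex` rather than tokio's, and the function name `count_on_threads` is hypothetical. `Cell<usize>` is `Send` but not `Sync`, and since `Mutex<T>` is `Sync` whenever `T: Send`, an `Arc<Mutex<Cell<usize>>>` can be shared between threads.

```rust
use std::cell::Cell;
use std::sync::{Arc, Mutex};
use std::thread;

// `Cell<usize>` is `Send` but not `Sync`. Wrapping it in a `Mutex` makes the
// whole thing `Sync`, so the `Arc` can be cloned into several threads.
fn count_on_threads(workers: usize) -> usize {
    let counter = Arc::new(Mutex::new(Cell::new(0usize)));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // Each thread locks, then mutates the `Cell` through the guard.
                let cell = counter.lock().unwrap();
                cell.set(cell.get() + 1);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = counter.lock().unwrap().get();
    total
}
```

Replacing `Cell<usize>` with a `!Send` type such as `Rc<usize>` makes the `thread::spawn` call fail to compile, which is the situation `Html` is in.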

Kevin Anderson
  • I'll try to make it clearer. The original code is in my edit. All I want is the `Html` from `url.scrapped_home(stock_data.clone().symbol)`. The `uri_test_closure.lock()` call doesn't have a semicolon because I want that `Html` object. I made the call to `task::spawn_blocking` because calling `uri_test_closure` or `url.scrapped_home(stock_data.clone().symbol)` alone panics the thread: `Cannot drop a runtime where blocking is not allowed.` – a-gradina Oct 27 '22 at 15:30
  • A couple of things. [`reqwest::blocking`](https://docs.rs/reqwest/latest/reqwest/blocking/) is very explicit that it shouldn't be used in an async context, and `Html` is NOT `Send` or `Sync`. So what you have to pass around is the `Arc` (cloned), and not the `lock()`'d object or the `Html` object itself. That's the core of your issue: once you have the Html object, you need to lock it before using it. But you can't pass the `Mutex` or the `MutexGuard` between threads. You have to pass a `clone()` of the Arc, and do the "lock dance" all the time. – Kevin Anderson Oct 27 '22 at 16:48
  • I'm only doing it because I want to learn async, and I thought that moving my starter Rust project to tokio_postgres would be a good start. If I understand you correctly, I'd have to put `uri_test_closure` within the `spawn_blocking` without `lock()`. Is that correct? Because `uri_test_closure` is already an `Arc` clone. I'm lost after that, because I don't know how to get the `Html` object in the first place without running into an error. Thank you for bearing with me! – a-gradina Oct 27 '22 at 17:16
  • Updated my answer. Short answer: the problem is the `Html` struct. It's strictly one-thread-only. More info above. – Kevin Anderson Oct 28 '22 at 02:16
  • That did it! Message passing! I got it to work. I'll update my post above with the working code after work. Thank you so very much for the help and for having patience with me! I'll make sure to be more thorough with my next question on SO. – a-gradina Oct 28 '22 at 11:23