I'm using Reqwest in my web crawler, and I'm trying to add tests for the main loop to make sure I get the correct output. I'm using Iron as a fake HTTP server with preset responses. However, in my _main_loop function, the call let client = Client::new(); hangs.

The first few lines of _main_loop:

fn _main_loop(starton: String, _panic: bool) -> Vec<String> {
    // panic!("test"); // placed here, the panic fires as expected
    let client = Client::new(); // <-- problem!
    panic!("test"); // never reached
    let mut future_urls: Vec<String>;
    // ...
}

My test mod:

#[cfg(test)]
mod tests {
    use iron::{Iron, IronResult, Headers};
    use iron::response::Response;
    use iron::request::Request;
    use iron::status;
    use iron::middleware::Chain;
    use iron::headers::ContentType;
    use iron::mime::{Mime, TopLevel, SubLevel};
    use iron::typemap::TypeMap;
    use std;

    use *;

    #[test]
    fn __main_loop() {
        fn handler(req: &mut Request) -> IronResult<Response> {
            let mut mime = Headers::new();
            mime.set(ContentType(Mime(TopLevel::Text, SubLevel::Html, Vec::new())));

            Ok(Response {
                headers: mime,
                status: Some(status::Ok),
                body: Some(Box::new(match req.url.path().join("/").as_str() {
                    "" => "<a href='file'></a><a href='file1'></a>",
                    "file" => "<a href='/file1'></a>",
                    "file1" => "<a href='/file'></a>",
                    _ => "not found",
                })),
                extensions: TypeMap::new()
            })
        }

        let child = std::thread::spawn(|| Iron::new(Chain::new(handler)).http("localhost:9999").unwrap());

        let f: Vec<String> = Vec::new();
        assert_eq!(_main_loop("http://localhost:9999/".to_string(), false), f);
    }
}

Terminal output:

$ cargo test
   Compiling crawler v1.0.0 (file:///home/*******/crawler)
warning: unreachable statement
  --> src/main.rs:82:5
   |
82 |     let mut future_urls: Vec<String>;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: #[warn(unreachable_code)] on by default

warning: unused variable: `client`
  --> src/main.rs:80:9
   |
80 |     let client = Client::new();
   |         ^^^^^^ help: consider using `_client` instead
   |
   = note: #[warn(unused_variables)] on by default

warning: unused variable: `starton`
  --> src/main.rs:79:15
   |
79 | fn _main_loop(starton: String, _panic: bool) -> Vec<String> {
   |               ^^^^^^^ help: consider using `_starton` instead

warning: unused variable: `child`
   --> src/main.rs:239:13
    |
239 |         let child = std::thread::spawn(|| Iron::new(Chain::new(handler)).http("localhost:9999").unwrap());
    |             ^^^^^ help: consider using `_child` instead

    Finished dev [unoptimized + debuginfo] target(s) in 4.24 secs
     Running target/debug/deps/crawler-9c5de394eb85849d

running 9 tests
test html::tests::_get_attribute_for_elem ... ok
test html::tests::_html_token_sink ... ok
test url_utils::tests::_add_url_to_vec ... ok
test url_utils::tests::_get_root_domain ... ok
test html::tests::_find_urls_in_html ... ok
test url_utils::tests::_check_if_is_in_url_list ... ok
test url_utils::tests::_remove_get_params ... ok
test url_utils::tests::_repair_suggested_url ... ok

When I run the tests single-threaded (-- --test-threads 1), the output ends with test tests::__main_loop ... and goes no further.

Shepmaster
thatlittlegit
1 Answer

I think the problem is that the thread you spawn never terminates, and the test ends up waiting for it (I'm not sure why this happens, or why the test runner doesn't kill it after a while). In other words, the issue is not reqwest but your Iron server.

I'm not sure how to fix this. In fact, Listening::close seems to be broken, since iron 0.6.0 still depends on hyper 0.10, which no longer has that functionality.

In the worst case you can implement a server using hyper directly, as I did here. Maybe there is a quick fix for your original code; I don't know.
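To illustrate the idea of a test server whose thread is guaranteed to end, here is a minimal sketch that uses only the standard library rather than Iron or hyper. It is not the hyper-based server mentioned above. The names spawn_fixed_server and fetch_raw are hypothetical, and the server answers every request with the same body, which is an assumption made to keep the example short.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread::{self, JoinHandle};

/// Hypothetical helper: serve exactly `requests` connections with a fixed
/// body, then return so the thread ends and can be joined.
fn spawn_fixed_server(body: &'static str, requests: usize) -> (String, JoinHandle<()>) {
    // Port 0 asks the OS for any free port instead of a hard-coded one.
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind failed");
    let addr = listener.local_addr().expect("no local address").to_string();

    let handle = thread::spawn(move || {
        for stream in listener.incoming().take(requests) {
            let mut stream = stream.expect("accept failed");
            // Read (and ignore) the request; every path gets the same body.
            let mut buf = [0u8; 1024];
            let _ = stream.read(&mut buf);
            let response = format!(
                "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
                body.len(),
                body
            );
            stream.write_all(response.as_bytes()).expect("write failed");
        }
        // The loop ends after `requests` connections, so the thread terminates.
    });

    (addr, handle)
}

/// Hypothetical bare-bones client, used here only to exercise the server.
fn fetch_raw(addr: &str) -> String {
    let mut stream = TcpStream::connect(addr).expect("connect failed");
    stream
        .write_all(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
        .expect("send failed");
    let mut out = String::new();
    stream.read_to_string(&mut out).expect("read failed");
    out
}
```

A test would call spawn_fixed_server with the number of requests it expects, run the code under test against the returned address, and then join the handle, so nothing is left running when the test returns.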

In my opinion, the ideal case is usually to avoid running a server in your tests at all. With the crate I linked before (reqwest-mock; disclaimer: I'm the author) you could write your _main_loop function to take a GenericClient as an argument, mock the requests in your tests using StubClient, and use the direct client in production code. But maybe you don't even need that, and should instead aim to design your crawler so that its functions can be tested as independently of each other as possible.
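The shape of that design can be sketched without any HTTP crate. The code below is not reqwest-mock's API: the Fetch trait, StubFetcher, find_links, crawl and stub_site are all hypothetical names invented for illustration, and the stub pages use absolute links (an assumption) so the sketch does not need URL resolution.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical abstraction over "fetch the body at this URL".
/// Production code would implement it with a real HTTP client.
trait Fetch {
    fn get(&self, url: &str) -> Result<String, String>;
}

/// Test double that serves canned bodies from memory.
struct StubFetcher {
    pages: HashMap<String, String>,
}

impl Fetch for StubFetcher {
    fn get(&self, url: &str) -> Result<String, String> {
        self.pages
            .get(url)
            .cloned()
            .ok_or_else(|| format!("no stub for {}", url))
    }
}

/// Naive href='...' extractor standing in for the crawler's real HTML parsing.
fn find_links(html: &str) -> Vec<String> {
    let mut links = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find("href='") {
        rest = &rest[start + 6..]; // skip past the 6 bytes of href='
        match rest.find('\'') {
            Some(end) => {
                links.push(rest[..end].to_string());
                rest = &rest[end..];
            }
            None => break,
        }
    }
    links
}

/// Breadth-first crawl that takes its fetcher as a parameter and returns
/// the successfully fetched URLs in visit order.
fn crawl<F: Fetch>(client: &F, start: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    let mut visited = Vec::new();
    seen.insert(start.to_string());
    queue.push_back(start.to_string());
    while let Some(url) = queue.pop_front() {
        let body = match client.get(&url) {
            Ok(body) => body,
            Err(_) => continue, // skip pages that cannot be fetched
        };
        visited.push(url);
        for link in find_links(&body) {
            if seen.insert(link.clone()) {
                queue.push_back(link);
            }
        }
    }
    visited
}

/// Canned three-page site mirroring the handler in the question.
fn stub_site() -> StubFetcher {
    let mut pages = HashMap::new();
    pages.insert(
        "http://test/".to_string(),
        "<a href='http://test/file'></a><a href='http://test/file1'></a>".to_string(),
    );
    pages.insert(
        "http://test/file".to_string(),
        "<a href='http://test/file1'></a>".to_string(),
    );
    pages.insert(
        "http://test/file1".to_string(),
        "<a href='http://test/file'></a>".to_string(),
    );
    StubFetcher { pages }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn crawls_stub_site_without_a_server() {
        let visited = crawl(&stub_site(), "http://test/");
        assert_eq!(
            visited,
            vec!["http://test/", "http://test/file", "http://test/file1"]
        );
    }
}
```

Because crawl only sees the trait, the test needs neither a port nor a background thread, so there is nothing left to hang.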

evotopid