I am writing a web crawler, and part of the specification is that it crawls the web for a user-specified amount of time. To do that I am trying to use the `Timer` and `TimerTask` classes. The code I have now is attempt number two. I have watched a few tutorials, though none of them are quite what I need, and I have also read through the documentation. I have been working on this project for a few weeks now and it is due tonight. I am not sure where to turn next.

    public void myTimer(String url, long time)
    {
        Timer webTimer = new Timer();
        TimerTask timer;
        timer = new TimerTask()
        {
            public void run()
            {
                long limit = calculateTimer(time);
                while (System.currentTimeMillis() < limit)
                {
                    webcrawler crawler1 = new webcrawler();
                    crawler1.Crawl(url);
                }
                System.out.println("times Up");
            }
        };
        webTimer.schedule(timer, 1000);
    }

  • Do you want to stop the crawler in the middle of a URL? You can't kill a thread (any more) - it's dangerous. You can `interrupt` the `Thread`, but that may or may not work, and likely tickles untested code paths. – Tom Hawtin - tackline Mar 01 '20 at 22:57
  • Yes, I need to stop the web crawler after a user-specified amount of time without killing the entire program. It is part of the project specification. I was hoping I could use the `Timer` and `TimerTask` classes, but the above code does not seem to be running: the print statement never prints. – Brittany Case Mar 01 '20 at 23:24
  • I may be too late to contribute, but the first solution that comes to my mind (if I understood your issue correctly) is to use an [ExecutorService](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html) or something similar. They provide you with the necessary options to stop a process after the given time has passed. – Ukitinu Mar 02 '20 at 05:53
  • You could just set the timeout on your http client. –  Mar 02 '20 at 04:56
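
The `ExecutorService` and `interrupt` suggestions in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TimedCrawlDemo`, `crawlUntilInterrupted`, and `runWithTimeLimit` are hypothetical names, and the "crawl" is simulated with `Thread.sleep` rather than real HTTP requests. It relies on the work loop checking the interrupt flag, so it will not stop code that ignores interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedCrawlDemo {
    // Hypothetical stand-in for a crawl loop: it checks the interrupt flag between "pages".
    static int crawlUntilInterrupted() {
        int pages = 0;
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(10); // simulated page fetch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag, then leave the loop
                break;
            }
            pages++;
        }
        return pages;
    }

    // Runs the crawl on a worker thread; returns true if the time limit cut it short.
    static boolean runWithTimeLimit(long millis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Integer> task = TimedCrawlDemo::crawlUntilInterrupted;
        Future<Integer> future = executor.submit(task);
        try {
            future.get(millis, TimeUnit.MILLISECONDS); // wait at most `millis`
            return false;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + runWithTimeLimit(300));
    }
}
```

The rest of the program keeps running after the limit fires; only the worker task is asked to stop.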

1 Answer

I am guessing that `.Crawl()` runs its own loop and keeps the thread busy, so control never returns to your `while` condition. I do not know how your crawler is implemented, but I would recommend adding a `stopCrawling` method that sets a boolean flag which the loop inside that class checks, so it can break out. Then I would do something like this:

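    import java.util.Timer;
    import java.util.TimerTask;

    public void startCrawler(String url, long time) {
        webcrawler crawler1 = new webcrawler();

        // Schedule the stop first: Crawl() blocks, so anything after it
        // would not run until crawling had already finished.
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                crawler1.stopCrawling(); // sets the flag checked inside Crawl()
            }
        };
        Timer timer = new Timer("Timer");
        timer.schedule(task, time); // delay in milliseconds

        crawler1.Crawl(url); // returns once the flag has been set
        timer.cancel();      // let the timer thread terminate
    }
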
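
For the crawler side, a minimal sketch of what a stop flag might look like is below. `StoppableCrawler` is a hypothetical stand-in for the `webcrawler` class (whose implementation is not shown in the question), and the page fetch is simulated with `Thread.sleep`. The field is `volatile` on the assumption that `stopCrawling()` is called from the timer thread while `crawl` runs on another thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the asker's webcrawler class.
class StoppableCrawler {
    // volatile: the write from the timer thread must be visible to the crawling thread
    private volatile boolean stopped = false;
    private int pagesVisited = 0;

    // Called from another thread (for example a TimerTask) to request a stop.
    public void stopCrawling() {
        stopped = true;
    }

    public int getPagesVisited() {
        return pagesVisited;
    }

    // Processes URLs until the frontier is empty or a stop has been requested.
    public void crawl(String startUrl) {
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(startUrl);
        while (!stopped && !frontier.isEmpty()) {
            String url = frontier.poll();
            try {
                Thread.sleep(10); // simulated fetch of `url`
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            pagesVisited++;
            frontier.add(url + "/next"); // simulated discovered link
        }
    }
}
```

The flag is only checked between pages, so the crawler finishes the page it is on before stopping. A fetch that hangs would still need its own timeout on the HTTP client.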
rempas
  • Could I use threads to implement a stop-crawler method? If so, do you know of any thread tutorials? I read through the Java docs on threads, but they just left me confused. – Brittany Case Mar 02 '20 at 02:02
  • `Timer` has only one execution thread. But `ScheduledThreadPoolExecutor` can be configured with any number of threads. You can also check detailed examples [here](https://www.baeldung.com/java-timer-and-timertask) and [here](https://stackoverflow.com/a/11794588/8370915). – invzbl3 Mar 02 '20 at 02:17
  • okay, so I still don't understand how to associate a method with a thread. If I can figure that out I think I can just have the task interrupt that thread. – Brittany Case Mar 02 '20 at 03:15
  • Methods are not associated with threads. Generally speaking any thread can call any method. Also it's not a great idea to interrupt a thread from a different thread, rather use flags/booleans to control the behaviour. `while(running){doThreadStuff();}` and then from another thread do `running=false;`. see [this](https://stackoverflow.com/questions/11839881/how-to-stop-a-thread-by-another-thread) post – rempas Mar 02 '20 at 12:23
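
To make the last two comments concrete, here is a rough sketch of running a loop on its own thread and stopping it with a flag set by a scheduler. `FlagStopDemo` and `runFor` are hypothetical names, and the loop body is a placeholder for `doThreadStuff()`. It uses `ScheduledExecutorService` in place of `Timer`, which is a substitution and not something the question's code does.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class FlagStopDemo {
    // Runs a placeholder work loop on its own thread for roughly `millis` ms
    // and returns how many iterations it completed.
    static int runFor(long millis) {
        AtomicBoolean running = new AtomicBoolean(true);
        AtomicInteger iterations = new AtomicInteger();

        // A thread runs whatever Runnable it is given; any method can be called from it.
        Thread worker = new Thread(() -> {
            while (running.get()) {
                iterations.incrementAndGet(); // placeholder for doThreadStuff()
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        worker.start();

        // A second thread flips the flag after the delay.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.schedule(() -> running.set(false), millis, TimeUnit.MILLISECONDS);

        try {
            worker.join(); // wait for the loop to notice the flag and exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdown();
        }
        return iterations.get();
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + runFor(300));
    }
}
```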