
I'm building a web crawler with two main features, each of which runs in its own threads:
- The fetcher (crawls a website, separates links from files, and stores both in the database).
- The downloader (downloads files from the URLs returned by the fetcher).

I have a WebSite object which includes everything I want to know about a website. Now I want to update my database to change the status of a link from waiting to fetching and then to fetched. The same goes for files: from waiting to downloading and then to downloaded.

To prevent a Fetcher from fetching a link that has already been chosen by another Fetcher, I wrote this method inside my WebSite class:

public synchronized String[] getNextLink(){
    // Returns the next link from the database whose visited flag is 0, then sets the flag to -1 to mark it as in use.
}

I did the same for my Downloaders with this method:

public synchronized String getNextFile(){
    // Returns the next file from the database whose downloaded flag is 0, then sets the flag to -1 to mark it as downloading.
}

Both methods are inside my WebSite class because two Fetchers working on different websites can never select the same row in my database (the same goes for Downloaders). The two methods could safely be called at the same time, though, because Fetchers never select a file and Downloaders never select a link.

However, synchronized uses a single lock per object, so my two methods cannot run at the same time. Is there another keyword that gives one lock per method per object, or do I need to code it myself?

naurel
  • Why do you even need synchronized? Shouldn't you make sure that every `WebSite` is being processed by a single `Downloader` and a single `Fetcher`? – Kayaman Jan 14 '16 at 13:12
  • I want multiple Downloaders to use the same WebSite so they can download several of its files at the same time. The same goes for Fetchers, to crawl the website faster. – naurel Jan 14 '16 at 14:12

1 Answer


Instead of applying the synchronized keyword to whole methods, which implicitly uses this as the lock object, you can synchronize on two separate lock objects inside the methods (any object can be used as a lock in Java). A thread holding one lock does not block threads that need the other:

private final Object fetcherMutex = new Object();
private final Object downloaderMutex = new Object();

public String[] getNextLink(){
    synchronized (fetcherMutex) { /* ...  */ }
}

public String getNextFile(){
    synchronized (downloaderMutex) { /* ...  */ }
}
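
For a version that compiles and runs on its own, here is a minimal sketch of the same idea. It makes several assumptions that are not in the question: the database is replaced by two in-memory queues (`waitingLinks`, `waitingFiles`), both methods return a plain `String` (or `null` when nothing is waiting), and the constructor and `WebSiteDemo` class are hypothetical. The real methods would run their SELECT/UPDATE inside the synchronized blocks instead.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Queue;

// Hypothetical in-memory stand-in for the WebSite class from the question.
class WebSite {
    private final Object fetcherMutex = new Object();
    private final Object downloaderMutex = new Object();

    // Assumption: these queues play the role of the "waiting" rows in the database.
    private final Queue<String> waitingLinks = new ArrayDeque<>();
    private final Queue<String> waitingFiles = new ArrayDeque<>();

    WebSite(Collection<String> links, Collection<String> files) {
        waitingLinks.addAll(links);
        waitingFiles.addAll(files);
    }

    // Claims the next waiting link; only other Fetchers are blocked meanwhile.
    public String getNextLink() {
        synchronized (fetcherMutex) {
            return waitingLinks.poll(); // null when no link is waiting
        }
    }

    // Claims the next waiting file; only other Downloaders are blocked meanwhile.
    public String getNextFile() {
        synchronized (downloaderMutex) {
            return waitingFiles.poll(); // null when no file is waiting
        }
    }
}

class WebSiteDemo {
    public static void main(String[] args) throws InterruptedException {
        WebSite site = new WebSite(
                Arrays.asList("http://example.com/a", "http://example.com/b"),
                Arrays.asList("http://example.com/doc.pdf"));

        // One fetcher and one downloader thread drain their own queue concurrently.
        Thread fetcher = new Thread(() -> {
            String link;
            while ((link = site.getNextLink()) != null) {
                System.out.println("fetching " + link);
            }
        });
        Thread downloader = new Thread(() -> {
            String file;
            while ((file = site.getNextFile()) != null) {
                System.out.println("downloading " + file);
            }
        });
        fetcher.start();
        downloader.start();
        fetcher.join();
        downloader.join();
    }
}
```

Because each queue is only ever touched while holding its own mutex, two Fetchers can never claim the same link, yet a Fetcher never waits for a Downloader.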
tucuxi
  • Since both my methods are inside an object, isn't it faster to just create 2 locks inside my object and manage them myself than to instantiate 2 objects that each carry a lock provided by the JVM? – naurel Jan 14 '16 at 13:21
  • To prevent accidents, you should declare the locks as final, i.e. final Object fetcherMutex = new Object(); – Palamino Jan 14 '16 at 14:21
  • @Palamino thanks - added `private` & `final` modifiers – tucuxi Jan 14 '16 at 15:11
  • @naurel those two locks would be declared inside your class, just as you suggest. Every Java object can be used as a lock, whether or not you actually use it as one, so you do not pay any extra price to "create an object that will be used as a lock" vs "create a lock". – tucuxi Jan 14 '16 at 15:14
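
If you would rather manage explicit lock objects yourself, as the comments discuss, the standard library's `java.util.concurrent.locks.ReentrantLock` is one option. The sketch below is only an illustration, not the approach from the answer: the class name `LockedWebSite`, the in-memory queues, and the `addLink`/`addFile` helpers are all hypothetical stand-ins for the database.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical variant of WebSite that uses explicit Lock objects.
class LockedWebSite {
    private final Lock fetcherLock = new ReentrantLock();
    private final Lock downloaderLock = new ReentrantLock();

    // Assumption: in-memory queues stand in for the "waiting" database rows.
    private final Queue<String> waitingLinks = new ArrayDeque<>();
    private final Queue<String> waitingFiles = new ArrayDeque<>();

    public void addLink(String link) {
        fetcherLock.lock();
        try {
            waitingLinks.add(link);
        } finally {
            fetcherLock.unlock();
        }
    }

    public void addFile(String file) {
        downloaderLock.lock();
        try {
            waitingFiles.add(file);
        } finally {
            downloaderLock.unlock();
        }
    }

    public String getNextLink() {
        fetcherLock.lock();
        try {
            return waitingLinks.poll(); // null when no link is waiting
        } finally {
            fetcherLock.unlock(); // always release, even if the body throws
        }
    }

    public String getNextFile() {
        downloaderLock.lock();
        try {
            return waitingFiles.poll(); // null when no file is waiting
        } finally {
            downloaderLock.unlock();
        }
    }
}
```

The behaviour is the same as with the two mutex objects; the explicit version mainly pays off if you later need features such as `tryLock()`, at the cost of having to release the lock in a `finally` block yourself.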