Highest Voted 'websphinx' Questions

10

votes

6 answers

How to crawl entire Wikipedia?

I've tried the WebSphinx application. I found that if I put wikipedia.org as the starting URL, it will not crawl any further. So how do I actually crawl the entire Wikipedia? Can anyone give me some guidelines? Do I need to specifically go and find those URLs…

asked Feb 22 '10 at 20:01

Mr CooL

1,529
8
23
27

1

vote

0 answers

Use Java to Crawl and download entire website overriding the HttpsURLConnection

I am looking to crawl an entire website and save it locally for offline use. It should have 2 parts: Authentication: this needs to be implemented using Java, and I need to override the HttpsURLConnection logic to add a couple of lines of authentication (Hadoop) in…

web-crawler nutch crawler4j websphinx

asked Jan 19 '17 at 22:18

Spartan

11
2

0

votes

1 answer

How to do form authentication by entering username and password while web crawler is crawling pages

I have downloaded WebSphinx to do this, but I need it to ask me for the username and password of the website and then submit them to the website. Once authenticated, it should start crawling the internal links and sublinks and save the…

java file-io web-crawler websphinx

asked Dec 13 '11 at 09:43

saum22

884
12
28

-2

votes

1 answer

Regex working in the test program but not in the WebSphinx crawler

Here is my code for Regex matching which worked for a webpage: public class RegexTestHarness { public static void main(String[] args) { File aFile = new File("/home/darshan/Desktop/test.txt"); FileInputStream inFile = null; …

java html regex websphinx

asked Sep 07 '11 at 18:56

darshan

1,230
1
11
17
