Questions tagged [web-mining]

Web mining is the application of data mining techniques to discover patterns from the World Wide Web.

Web mining can be divided into three different types:

  1. Web usage mining;
  2. Web content mining;
  3. Web structure mining.

42 questions

0 votes, 1 answer

Problems text mining using the ‘rJava’ and ‘tm.plugin.webmining’ packages

I apologize if my formatting of this topic isn’t correct; this is my first time posting in the community and I will do my best. I have been working on this problem for a while but have been struggling to address it. I am currently following the…
0 votes, 0 answers

Scrape a webpage in C# whose URL stays the same when the location in the search box changes

I have to scrape a webpage in C# and I am using HttpClient. My problem is that when I scrape a URL, e.g. https://somethng.com/search/?query=mobile, it gives me the result "no products found", but when I search manually on the website and give a location in the search…
0 votes, 1 answer

How would I go about getting info from a list of links and then dumping it into a JSON object?

New to Python and BeautifulSoup. Any help is highly appreciated. I have an idea of how to build one list of a company's info, but that's after clicking on one link. import requests from bs4 import BeautifulSoup url =…
Vash • 141 • 11
0 votes, 2 answers

How could I use a graph mining method to get a multi-node graph?

I am using the Apriori algorithm for a data mining project, and I get results such as: item1 <=> item2, item2 <=> item3, ... I want to use graph mining to generate a graph containing many nodes and illustrating the relations between these nodes, like this: I…
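
As a rough illustration of the idea in this question, pairwise results of the form item1 <=> item2 can be merged into one undirected graph by treating each pair as an edge. The sketch below is only that first step (building the graph and grouping connected items), not a graph mining algorithm; the function names `build_graph` and `connected_components` and the sample pairs are assumptions made up for the example.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected adjacency map from (item_a, item_b) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes that are linked, directly or indirectly, by some pair."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            # Visit neighbours that are not yet part of this component.
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical pairwise results, standing in for the Apriori output.
pairs = [("item1", "item2"), ("item2", "item3"), ("item4", "item5")]
graph = build_graph(pairs)
groups = connected_components(graph)
```

Here `graph` maps every item to the set of items it was paired with, and `groups` lists the clusters of items that end up connected. A plotting or graph library could then draw the result, but that choice is left open.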
leafonsword • 2,735 • 3 • 16 • 20
0 votes, 1 answer

How to remove non-English words using RapidMiner

I am performing text mining in RapidMiner. I am crawling a website and doing some pre-processing tasks like tokenizing, lowercasing and filtering English stopwords; but still I am getting some nonsense words like "xckxzaz", "xkaffqoxzomd" or…
0 votes, 1 answer

WEKA simple CLI command Killed

I ran the following code in the WEKA SimpleCLI tool: java weka.core.converters.TextDirectoryLoader -dir c:/mydir/ > c:/output/result.arff and it showed the following result: [...Killed] Finished redirecting output to 'c:/output/result.arff'. The result.arff file…
0 votes, 1 answer

Establish a session to call URL with Perl

I am trying to mine data from a webpage with the WWW::Mechanize perl module. However, I first need to establish a connection so that this webpage will allow me to access the data. In a browser, I can establish this connection by clicking a…
0 votes, 2 answers

How to measure semantic relationship between two webpages

Let's assume I am visiting a university webpage. There are many teacher profiles there. Though these pages are not syntactically related, they are semantically related. How can I measure this type of relationship? Actually, on which parameter should…
0 votes, 1 answer

How to install Boilerpipe on Windows?

Can anyone tell me how to use Boilerpipe on Windows with NetBeans? I'd be grateful if you could give me some Java code to start with.
dark_shadow • 3,503 • 11 • 56 • 81
-1 votes, 2 answers

Classification using text mining - by values versus keywords

I have a classification problem that is highly correlated to economics by city. I have unstructured data in free text such as population, median income, employment, etc. Is it possible to use text mining to understand the values in the text and…
-2 votes, 1 answer

Dataset for URL normalization

I'm working on a project for normalizing URLs (i.e., different URLs that map to the same web page should be identified, and redundancy reduced, as in a search engine). So I'd like a dataset containing different URLs in order to test my…
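
To make the notion of URL normalization in this question concrete, here is a minimal sketch using Python's standard `urllib.parse`. The function name `normalize_url` and the specific rules (lowercase scheme and host, drop default ports and fragments, sort query parameters, use "/" for an empty path) are assumptions chosen for illustration; whether two URLs really map to the same page depends on the site, and sorting query parameters in particular is not safe everywhere.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Default ports that can be dropped without changing the target.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` under the illustrative rules above."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    # Simplification: user info (user:pass@) is dropped.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # Sort query parameters so that their order does not matter.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment is discarded because it is not sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))


a = normalize_url("HTTP://Example.com:80/a?b=2&a=1#frag")
b = normalize_url("http://example.com/a?a=1&b=2")
```

With these rules, `a` and `b` come out identical, so the two inputs would be counted as one page when deduplicating a crawl.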
-2 votes, 1 answer

Virus/Malware Danger While Web Crawling

I recently wrote a custom web crawler/spider using Java and the JSoup (http://jsoup.org/) HTML parser. The web crawler is very rudimentary - it uses the Jsoup connect and get methods to get the source of pages and then other JSoup methods to parse…