Questions tagged [web-mining]

Web mining is the application of data mining techniques to discover patterns from the World Wide Web. Web mining can be divided into three different types:

  1. Web usage mining;
  2. Web content mining;
  3. Web structure mining.

42 questions

1 vote, 1 answer

Why can't I extract the subheading of a page using BeautifulSoup?

I am trying to extract the name and subheading of this page (for example). Extracting the name works, but extracting the subheading fails. Using Inspect Element in Chrome, I identified that the subheading text "Canada Census, 1901"…
KubiK888

1 vote, 1 answer

Collecting data from several AJAX pages (using browser add-on?)

I'd like to collect plane ticket prices from a certain website, for many dates and destinations. I can specify source, destination and dates on the URL, but the website fetches the data using AJAX, so the prices aren't readily available on the…

0 votes, 0 answers

How to extract main contents, excluding advertisements and useless links, from a web page?

Possible Duplicate: How to extract textual contents from a web page? I have searched a lot but have not been able to find what I'm looking for. I want to extract data from a web page (only the main data, such as an article from a news page). On googling I…
dark_shadow

0 votes, 1 answer

How to extract textual contents from a web page?

I'm developing an application in Java that takes textual information from different web pages and summarizes it into one page. For example, suppose I have a news item on different web pages like Hindu, Times of India, Statesman, etc. Now my application…
dark_shadow

0 votes, 2 answers

Does Google provide query results in JSON form?

I am doing some web mining tasks using Google. Though using the ordinary Google search engine might help, I still need to analyse the web pages. I want to ask: Does Google provide query results in JSON form? PS: I know one place, Google Custom…
xiaohan2012

0 votes, 3 answers

Search for web pages that contain specific links

Suppose I want to search for web pages that contain the links I want. I would normally use the link as the query and search for it (as in Google). Note that I just need the pages that contain the link. But normally, the search engine would return results…
xiaohan2012

0 votes, 1 answer

API | Coinimp | user/withdraw | Invalid parameters (POST)

Is anyone here using Coinimp and having the same problem as me? Have you fixed it? Can you help me? I am trying to test the POST to user/withdraw. I followed its documentation at https://www.coinimp.com/documentation/http-api#user-withdraw…
Mashwishi

0 votes, 1 answer

POST request issue with httr: desired table not retrieved

Description: trying to retrieve historical data from Investing.com using the httr library
Original page: https://www.investing.com/rates-bonds/austria-1-year-bond-yield-historical-data
Expected output: HTML table with historical data: sample table…

0 votes, 2 answers

Scrape join-dates/user info from a list (csv) of Twitter-users

I'm looking for a solution to a probably quite simple problem and would really appreciate some help or a hint. I have basic knowledge of Python and web scraping. I want to explore a certain hashtag and the community behind it on Twitter. Using twint…

0 votes, 1 answer

Defensive web scraping techniques for scrapy spider

I have been web scraping for about 3 months now, and I have noticed that many of my spiders need to be constantly babysat because the websites keep changing. I use Scrapy, Python, and Crawlera to scrape my sites. For example, 2 weeks ago I created a…
pbthehuman

0 votes, 1 answer

Apache Nutch index only article pages to Solr

I have set up Nutch 1.17 for crawling a few websites. As usual, there can be two types of web pages at a high level. First, those that are category pages or home pages that do not contain the details of any specific story but provide links and short text…
Hafiz Muhammad Shafiq

0 votes, 1 answer

Function not importing from external JS file in React

I am migrating a web miner from EJS templates to React. The code below starts the mining process.