Questions tagged [phpcrawl]

PHPCrawl is a framework, written in PHP, for crawling/spidering websites.

28 questions
1
vote
1 answer

How do I remove certain tags and contents with PHP, using PHPCrawler

I am currently using PHPCrawler for some search functionality on a site. I need to remove some of the page elements from being indexed. For example, I have used: $doc_body = preg_replace('/<li>(.*?)<\/li>/is', "", $doc_body); to remove lists,…
absentx
    • 1,397
    • 4
    • 17
    • 31
    0
    votes
    1 answer

    PHPcrawler - tmp file

    I downloaded the latest version of phpcrawler, and I can access a test website of my own. The site has only an image and some text. When I run the crawler, I receive the text minus the image because I did the proper…
    0
    votes
    1 answer

    Instantiating a new PHPCrawl Class throws the error "Call to undefined method stdClass::receivePage()"

    I use a foreach loop to loop through multiple seed urls. During each loop, I instantiate a crawler using PHPCrawl and the next seed url. foreach($companyUrls as $companyId => $companyUrl) { $crawler = new MyCrawler($companyUrl, $companyId); …
    T. Brian Jones
    • 13,002
    • 25
    • 78
    • 117
    0
    votes
    2 answers

    Single-page web crawl in PHP

    I am new to PHP. Can someone help me figure out how to crawl a single HTML page and print all the words in the source code of that page?
    rkt
    • 1,171
    • 2
    • 9
    • 18
    0
    votes
    1 answer

    PHPCrawl sometimes returns empty handed

    I'm using the PHPCrawl class to spider websites and build a list of links. It all works well, if slowly, and I then use the links to perform other tasks. I'm encountering a problem where the first time I run the script it completes with no result,…
    Leo
    • 6,553
    • 2
    • 29
    • 48
    0
    votes
    1 answer

    I want to get specific URLs from this document using a PHP crawler

    I have no idea what to do about this and I'm probably going to get some downvotes. I have a web page similar to this: Unknown Link. I want to crawl a page filled with…
    user5294439
    0
    votes
    1 answer

    How do I get all the web links from a website?

    I want to get all the links (web posts) available on a website. Also, if any new post is added to the website, I should be able to get its link. I will have a list of 10 websites, and the link extraction process needs to run periodically. Can someone…
    0
    votes
    1 answer

    PHPCrawl - Attempted to call method "getURIContent" on class "PHPCrawlerUtils"

    I'm trying to use PHPCrawl with Symfony2. I've first installed the PHPCrawl library using composer, then I created a folder "DependencyInjection" in my bundle, where I put the class "MyCrawler" which extends PHPCrawler. I configured it as a…
    0
    votes
    1 answer

    How to crawl a single page, without following any links contained in it, and output the source?

    I am using phpcrawl and below is the code. I want to crawl the mentioned link and get all the jobs. Currently I crawl it by passing the link, but it crawls all the links that appear in the page-source view. But I want to see the source of only the…
    Manojkumar
    • 1,351
    • 5
    • 35
    • 63
    0
    votes
    2 answers

    Can PHPCrawl be used for scraping websites, and how different is it from Scrapy?

    I want to scrape a few websites, and many people suggested Scrapy. It is Python based, and since I am very familiar with PHP I looked for alternatives. I found a crawler, PHPCrawl. I am not sure if it is just a crawler or whether it also provides scraping facilities as…
    Manojkumar
    • 1,351
    • 5
    • 35
    • 63
    0
    votes
    1 answer

    PHPCrawl: Output sitemap to XML file on server

    I am trying to use PHPCrawl to generate my website's sitemap. However, I am having trouble getting it to write the output to an XML sitemap file on the server. Any help?
    helPHP
    • 1
    • 2
    0
    votes
    1 answer

    Set cookie for a specific domain in PHPCrawl

    I use PHPCrawl to crawl websites, but now I want to add a cookie for a specific domain, because this domain requires authentication and I want to get information from the authorized pages. How can I add a cookie for a specific domain?
    Afshin Mehrabani
    • 33,262
    • 29
    • 136
    • 201