phpQuery

This is the best thing ever. phpQuery lets you use jQuery-style selectors in PHP; I just used it to scrape a website and it saved me tons of time. The example in this post scrapes a CMS, but you could also load a page, change it or add things to it, and then send the result out to the browser.

Install it as a PEAR package (you can also just copy the code into your project, but PEAR is the easiest way to keep it up to date):

        pear channel-discover phpquery-pear.appspot.com  
        pear install phpquery/phpQuery 
      

Then you include it in your code:

        require('phpQuery.php');
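
With the library loaded, the "change the page before sending it out" use mentioned earlier might look like the sketch below. The markup, the `notice` and `footer` class names, and the text are all made up for illustration; only the phpQuery calls themselves (`newDocumentHTML`, `pq`, `addClass`, `text`, `append`, `htmlOuter`) come from the library.

```php
<?php
require('phpQuery.php'); // Assumes phpQuery is on the include path

// Made-up markup standing in for a page your application generated
$html = '<html><body><h1>Welcome</h1><p>Hello there.</p></body></html>';
$doc  = phpQuery::newDocumentHTML($html);

// Change existing elements with jQuery-style calls
pq('h1')->addClass('notice');
pq('p')->text('Hello from phpQuery.');

// Add something new before the page goes out
pq('body')->append('<p class="footer">Added on the server</p>');

$output = $doc->htmlOuter(); // Serialize the modified document
echo $output;
```

The calls read the same way they would in jQuery, which is most of the appeal: select, modify, and print.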
      

Here’s an example of scraping form data from a CMS:



<?php
  // $pages is an array of page IDs that I scraped manually
  foreach($pages as $page_ID) {
    $source_html = file_get_contents('http://example.com/cms/?pg_id=' . urlencode($page_ID));
    if($source_html === false) continue; // Skip pages that fail to download
    $doc         = phpQuery::newDocumentHTML($source_html);
    $data        = (object)array(); // Create an empty object for our clean data
    $tags        = array();         // An array to hold checkbox values

    // Get the values of all checked checkboxes on the page
    foreach(pq('input:checked') as $input) {
      $tags[] = $input->getAttribute('value');
    }


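    // Get the selected text of all dropdowns
    foreach(pq('select') as $select) {
      $name = $select->getAttribute('name');
      if($name === '') continue; // Nothing to key the value on without a name

      // Search inside this element instead of building a selector from its name
      $data->$name = pq($select)->find('option:selected')->text();
    }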


    // Get the names and values of all input fields. Ignore checkboxes (collected above).
    foreach(pq('input') as $input) {
      $name  = $input->getAttribute('name');
      $value = $input->getAttribute('value');
      $type  = $input->getAttribute('type');

      // Inputs without a name have nothing to key the value on, so skip them too
      if($type !== 'checkbox' && $name !== '') $data->$name = $value;
    }

    print_r($data); // This is your fancy new object with all of the form data you just grabbed
    print_r($tags); // All of the checkbox values
    // TODO: Insert into the database here
    unset($data, $tags);   // Now flush it and do it all over again
  }
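
For the database TODO in the loop above, one possible shape is a prepared statement through PDO. This is only a sketch under assumptions: the `scraped_pages` table, its columns, the JSON encoding of the fields, and the sample values are all made up, and an in-memory SQLite database (which needs the pdo_sqlite extension) stands in for a real one so the snippet runs on its own.

```php
<?php
// Hypothetical stand-ins for the values built inside the scraping loop
$page_ID = 42;
$data    = (object)array('title' => 'About us', 'category' => 'Company');
$tags    = array('news', 'featured');

// In-memory SQLite keeps the sketch self-contained; swap the DSN for your own database
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE scraped_pages (page_id INTEGER PRIMARY KEY, fields TEXT, tags TEXT)');

// Prepare once (outside the loop in real code), then execute once per page
$insert = $db->prepare(
  'INSERT INTO scraped_pages (page_id, fields, tags) VALUES (:page_id, :fields, :tags)'
);
$insert->execute(array(
  ':page_id' => $page_ID,
  ':fields'  => json_encode($data), // Form fields stored as one JSON object
  ':tags'    => json_encode($tags), // Checkbox values stored as a JSON array
));

// Read the row back to confirm what was stored
$row = $db->query('SELECT * FROM scraped_pages WHERE page_id = 42')->fetch(PDO::FETCH_ASSOC);
print_r($row);
```

Storing the fields as JSON avoids needing a column per form field, which is handy when different pages have different forms. A real schema would probably pull out the fields you care about into their own columns.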