
I am a complete beginner with PHP. I understand the concepts but am struggling to find a tutorial I understand. My goal is this:

  1. Use the xpath addons for Firefox to select which piece of text I would like to scrape from a site
  2. Format the scraped text properly
  3. Display the text on a website

Example)

<?php
// Get the HTML source code
$url = 'http://steamcommunity.com/profiles/76561197967713768';
$source = file_get_contents($url);

// Create the DOM document
$doc = new DOMDocument;
$doc->loadHTML($source);

// Create the DOM XPath object
$xpath = new DOMXPath($doc);

// Query the username node
$username = $xpath->query('//html/body/div[3]/div[1]/div/div/div/div[3]/div[1]');
echo $username;
?>

In this example, I would like to scrape the username (which at the time of writing is mopar410).

Thank you for your help. I am so lost :( So far I have managed to use XPath with importXML in Google Docs spreadsheets, and that works, but I would like to be able to do this on my own site with PHP to learn how.

This is code I found online; I only edited the URL and the variable, as I do not know how to write this myself.

Artjom B.
Chatyak
  • There doesn't appear to be a question here – Phil Jul 24 '14 at 02:04
  • Hi Phil, sorry, I didn't seem to get a notification about comment updates. My question is that I am having a terrible time finding a proper tutorial of the simplest nature: just the basics of scraping text with PHP/XPath and displaying that text on a website. I have found a lot of tutorials, but many of them are only partial. The code above is what I put together from numerous sources, but it doesn't appear to work. – Chatyak Jul 26 '14 at 22:10
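
One likely reason the snippet in the question fails is that `DOMXPath::query()` returns a `DOMNodeList`, which cannot be echoed directly. Below is a minimal sketch of reading the text of the first match instead. The inline markup and the `username` class are hypothetical stand-ins for the real page, whose structure is not shown here, so the XPath expression would need to be adapted.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$source = '<html><body><div><span class="username"> mopar410 </span></div></body></html>';

// Real-world HTML often triggers parser warnings; collect them silently.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList, so pick a node before reading its text.
$nodes = $xpath->query('//span[@class="username"]');
$username = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';

echo $username, PHP_EOL;
```

A short, attribute-based expression like this tends to be less brittle than a long positional path such as `div[3]/div[1]/...`, which can break whenever the page layout changes.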

1 Answer


They have a public API.

Simply use http://steamcommunity.com/profiles/STEAM_ID/?xml=1

<?php

$profile = simplexml_load_file('http://steamcommunity.com/profiles/76561197967713768/?xml=1', 'SimpleXMLElement', LIBXML_NOCDATA);

echo (string)$profile->steamID;

Outputs: mopar410 (at time of writing)

This also provides other information such as mostPlayedGame, hoursPlayed, etc. (look for the XML node names).
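
To experiment with nested nodes without hitting the network, here is a rough sketch that parses an inline XML string. The sample document and its values are made up for illustration; only the node names (`steamID`, `mostPlayedGames`, `mostPlayedGame`, `hoursPlayed`) are taken from this thread, and the real feed may be structured differently.

```php
<?php
// Hypothetical sample; the real feed may contain different nodes and values.
$xml = <<<XML
<profile>
  <steamID><![CDATA[mopar410]]></steamID>
  <mostPlayedGames>
    <mostPlayedGame>
      <hoursPlayed>12.5</hoursPlayed>
    </mostPlayedGame>
  </mostPlayedGames>
</profile>
XML;

// LIBXML_NOCDATA merges CDATA sections into plain text nodes.
$profile = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

// Each property is a SimpleXMLElement; cast it to get the text content.
$name  = (string) $profile->steamID;
$hours = (string) $profile->mostPlayedGames->mostPlayedGame[0]->hoursPlayed;

echo $name, PHP_EOL;
echo $hours, PHP_EOL;
```

Calling `print_r($profile)` on the parsed object is a quick way to see which node names are available before writing the property chain.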

Dave Chen
  • Hi Dave, thank you for your time. This code is indeed working and shows the user's name. I am able to follow the logic of what you wrote, even if it seems simple to you and others. Is it OK if I ask for help with another webpage example? Essentially, what I would like to do is have one page on a website that takes a bunch of data (user names, stats, etc.), saves it to a database, and then uses PHP to pull the info from the database into an HTML table. – Chatyak Jul 26 '14 at 22:16
  • @user1502577 Saving the data is pretty easy: you need to know which nodes from the XML you want to save, then, using mysqli or PDO, do a simple insert. Then you have another page that just selects the information from the database. `How do I insert and read data from a database?` is too generic a question, I'm afraid; [here's](https://developers.google.com/maps/articles/phpsqlinfo_v3#AddingRow) a tutorial by Google if you want to see some examples of that. – Dave Chen Jul 26 '14 at 22:20
  • Will do my best. I have a couple more specific questions on this topic, if that's okay. Using your specific example, I tried the same approach with `hoursPlayed`, which doesn't appear to display the "hoursPlayed" text for some reason. I am just trying the same process on other random information to get the hang of it. Thank you – Chatyak Jul 26 '14 at 22:37
  • @Chatyak Use `print_r($profile)`. There's `$profile->hoursPlayed2Wk`, and for the top most played game, you can use `$profile->mostPlayedGmes->mostPlayedGame[0]->hoursPlayed`. – Dave Chen Jul 26 '14 at 22:48
  • @Chatyak Oops, I accidentally typed `gmes`. Use `print_r($profile->mostPlayedGames->mostPlayedGame[0]->hoursPlayed);`. Anyways, it's pretty easy just from looking at `print_r($profile)`. Look at the object it produced and you can navigate through it with ease. – Dave Chen Jul 26 '14 at 23:29
  • Using this syntax: `$MoparStats = simplexml_load_file('http://steamcommunity.com/profiles/76561197967713768/stats/L4D2/?xml=1', 'SimpleXMLElement', LIBXML_NOCDATA); print_r($MoparStats->stats->hoursPlayed);` gave me the following output: "SimpleXMLElement Object ( [0] => 20h 18m )". Would you know how to retrieve only the "20h 18m" portion? – Chatyak Jul 26 '14 at 23:32
  • Never mind, I figured it out by echoing with a string cast instead of using print_r: `echo (string)$MoparStats->stats->hoursPlayed;`. I don't understand why it works, but it does. – Chatyak Jul 26 '14 at 23:33
  • @Chatyak Cast to a string. And you're right. The reason is that you're dealing with a SimpleXMLElement, not a string. – Dave Chen Jul 26 '14 at 23:33
  • Thank you for the help Dave. I am still a bit lost (you understand this much more than I do). I will create a separate question and would appreciate your input there. http://stackoverflow.com/questions/24976537/beginner-php-xpath-text-display – Chatyak Jul 26 '14 at 23:47
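
The insert-then-select flow mentioned in the comments above could be sketched roughly as follows. This is only an illustration under several assumptions: it uses PDO with an in-memory SQLite database (which requires the `pdo_sqlite` extension) so that it runs without a server, and the `players` table and its column names are hypothetical. With MySQL, only the connection string and credentials should need to change.

```php
<?php
// Assumption: pdo_sqlite is available; an in-memory database keeps this self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table and column names.
$pdo->exec('CREATE TABLE players (steam_id TEXT PRIMARY KEY, hours_played TEXT)');

// These values would normally come from the parsed XML.
$insert = $pdo->prepare(
    'INSERT INTO players (steam_id, hours_played) VALUES (:id, :hours)'
);
$insert->execute([':id' => 'mopar410', ':hours' => '20h 18m']);

// A second page would typically run only this part: select and render.
$rows = $pdo->query('SELECT steam_id, hours_played FROM players')
            ->fetchAll(PDO::FETCH_ASSOC);

$html = "<table>\n";
foreach ($rows as $row) {
    // Escape values before placing them in HTML.
    $html .= sprintf(
        "  <tr><td>%s</td><td>%s</td></tr>\n",
        htmlspecialchars($row['steam_id']),
        htmlspecialchars($row['hours_played'])
    );
}
$html .= "</table>\n";

echo $html;
```

Prepared statements with named placeholders keep scraped values out of the SQL text itself, which matters because data from a remote feed should be treated as untrusted.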