While using Import is probably a better and more robust way, I found that, at least for this particular problem, my own HTML parser (published in this thread) works fine with a small amount of post-processing. Take the code from there and execute it, augmenting it with this function:
Clear[findAndParseTables];
findAndParseTables[text_String] :=
  Module[{parsed = postProcess@parseText[text]},
    DeleteCases[
        Cases[parsed, _tableContainer, Infinity],
        _attribContainer | _spanContainer, Infinity] //. {
      (* flatten cell and row containers into plain lists *)
      (supContainer | tdContainer | trContainer | thContainer)[x___] :> {x},
      (* splice out inline formatting and link wrappers *)
      iContainer[x___] :> x,
      aContainer[x_] :> x,
      (* drop stray newlines and empty containers *)
      "\n" :> Sequence[],
      divContainer[] | ulContainer[] | liContainer[] | aContainer[] :> Sequence[]}];
Then this code gives you, I think, pretty much the complete data:
text = Import["http://en.wikipedia.org/wiki/Unemployment_by_country", "Text"];
myData = First@findAndParseTables[text];
Here is how the result looks:
In[92]:= Short[myData,5]
Out[92]//Short=
tableContainer[{{Country / Region},{Unemployment rate (%)},{Source / date of information}},
{{Afghanistan},{35.0},{2008,{3}}},{{Albania},{13.49},{2010 (Q4),{4}}},
{{Algeria},{10.0},{2010 (September),{5}}},<<188>>,{{West Bank},{17.2},{2010,{43}}},
{{Yemen},{35.0},{2009 (June),{128}}},{{Zambia},{16.0},{2005,{129}}},{{Zimbabwe},{97.0},{2009}}]
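From here, a further set of replacement rules (or a Cases call) gets the data into whatever form you need. For instance, assuming the row structure shown above, something like this should extract country -> unemployment-rate rules (a sketch; whether the rates come out as strings or numbers depends on the parser's output, so you may need a ToExpression on the rate):

rateRules = Cases[myData, {{country_String}, {rate_}, _} :> (country -> rate)];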
What I like about this approach (as opposed to, say, Import with the "XMLObject" element) is that, since I convert the web page into a Mathematica expression with minimal extra syntax (unlike, e.g., XML objects), it is usually quite easy to write a small set of replacement rules that does the right post-processing in each given case. A final disclaimer: my parser is not robust and surely contains a number of bugs, so be warned.