You can also download a simple command line tool to deal with the PDF file you linked to. Then run this command to extract the table(s) on the first page:
pdftotext \
-enc UTF-8 \
-l 1 \
-table \
EICH-FinalEntriesforwebsite_Neutral.pdf \
EICH-FinalEntriesforwebsite_Neutral.txt
-enc UTF-8: sets the text encoding so that the Ö, Ä, Ü and İ (as well as ö, ä, ü, ß, á, š, ē, í and č) characters in the text are extracted correctly.
-l 1: sets the last page to extract to page 1, so only the first page is processed.
-table: this is the decisive parameter. It is a variant of -layout that is optimized for tabular data and tries to keep rows and columns aligned.
The command produces this output:
EUROPEAN ATHLETICS INDOOR CHAMPIONSHIPS
PRAGUE / CZE, 6-8 MARCH 2015
FINAL ENTRIES - MEN
Lastname Firstname Country DOB PB SB
1500m Men
Rowe Brenton AUT 17/08/1987
Vojta Andreas AUT 09/06/1989 3:38.99 3:41.09
Khadiri Amine CYP 20/11/1988 3:45.16 3:45.16
Friš Jan CZE 19/12/1995 3:43.76 3:43.76
Holuša Jakub CZE 20/02/1988 3:38.79 3:41.54
Kocourek Milan CZE 06/12/1987 3:43.97 3:43.97
Bueno Andreas DEN 07/07/1988 3:42.78 3:42.78
Alcalá Marc ESP 07/11/1994 3:41.79 3:41.79
Mechaal Adel ESP 05/12/1990 3:38.30 3:38.30
Olmedo Manuel ESP 17/05/1983 3:39.82 3:40.66
Ruíz Diego ESP 05/02/1982 3:36.42 3:40.60
Kowal Yoann FRA 28/05/1987 3:38.07 3:39.22
Grice Charlie GBR 07/11/1993 3:39.44 3:39.44
O'Hare Chris GBR 23/11/1990 3:37.25 3:40.42
Orth Florian GER 24/07/1989 3:39.97 3:40.20
Tesfaye Homiyu GER 23/06/1993 3:34.13 3:34.13
Kazi Tamás HUN 16/05/1985 3:44.28 3:44.28
Mooney Danny IRL 20/06/1988 3:42.69 3:42.69
Travers John IRL 16/03/1991 3:42.52 3:43.74
Bussotti Neves Junior Joao Capistrano M. ITA 10/05/1993 3:47.58 3:47.58
Jurkēvičs Dmitrijs LAT 07/01/1987 3:45.95 3:45.95
Ingebrigtsen Henrik NOR 24/02/1991 3:44.00
Ingebrigtsen Filip NOR 20/04/1993
Krawczyk Szymon POL 29/12/1988 3:41.64 3:41.64
Ostrowski Artur POL 10/07/1988 3:41.36 3:41.36
Żebrowski Krzysztof POL 09/07/1990 3:41.49 3:41.49
Smirnov Valentin RUS 13/02/1986 3:37.55 3:38.74
Nava Goran SRB 15/04/1981 3:40.65 3:44.49
Pelikán Jozef SVK 29/07/1984 3:43.85 3:45.51
Ek Staffan SWE 13/11/1991 3:43.54 3:43.54
Rogestedt Johan SWE 27/01/1993 3:40.03 3:40.03
Özbilen İlham Tanui TUR 05/03/1990 3:34.76 3:38.05
Özdemir Ramazan TUR 06/07/1991 3:44.35 3:44.35
3000m Men
Rowe Brenton AUT 17/08/1987
Vojta Andreas AUT 09/06/1989 7:59.95 7:59.95
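If the extracted text is meant to end up in a spreadsheet or script, a small filter can turn it into tab-separated values. This is only a sketch: `to_tsv` is a hypothetical helper, and it assumes the real output separates columns with runs of two or more spaces, while multi-word names contain only single spaces.

```shell
# to_tsv (hypothetical helper): keep only rows that contain a DD/MM/YYYY date,
# which drops the title, header and event lines, and replace every run of
# two or more spaces with a single tab.
# Possible use: to_tsv < EICH-FinalEntriesforwebsite_Neutral.txt > entries.tsv
# (entries.tsv is just an example output name)
to_tsv() {
    awk '/[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]/ { gsub(/  +/, "\t"); print }'
}
```

Empty PB or SB cells simply disappear this way, so rows for athletes without a listed time end up with fewer fields than the others and may need manual checking.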
Note, however:
The -table parameter to the pdftotext command line tool is only available in XPDF version 3.04, which you can download here: www.foolabs.com/xpdf/download.html. It is NOT (yet) available in Poppler's fork of pdftotext (whose latest version is 0.43.0).
If you only have Poppler's pdftotext, you'd have to use the -layout parameter (instead of -table), which gives you a similarly good result for the PDF file in question:
pdftotext \
-enc UTF-8 \
-l 1 \
-layout \
EICH-FinalEntriesforwebsite_Neutral.pdf \
EICH-FinalEntriesforwebsite_Neutral.txt
However, I have seen PDFs where the result is much better with -table (and XPDF) than it is with -layout (and Poppler).
(XPDF has the -layout parameter too, so you can see the difference if you try both.)
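If you are not sure which pdftotext you have installed, the choice between the two parameters can be scripted. The following is a rough sketch, not a tested recipe: `pick_mode` and `extract_first_page` are hypothetical helper names, and it assumes that `pdftotext -h` prints a usage text that lists `-table` whenever the parameter is supported.

```shell
# pick_mode (hypothetical helper): reads pdftotext's help text on stdin and
# prints -table if that parameter is mentioned, otherwise -layout.
pick_mode() {
    if grep -q -e '-table'; then
        printf '%s\n' '-table'
    else
        printf '%s\n' '-layout'
    fi
}

# extract_first_page (hypothetical helper): $1 is the input PDF, $2 the output
# text file. Assumption: the usage text may go to stderr, hence the 2>&1.
extract_first_page() {
    mode=$(pdftotext -h 2>&1 | pick_mode)
    pdftotext -enc UTF-8 -l 1 "$mode" "$1" "$2"
}
```

With these defined, `extract_first_page EICH-FinalEntriesforwebsite_Neutral.pdf EICH-FinalEntriesforwebsite_Neutral.txt` would run the same extraction as above with whichever parameter the installed tool appears to support.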