I would like to scrape the table from https://wyniki.tge.pl/wyniki/rdn/indeksy/: all rows (IRDN, etc.) and the header. I tried to adapt the code from "Using JSoup To Extract HTML Table Contents" to my case, but I ran into difficulties reworking it.

I would like to scrape these tables (with energy market prices) and put them on my website after adding some CSS.

I tried with this code:

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class WebScraper {

    public static void main(String[] args) throws IOException {

        Document doc = Jsoup.connect("https://wyniki.tge.pl/wyniki/rdn/indeksy/").get();
        for (Element table : doc.select("table.t-02")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                if (tds.size() > 6) {
                    System.out.println(tds.get(0).text() + ":" + tds.get(1).text() + ":" + tds.get(2).text() + ":" + tds.get(3).text() + ":" + tds.get(4).text() + ":" + tds.get(5).text());
                }
            }
        }

    }
}

Results:

Cz. 26/01:Pt. 27/01:So. 28/01:N. 29/01:Pn. 30/01:Wt. 31/01
PLN/MWh:182.48:176.20:147.22:137.89:169.02
MWh:67 698.70:66 088.70:72 720.40:75 460.00:58 887.10
PLN/MWh:207.61:196.18:152.71:146.69:194.75
MWh:39 442.90:40 883.10:49 538.90:48 864.30:34 401.20
PLN/MWh:184.82:176.74:145.98:135.67:171.83
MWh:67 698.70:66 088.70:72 720.40:75 460.00:58 887.10
PLN/MWh:207.27:196.66:152.22:144.41:195.66
MWh:39 442.90:40 883.10:49 538.90:48 864.30:34 401.20

But because of the table's structure (the header has 7 columns and the body has 8), I would like to omit the column containing PLN/MWh or MWh (the first body column) and also fetch the last (8th) column, so that the values end up in the right places, like this:

Cz. 26/01:Pt. 27/01:So. 28/01:N. 29/01:Pn. 30/01:Wt. 31/01
182.48:176.20:147.22:137.89:169.02:178.91
67 698.70:66 088.70:72 720.40:75 460.00:58 887.10:64 432.20
207.61:196.18:152.71:146.69:194.75:201.19
39 442.90:40 883.10:49 538.90:48 864.30:34 401.20:..
184.82:176.74:145.98:135.67:171.83:..
67 698.70:66 088.70:72 720.40:75 460.00:58 887.10:...
207.27:196.66:152.22:144.41:195.66:..
39 442.90:40 883.10:49 538.90:48 864.30:34 401.20:..

Thanks for your help!
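The row handling described above could be sketched roughly as follows. This is only a minimal sketch, not a tested solution for the page: `RowFormatter` and `joinFrom` are hypothetical names, and the cell texts are assumed to have already been collected into a list (for example from the `td` elements that `row.select("td")` returns). The idea is to join the cells from a chosen start index through the last one, instead of hard-coding indexes 0 to 5, so that header rows can start at index 0 and body rows at index 1 (skipping the unit cell).

```java
import java.util.Arrays;
import java.util.List;

class RowFormatter {

    // Joins the cell texts from index `from` (inclusive) through the last cell with ':'.
    // Hypothetical helper: use from = 0 for the header row and from = 1 for body rows,
    // assuming the first body cell holds the unit (PLN/MWh or MWh).
    static String joinFrom(List<String> cells, int from) {
        if (from < 0 || from > cells.size()) {
            throw new IllegalArgumentException("from out of range: " + from);
        }
        return String.join(":", cells.subList(from, cells.size()));
    }

    public static void main(String[] args) {
        // Shortened sample rows taken from the output shown in the question.
        List<String> header = Arrays.asList("Cz. 26/01", "Pt. 27/01", "So. 28/01");
        List<String> body = Arrays.asList("PLN/MWh", "182.48", "176.20", "147.22");

        System.out.println(joinFrom(header, 0)); // header: keep every cell
        System.out.println(joinFrom(body, 1));   // body: skip the unit cell
    }
}
```

Because the end of the range is always the last cell, the extra body column is picked up without changing the code when the table gains or loses a day.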

piotr
  • Do you have jsoup-x.x.x.jar in the classpath when compiling? – Aubin Jan 30 '17 at 20:30
  • Yes, I have added jsoup to my workspace and to the build path – piotr Jan 30 '17 at 20:31
  • Piotr, you have not imported the 'Element' and 'Elements' names. Consider using an IDE that would do that for you, for example IntelliJ IDEA Community Edition or maybe Eclipse. Also, the `doc` variable is not defined (did you mean `d`?) – mszymborski Jan 30 '17 at 20:38
  • I have just started using Eclipse, and this is my first exercise with Java. I imported the 'Element' and 'Elements' names. Yes, it should be the `d` variable instead of `doc`, sorry for the mistake. I have already fixed that, but there is still no improvement. – piotr Jan 30 '17 at 20:46
  • The compiler needs to know where each type you use comes from. Instead of writing `full.package.name.of.SomeClass` everywhere, we can write `import full.package.name.of.SomeClass;` once and then use just `SomeClass` in our code. Without that import, the compiler looks for the class in the same package as the class you are creating. If it is not there, the compiler does not know where else to search, so it cannot treat it as a valid type. – Pshemo Jan 30 '17 at 20:52
  • @Pshemo OK, I see. So Elements should be added at the beginning, like `import org.jsoup.select.Elements;`, and the same goes for each type I use, right? – piotr Jan 30 '17 at 21:03
  • Yes. The compiler needs to know where to search for the `SomeClass.class` file. The only imports you don't need are for classes from the `java.lang` package (like `String`) and for classes in the same package as the class you are creating. – Pshemo Jan 30 '17 at 21:09
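
The import rule discussed in the comments above can be illustrated with a small sketch. `ImportDemo` and its method names are hypothetical, and standard-library collections stand in for the jsoup types so the example compiles without any extra jar. One method spells out a fully qualified name because no import exists for that class; the other uses the short name because the class is imported at the top of the file.

```java
import java.util.ArrayList;

class ImportDemo {

    // java.util.LinkedList is not imported, so its full package name must be written out.
    static int sizeUsingFullName() {
        java.util.LinkedList<String> names = new java.util.LinkedList<>();
        names.add("Element");
        names.add("Elements");
        return names.size();
    }

    // ArrayList is imported above, so the short name is enough.
    // String lives in java.lang and never needs an import.
    static int sizeUsingImport() {
        ArrayList<String> names = new ArrayList<>();
        names.add("Element");
        names.add("Elements");
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeUsingFullName());
        System.out.println(sizeUsingImport());
    }
}
```

Both methods behave the same; the import only changes how the type name may be written in the source.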

0 Answers