I would like to scrape the table from https://wyniki.tge.pl/wyniki/rdn/indeksy/: all rows (IRDN, etc.) and the header. I tried to adapt the approach from "Using JSoup To Extract HTML Table Contents" to my case, but I ran into difficulties remodeling it.
I would like to scrape these tables (with energy market prices) and put them on my website after adding some CSS.
I tried this code:
    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class WebScraper {
        public static void main(String[] args) throws IOException {
            Document doc = Jsoup.connect("https://wyniki.tge.pl/wyniki/rdn/indeksy/").get();
            for (Element table : doc.select("table.t-02")) {
                for (Element row : table.select("tr")) {
                    Elements tds = row.select("td");
                    if (tds.size() > 6) {
                        System.out.println(tds.get(0).text() + ":" + tds.get(1).text() + ":" + tds.get(2).text() + ":" + tds.get(3).text() + ":" + tds.get(4).text() + ":" + tds.get(5).text());
                    }
                }
            }
        }
    }
Results:
Cz. 26/01:Pt. 27/01:So. 28/01:N. 29/01:Pn. 30/01:Wt. 31/01
PLN/MWh:182.48:176.20:147.22:137.89:169.02
MWh:67 698.70:66 088.70:72 720.40:75 460.00:58 887.10
PLN/MWh:207.61:196.18:152.71:146.69:194.75
MWh:39 442.90:40 883.10:49 538.90:48 864.30:34 401.20
PLN/MWh:184.82:176.74:145.98:135.67:171.83
MWh:67 698.70:66 088.70:72 720.40:75 460.00:58 887.10
PLN/MWh:207.27:196.66:152.22:144.41:195.66
MWh:39 442.90:40 883.10:49 538.90:48 864.30:34 401.20
But because of the table's structure (the header has 7 columns and the body has 8), I would like to omit the first body column, which contains PLN/MWh or MWh, and also fetch the last (8th) column so that the values end up in the right places, like this:
Cz. 26/01:Pt. 27/01:So. 28/01:N. 29/01:Pn. 30/01:Wt. 31/01
182.48:176.20:147.22:137.89:169.02:178.91
67 698.70:66 088.70:72 720.40:75 460.00:58 887.10:64 432.20
207.61:196.18:152.71:146.69:194.75:201.19
39 442.90:40 883.10:49 538.90:48 864.30:34 401.20:..
184.82:176.74:145.98:135.67:171.83:..
67 698.70:66 088.70:72 720.40:75 460.00:58 887.10:...
207.27:196.66:152.22:144.41:195.66:..
39 442.90:40 883.10:49 538.90:48 864.30:34 401.20:..
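To make the goal concrete, here is a rough sketch of the row handling I have in mind, written against plain lists of cell text rather than JSoup elements. `RowFormatter` and `joinValues` are names I made up for illustration, and the sample cells are copied from the expected output above. I am assuming that each body row has exactly one extra leading cell (the unit) compared with the header; in the real scraper the lists would be filled from `tds.get(i).text()`.

```java
import java.util.List;

class RowFormatter {
    // Joins the cells of one table row with ':'.
    // If the row has more cells than the header, the first cell is assumed
    // to be the unit label ("PLN/MWh" or "MWh") and is skipped.
    static String joinValues(List<String> cells, int headerColumns) {
        int skip = cells.size() > headerColumns ? 1 : 0;
        return String.join(":", cells.subList(skip, cells.size()));
    }

    public static void main(String[] args) {
        // Sample data taken from the expected output in this post.
        List<String> header = List.of(
                "Cz. 26/01", "Pt. 27/01", "So. 28/01", "N. 29/01", "Pn. 30/01", "Wt. 31/01");
        List<String> body = List.of(
                "PLN/MWh", "182.48", "176.20", "147.22", "137.89", "169.02", "178.91");

        // Header row: same size as the header, so nothing is skipped.
        System.out.println(joinValues(header, header.size()));
        // Body row: one cell longer than the header, so the unit cell is dropped.
        System.out.println(joinValues(body, header.size()));
    }
}
```

With the sample rows above this prints the header line unchanged and the body line without its leading `PLN/MWh` cell, which is the shape I am trying to get from the real table.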
Thanks for your help!