2

I need to import excel file from the link. I tried to do it with

filedlurl = 'https://www.nordpoolgroup.com/48d3ac/globalassets/marketdata-excel-files/exchange-ee-connections_2021_daily.xls'
    
df = pd.read_excel(filedlurl, skiprows=2)

But the error was XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'Exchange'

then I found the following way from using Pandas to read in excel file from URL - XLRDError

df = pd.read_csv('https://www.nordpoolgroup.com/48d3ac/globalassets/marketdata-excel-files/exchange-ee-connections_2021_daily.xls',
                 #sep='\t',
                 #parse_dates=[0],
                 names=['a','b','c','d','e','f'],
                skiprows=2)
df

Here is also a strange output. How can I get the following table as when manually download from the web?

Erko Tru
  • 143
  • 7

3 Answers3

3

Your file isn't a CSV or an Excel file. Actual contents are an HTML table (see as follows).

Exchange in {0}, Import(+)/Export(-)
<html>
    <body>
        <table>
            <thead>
                <tr>
                    <td colspan="5">Exchange EE connections in MWh, MW</td>
                </tr><tr>
                    <td colspan="5">Data was last updated 06-01-2021</td>
                </tr><tr>
                    <td></td><td style="text-align:center;">EE net exchange</td><td style="text-align:center;">EE - FI</td><td style="text-align:center;">EE - LV</td><td style="text-align:center;">EE - RU</td>
                </tr>
            </thead><tbody>
                <tr>
                    <td style="text-align:left;">01-01-2021</td><td style="text-align:right;">14575</td><td style="text-align:right;">20969,0</td><td style="text-align:right;">-4884,0</td><td style="text-align:right;">-1510,0</td>
                </tr><tr>
                    <td style="text-align:left;">02-01-2021</td><td style="text-align:right;">12073</td><td style="text-align:right;">22479,0</td><td style="text-align:right;">-8001,0</td><td style="text-align:right;">-2405,0</td>
                </tr><tr>
                    <td style="text-align:left;">03-01-2021</td><td style="text-align:right;">14321</td><td style="text-align:right;">22540,0</td><td style="text-align:right;">-8259,0</td><td style="text-align:right;">40,0</td>
                </tr><tr>
                    <td style="text-align:left;">04-01-2021</td><td style="text-align:right;">14662</td><td style="text-align:right;">17653,0</td><td style="text-align:right;">-5829,0</td><td style="text-align:right;">2838,0</td>
                </tr><tr>
                    <td style="text-align:left;">05-01-2021</td><td style="text-align:right;">13570</td><td style="text-align:right;">13779,0</td><td style="text-align:right;">-5314,0</td><td style="text-align:right;">5105,0</td>
                </tr><tr>
                    <td style="text-align:left;">06-01-2021</td><td style="text-align:right;">6243</td><td style="text-align:right;"></td><td style="text-align:right;"></td><td style="text-align:right;"></td>
                </tr>
            </tbody>
        </table>
    </body>
</html>

Use pd.read_html like so:

import pandas as pd

url = 'https://www.nordpoolgroup.com/48d3ac/globalassets/marketdata-excel-files/exchange-ee-connections_2021_daily.xls'
dfs = pd.read_html(url)
df = dfs[0]

That you can open your file in Excel is because Excel iterates through possible formats until finding something that works. Eg you can make a tab separated values (which should have extension .tsv) file, append .xls and while it isn't an actual horrible spread sheet format (XLS), Excel will still open it normally. It also does this with HTML data.

ifly6
  • 5,003
  • 2
  • 24
  • 47
1

First, can download the file using Python this way, using urllib.request:

import urllib.request
url='https://www.nordpoolgroup.com/48d3ac/globalassets/marketdata-excel-files/exchange-ee-connections_2021_daily.xls'
filename='yourfile.csv'
urllib.request.urlretrieve (url,filename)

Then read it in using Pandas:

import pandas as pd
df = pd.read_excel(filename)

Which will give you:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'Exchange'

After checking with a real Excel, I find:

enter image description here

So something is wrong with your file, I suspect. With a correct file, the above method should work.

zabop
  • 6,750
  • 3
  • 39
  • 84
  • Yes, it always starts with this view, but if you press yes, the excel will open – Erko Tru Jan 06 '21 at 15:34
  • That means it probably isn't an actual Excel file; Excel iterates through possible formats until seeing whether it works. Eg you can make a tab separated values (which should have extension `.tsv`) file, append `.xls` and while it isn't an actual horrible spread sheet format (XLS), Excel will still open it normally. Try opening the file in a text editor to see actual internal representation. – ifly6 Jan 06 '21 at 15:53
1

Latest pandas (pandas==1.2.0) is able to read from url using pd.read_excel when requests is installed.

import pandas
url = "https://www.nordpoolgroup.com/48d3ac/globalassets/marketdata-excel-files/exchange-ee-connections_2021_daily.xls"
df = pandas.read_excel(url)

But your file seems corrupted, it triggers:

ValueError: File is not a recognized excel file
zerofuxor
  • 336
  • 2
  • 8