I wrote some sample code that reads a CSV file in Azure Data Lake into a pandas DataFrame.
Here is my sample code.
from azure.datalake.store import core, lib, multithread
import pandas as pd
tenant_id = '<your Azure AD tenant id>'
username = '<your username in AAD>'
password = '<your password>'
store_name = '<your ADL name>'
token = lib.auth(tenant_id, username, password)
# Alternatively, register an app in Azure AD and use its client_id and client_secret to get the token.
# If you want to use this code in your application, I recommend authenticating with the client credentials.
# client_id = '<client id of your app registered in Azure AD, like xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx>'
# client_secret = '<your client secret>'
# token = lib.auth(tenant_id, client_id=client_id, client_secret=client_secret)
adl = core.AzureDLFileSystem(token, store_name=store_name)
f = adl.open('<your csv file path, such as data/test.csv in my ADL>')
df = pd.read_csv(f)
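If you go with the client credentials, it may be safer to keep them out of the source file. Below is a minimal sketch of that idea, not a definitive implementation. The helper names `load_credentials` and `read_adl_csv` and the environment variable names `AZURE_TENANT_ID`, `AZURE_CLIENT_ID` and `AZURE_CLIENT_SECRET` are my own assumptions for illustration; only `lib.auth`, `core.AzureDLFileSystem`, `adl.open` and `pd.read_csv` come from the code above.

```python
import os


def load_credentials(env=None):
    """Collect service principal settings from environment variables.

    The variable names below are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    names = {
        "tenant_id": "AZURE_TENANT_ID",
        "client_id": "AZURE_CLIENT_ID",
        "client_secret": "AZURE_CLIENT_SECRET",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


def read_adl_csv(store_name, path, creds, **read_csv_kwargs):
    """Open a CSV file in Azure Data Lake Storage Gen1 and parse it with pandas."""
    # Imported here so load_credentials can be used without these packages installed.
    import pandas as pd
    from azure.datalake.store import core, lib

    token = lib.auth(
        tenant_id=creds["tenant_id"],
        client_id=creds["client_id"],
        client_secret=creds["client_secret"],
    )
    adl = core.AzureDLFileSystem(token, store_name=store_name)
    # The context manager closes the remote file handle once parsing finishes.
    with adl.open(path, "rb") as f:
        return pd.read_csv(f, **read_csv_kwargs)
```

With the three variables set, the call would look like `df = read_adl_csv('<your ADL name>', 'data/test.csv', load_credentials())`. Any extra keyword arguments, such as `sep` or `encoding`, are passed straight through to `pd.read_csv`.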
Note: If you use client_id and client_secret for authentication, you must grant the necessary access permissions to the app, which needs at least the Reader role in Azure AD, as shown in the figures below. For more information about access security, please see the official document Security in Azure Data Lake Storage Gen1. For how to register an app in Azure AD, you can refer to my answer to another SO thread, How to get an AzureRateCard with Java?.


If you have any concerns, please feel free to let me know.