I have a file in HDFS (/user/username/Project/data/file.xlsx) that I want to read into a DataFrame. Either a PySpark or a pandas DataFrame would work, but pandas is preferred.
I am writing my code in a Zeppelin notebook.
Is it possible to read the data from this file?
I have already tried the following commands, but none of them worked:
import pandas as pd
df = pd.read_excel("/user/username/Project/data/file.xlsx")
df = pd.read_excel("hdfs:///user/username/Project/data/file.xlsx")
df = pd.read_excel("hdfs://user/username/Project/data/file.xlsx")