I am reading a PySpark SQL DataFrame from an Elasticsearch index, with the read option `es.read.metadata=True`. I want to filter the data by a condition on a metadata field, but I get an empty result, although there should be matching rows. Is it possible to get the actual result?
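For context, this is roughly how I create the DataFrame (the index name, host, and port are placeholders for my actual setup):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-metadata-filter").getOrCreate()

# Assumes the elasticsearch-hadoop connector jar is on the Spark classpath.
# es.read.metadata=True adds a _metadata struct column (_id, _score, ...) to each row.
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "localhost")
      .option("es.port", "9200")
      .option("es.read.metadata", "true")
      .load("my_index"))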
I do get results when I call `limit` on the DataFrame first, even with a very large number, larger than the DataFrame size. I also get results when filtering on a field that is not part of `_metadata`.
For example:
df.where(df._metadata._score > 1.0).select(df._metadata._id).show()
The result is empty:
+--------------+
|_metadata[_id]|
+--------------+
+--------------+
But when using `limit`:
df.limit(1000000).where(df._metadata._score > 1.0).select(df._metadata._id).show()
The result is not empty:
+--------------------+
| _metadata[_id]|
+--------------------+
|cICqm2gBHl8Vy6RZyu_L|
+--------------------+
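For comparison, when I filter on a regular document field instead of `_metadata` (the field name here is just a placeholder for one of my fields), I get rows back without the `limit` workaround:

df.where(df.price > 1.0).select(df._metadata._id).show()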