I am reading a PySpark SQL DataFrame from an Elasticsearch index with the read option es.read.metadata=True. I want to filter the data by a condition on a metadata field, but I get an empty result even though matching rows exist. Is it possible to get the actual result?

I did get a result when I applied limit to the DataFrame first, even with a very large number, larger than the DataFrame size.

In addition, I did get a result when filtering on a field that is not under _metadata.

For example:

df.where(df._metadata._score > 1.0).select(df._metadata._id).show()

The result is empty:

+--------------+
|_metadata[_id]|
+--------------+
+--------------+

But when using limit:

df.limit(1000000).where(df._metadata._score > 1.0).select(df._metadata._id).show()

The result is not empty:

+--------------------+
|      _metadata[_id]|
+--------------------+
|cICqm2gBHl8Vy6RZyu_L|
+--------------------+
David206