
The problem I am trying to solve: I have a bunch of documents that contain mathematical expressions/formulas, and I want to search the documents by formula or expression.

So far, based on my research, I'm considering converting each mathematical expression to LaTeX format and storing it as a string in the database (Elasticsearch).

With this approach, will I be able to search for documents by the LaTeX string?

For example, the LaTeX conversion of a2 + b2 = c2 is a^{2} + b^{2} = c^{2}. Can this string be searched in Elasticsearch?
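
Because documents are sent to Elasticsearch as JSON, one practical detail applies when the LaTeX gets more complex: commands such as \frac contain backslashes, which must be escaped inside a JSON request body. The minimal Python sketch below lets the standard json module do that escaping; the differential equation is a made-up example, and the field name reuses "mathformula" from the answer.

```python
import json

# Hypothetical document holding a formula in LaTeX form.
doc = {"mathformula": r"\frac{d^{2}y}{dx^{2}} + y = 0"}

# json.dumps escapes each backslash, giving a valid JSON request body.
body = json.dumps(doc)
print(body)  # {"mathformula": "\\frac{d^{2}y}{dx^{2}} + y = 0"}

# Decoding the body returns the original LaTeX string unchanged.
assert json.loads(body) == doc
```

Building the body with a JSON library, rather than by string concatenation, avoids hand-escaping every backslash in longer formulas.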

shoaib1992
  • Can you give an example of these expressions? And how do you want to search them? – Luc E Mar 31 '20 at 23:23
  • For example, my document contains the string "My favourite formula is a2 + b2 = c2." I need a mechanism to find all documents that contain the formula a2 + b2 = c2. Note: when inserting, I can convert the formula to LaTeX format. – shoaib1992 Mar 31 '20 at 23:48
  • Yes, you will be able to do it, but you will have to configure the analyzers for this string. Have a look at the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/analysis.html – Emmanuel Demey Apr 01 '20 at 07:43
  • Hi Gillespie59, my string would be the LaTeX conversion of a2 + b2 = c2, which is a^{2} + b^{2} = c^{2}. Can this string also be searched? – shoaib1992 Apr 01 '20 at 08:01
  • If you index the field with the keyword type, your string will not be analyzed and you will be able to search for the exact string "a^{2} + b^{2} = c^{2}". If you want to search both exact formats, create a sub-field (or another field) holding the other format; both fields should be keyword fields. Note that this solution only matches the exact string, not parts of an expression, and it is case sensitive. – Luc E Apr 01 '20 at 09:24
  • @shoaib1992, did you get a chance to look at my answer ? – Amit Apr 01 '20 at 11:22
  • Hi Opster, I did look at your solution, and the approach looks good. The other thing I want to check: I want to store the formulas in the documents in LaTeX format, because a complex differential equation, for example, may not be storable in the database as-is as a plain string. See this LaTeX cheatsheet: https://www.authorea.com/users/77723/articles/110898-how-to-write-mathematical-equations-expressions-and-symbols-with-latex-a-cheatsheet Can I apply your solution to LaTeX strings? – shoaib1992 Apr 01 '20 at 11:56
  • @OpsterElasticsearchNinja can you please look at this question https://stackoverflow.com/questions/60976094/is-is-possible-to-index-documents-latex-strings-in-elastic-search – shoaib1992 Apr 01 '20 at 16:48
  • @OpsterElasticsearchNinja is there a way to connect with you? – shoaib1992 Apr 01 '20 at 16:49
  • Sure can u give me ur mail id – Amit Apr 01 '20 at 16:54
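
Luc E's suggestion (two keyword fields, one per format, matched exactly) can be illustrated with a small in-memory model. This is a hypothetical Python sketch of the matching semantics, not Elasticsearch itself: the field names "formula" and "formula_latex" and both documents are assumptions. A keyword field stores the whole value as one unanalyzed term, so a lookup succeeds only on an exact, case-sensitive match.

```python
# Hypothetical documents, each storing the plain and the LaTeX form.
docs = {
    1: {"formula": "a2 + b2 = c2", "formula_latex": "a^{2} + b^{2} = c^{2}"},
    2: {"formula": "x2 = 4", "formula_latex": "x^{2} = 4"},
}

# Inverted index per field: exact (unanalyzed) value -> set of document ids.
index = {"formula": {}, "formula_latex": {}}
for doc_id, fields in docs.items():
    for field, value in fields.items():
        index[field].setdefault(value, set()).add(doc_id)


def term_search(value):
    """Return ids whose plain OR LaTeX field equals value exactly."""
    return index["formula"].get(value, set()) | index["formula_latex"].get(value, set())


print(term_search("a^{2} + b^{2} = c^{2}"))  # {1}  exact LaTeX form
print(term_search("a2 + b2 = c2"))           # {1}  exact plain form
print(term_search("A^{2} + B^{2} = C^{2}"))  # set()  case sensitive
print(term_search("a^{2} + b^{2}"))          # set()  no partial matches
```

The last two lookups show the limitations Luc E mentions: different casing or a fragment of the expression finds nothing.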

1 Answer


I agree with @Luc E, with some modifications. I first tried a simple keyword field, but it gave me some issues, so I switched to using the keyword tokenizer in my own custom analyzer, which should solve most of your use cases.

Index definition with a custom analyzer. The keyword tokenizer emits the whole formula as a single token, the lowercase filter makes the search case insensitive, and the trim filter removes leading and trailing whitespace:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase",
                        "trim"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "mathformula": {
                "type": "text",
                "analyzer": "my_custom_analyzer"
            }
        }
    }
}
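
To make the effect of my_custom_analyzer concrete, here is a rough Python model of what it does to one input string. It is an illustration only (Python's str.lower and str.strip stand in for the Lucene lowercase and trim filters), not Elasticsearch code.

```python
def my_custom_analyzer(text):
    """Rough model of the custom analyzer: keyword tokenizer, lowercase, trim."""
    tokens = [text]                       # keyword tokenizer: whole input is one token
    tokens = [t.lower() for t in tokens]  # lowercase filter
    tokens = [t.strip() for t in tokens]  # trim filter: leading/trailing whitespace only
    return tokens


print(my_custom_analyzer("  A^{2} + B^{2} = C^{2} "))  # ['a^{2} + b^{2} = c^{2}']
```

Because the whole formula stays one token, LaTeX strings with braces and backslashes (for example \frac{dy}{dx} = ky) pass through as a single term too; note that trim does not touch spaces inside the formula.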

Index sample docs

{
    "mathformula" : "(a+b)^2 = a^2 + b^2 + 2ab"
}

{
    "mathformula" : "a2+b2 = c2"
}

Search query (a match query, which applies the same analyzer at search time as at index time)

{
    "query": {
        "match" : {
            "mathformula" : {
                "query" : "a2+b2 = c2"
            }
        }
    }
}

The search result contains only the matching document:

"hits": [
    {
        "_index": "so_math",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931471,
        "_source": {
            "mathformula": "a2+b2 = c2"
        }
    }
]
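
The query above can be reproduced with a small in-memory model: index the two sample documents with the same analyzer model, then analyze the query and look up its token. This is a hypothetical Python sketch; the document ids (1 and 2) are assumptions, and the match function only mimics a match query's default OR behaviour over the query's tokens.

```python
def analyze(text):
    # Model of my_custom_analyzer: one token, lowercased, trimmed.
    return [text.lower().strip()]


# The two sample documents; ids are assumed for illustration.
sample_docs = {1: "a2+b2 = c2", 2: "(a+b)^2 = a^2 + b^2 + 2ab"}

# Inverted index: analyzed token -> set of document ids.
inverted = {}
for doc_id, text in sample_docs.items():
    for token in analyze(text):
        inverted.setdefault(token, set()).add(doc_id)


def match(query):
    """Analyze the query and union the postings of its tokens."""
    hits = set()
    for token in analyze(query):
        hits |= inverted.get(token, set())
    return hits


print(match("a2+b2 = c2"))    # {1}
print(match(" A2+B2 = C2 "))  # {1}  lowercase + trim
print(match("a2 + b2 = c2"))  # set()  inner spacing differs
```

The last query shows one caveat of this setup: spacing inside the formula must match exactly. If that matters, one option worth testing is to normalize whitespace before tokenizing, for example with a pattern_replace character filter, applied at both index and search time.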

Amit