0

Is it possible to get a regex expression back just providing the column name so that HANA database has a look for the most frequent regex itself and returns this?

For example the column values are 1, 2, 3, 4, 55, nAn I would expect [1-9] as a result. Thanks

Sandra Rossi
  • 11,934
  • 5
  • 22
  • 48
M__
  • 123
  • 5

1 Answers1

1

So your input is 1,2,3,4,5 and you are expecting [1-9]. Wouldn't [1-5] be a better fit? Or does [1|2|3|4|5] cover your data space best? What happens if you add a 8? Would it rather be [1-8] or [1|2|3|4|5|8]. The latter is more optimal in terms of coverage of your observed data.

If you are looking for a regular expression that minimally covers your data records a expression of type [record_1|...|record_n] is always a good choice. However, this is probably not what you want.

Finding a suitable (=minimal in terms of scope and readability) regex based on input data is very difficult or even impossible since it depends on a subjective quantification of data coverage vs. readability. As a consequence there is no such functionality in SAP HANA (and probably also not in other dbms).

What you can do is to define a set of regexes, that you would expect (e.g. one for numbers, one for text, one for dates, etc.) and programmatically determine the one that best fits your input data.

Mathias Kemeter
  • 933
  • 2
  • 11