1

I am converting some existing Oracle queries to MSSQL Server (2008) and can't figure out how to replicate the following Regex check:

SELECT SomeField
FROM SomeTable
WHERE NOT REGEXP_LIKE(TO_CHAR(SomeField), '^[0-9]{2}[.][0-9]{7}$');

That finds all results where the format of the number starts with 2 positive digits, followed by a decimal point, and 7 decimal places of data: 12.3456789

I've tried using STR, CAST, CONVERT, but they all seem to truncate the decimal to 4 decimal places for some reason. The truncating has prevented me from getting reliable results using LEN and CHARINDEX. Manually adding size parameters to STR gets slightly closer, but I still don't know how to compare the original numerical representation to the converted value.

SELECT SomeField
     , STR(SomeField, 10, 7)
     , CAST(SomeField AS VARCHAR)
     , LEN(SomeField )
     , CHARINDEX(STR(SomeField ), '.') 
FROM SomeTable
+------------------+------------+---------+-----+-----------+
|       Orig       |    STR     |  Cast   | LEN | CHARINDEX |
+------------------+------------+---------+-----+-----------+
| 31.44650944      | 31.4465094 | 31.4465 |   7 |         0 |
| 35.85609         | 35.8560900 | 35.8561 |   7 |         0 |
| 54.589623        | 54.5896230 | 54.5896 |   7 |         0 |
| 31.92653899      | 31.9265390 | 31.9265 |   7 |         0 |
| 31.4523333333333 | 31.4523333 | 31.4523 |   7 |         0 |
| 31.40208955      | 31.4020895 | 31.4021 |   7 |         0 |
| 51.3047869443893 | 51.3047869 | 51.3048 |   7 |         0 |
| 51               | 51.0000000 | 51      |   2 |         0 |
| 32.220633        | 32.2206330 | 32.2206 |   7 |         0 |
| 35.769247        | 35.7692470 | 35.7692 |   7 |         0 |
| 35.071022        | 35.0710220 | 35.071  |   6 |         0 |
+------------------+------------+---------+-----+-----------+
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
Parrish Husband
  • 3,148
  • 18
  • 40
  • What is the datatype of `SomeField`? – Martin Smith Feb 28 '20 at 17:40
  • @MartinSmith it's a `float` – Parrish Husband Feb 28 '20 at 17:40
  • 1
    Doing string comparisons on a number makes no sense. Just use `where somefield >= 0 and somefield < 100`. – Gordon Linoff Feb 28 '20 at 17:41
  • @GordonLinoff that would not work, since the requirement is two digits before a decimal, followed by seven digits of precision. Any other numerical format cannot be returned. – Parrish Husband Feb 28 '20 at 17:43
  • 1
    @ParrishHusband . . . I added the Oracle tag because you are using an arcane feature of `number`s that apply to Oracle but not to SQL Server. – Gordon Linoff Feb 28 '20 at 17:45
  • @GordonLinoff thanks for updating. The `REGEXP_LIKE` is performed on a string after the `TO_CHAR` cast, so I assumed it was targeted at strings. The closest option in modern SQL that I found was `FORMAT`, where I could simply do something like `XX.XXXXXXX`. – Parrish Husband Feb 28 '20 at 17:47
  • The fact that you are using `float` adds a problematic complexity here. You can't perform `LIKE` expressions against a `float`, so you need to convert it to a `varchar`, and none of the `CONVERT` style codes will give you the format you want. [DB<>Fiddle](https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b9cc1b7dfe7281095f65c48dd89f01fe) – Thom A Feb 28 '20 at 17:47
  • For added context, this is checking if field operators are entering Latitude and Longitude in the correct numerical format. The underlying value doesn't matter in this specific check. Also this is a 3rd party database, which I have no control over column types. – Parrish Husband Feb 28 '20 at 17:49
  • 3
    Float makes it even less sensical as `SELECT CAST(CAST(31.1111111 AS float) AS decimal(38,36))` returns `31.111111099999998685916580143384635448`. You would have no way of distinguishing between `31.111111099999998685916580143384635448` and `31.1111111` anyway – Martin Smith Feb 28 '20 at 17:51
  • Does this answer your question? [How to validated geography polygon in SQL server 2008?](https://stackoverflow.com/questions/34715002/how-to-validated-geography-polygon-in-sql-server-2008) Since you are trying to test valid longitude and latitude values, it would make sense to use geography types. – Jacob H Feb 28 '20 at 18:14
  • Actually this is probably a better post that isn't specific to 2008: https://gis.stackexchange.com/questions/66642/detecting-invalid-wkt-in-text-column-in-sql-server – Jacob H Feb 28 '20 at 18:17

2 Answers2

2

What you want to do does not make sense in SQL Server.

Oracle supports a number data type that has a variable precision:

if a precision is not specified, the column stores values as given.

There is no corresponding data type in SQL Server. You have have a variable number (float/real) or a fixed number (decimal/numeric). However, both apply to ALL values in a column, not to individual values within a row.

The closest you could do is:

where somefield >= 0 and somefield < 100

Or if you wanted to insist that there is a decimal component:

where somefield >= 0 and somefield < 100 and floor(somefield) <> somefield

However, you might have valid integer values that this would filter out.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Your first suggestion I did initially try with the customer, however this did not satisfy the formatting requirements. I agree the data type makes this less than ideal. – Parrish Husband Feb 28 '20 at 18:02
  • Not "less than ideal" - not even possible given the imprecise nature of float. If you want to validate the format that people enter this would need to be stored as a string as float is imprecise and even decimal won't distinguish between `50.0000000` and `50` being entered. Though it should also be stored as a number datatype for all other purposes – Martin Smith Feb 28 '20 at 18:07
  • @MartinSmith I appreciate the wisdom here. I am going to relay all of this information along with the extremely bad and hacky workaround to let them know this is a bad idea in the first place, and is an unreliable check on this data. – Parrish Husband Feb 28 '20 at 18:47
0

This answer gave me an option that works in conjunction with checking the decimal position first.

SELECT SomeField 
FROM SomeTable
WHERE SomeField IS NOT NULL
  AND CHARINDEX('.', SomeField ) = 3
  AND LEN(CAST(CAST(REVERSE(CONVERT(VARCHAR(50), SomeField , 128)) AS FLOAT) AS BIGINT)) = 7

While I understand this is terrible by nearly all metrics, it satisfies the requirements.

The basis of checking formatting on this data type in inherently flawed as pointed out by several posters, however for this very isolated use case I wanted to document the workaround.

Parrish Husband
  • 3,148
  • 18
  • 40