0

There is a table with three columns id, VNS, geometry and several tens of thousands of records. The values in geometry are a string of letters and numbers (mostly floats with 9 decimal places) with a length of 5-20 thousand characters. The whole idea is to find matches in geometry and output the corresponding VNS values as a group.

I used

Select group_concat(VNS),
             `geometry` ,
         count(*) AS count 
from geo_table 
Group by geometry 
HAVING count > 1 
order by count DESC;

But I ran into the problem of inaccurate comparison of these lines. The search for matches works well, but not very accurately, the impression is that it compares only a part of the file, maybe half or 70%, but not 100%. I thought that the problem was in floats and tried replacing the dots with another symbol - it did not help. I tried to encode base64_encode and compare - the effect is the same. Maybe there is some more correct way to compare long strings or the settings in the database (MariaDB) are wrong?

-- phpMyAdmin SQL Dump
-- version 5.1.1
-- Хост: 127.0.0.1
-- Час створення: Жов 04 2021 р., 10:15
-- Версія сервера: 10.4.21-MariaDB
-- Версія PHP: 8.0.10

SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
START TRANSACTION;
SET time_zone = "+00:00";


CREATE TABLE `geo_table` (
  `id` int(11) NOT NULL,
  `VNS` varchar(30) NOT NULL,
  `geometry` mediumtext NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


INSERT INTO `geo_tab_vut` (`id`, `VNS`, `geometry`) VALUES
(1, '902.12.162-01', 'I35UVElORk9fRU5EDQojfjMxDQpQDQoxDQowIzAwMDAwMDAwMDE2IzUyNTA2NDUzMDAjMDAwMD.. 

I cann't post full size insert query because it is too long.

Shadow
  • 33,525
  • 10
  • 51
  • 64
Pavlo N
  • 83
  • 1
  • 11
  • Questions should include a [minimal, reproducible](https://stackoverflow.com/help/mcve) example, which, for questions about SQL, means the DDL & DML statements to create & populate table(s). Also, it should include samples of desired and actual results. – outis Oct 04 '21 at 08:06
  • You tell that *The values in geometry are a string of letters and numbers (mostly floats with 9 decimal places)* - but shown starting part of the value *I35UVElORk9fRU5EDQojfjMxDQpQDQoxDQowIzAwMDAwMDAwMDE2IzUyNTA2NDUzMDAjMDAwMD..* does not match this description (for example, I do not see any float number). – Akina Oct 04 '21 at 08:44
  • @Akina Yes and then I decided to encode it by `base64_encode` and it became more simple string to overview. It has not change the result. – Pavlo N Oct 04 '21 at 08:52
  • You really should not group by a text field - see the accepted answer in the duplicate question for details. If geometry field contains some kind of an GIS data, then you should probably be using mysql's / mariadb/s GIS functionality for grouping. – Shadow Oct 04 '21 at 08:55
  • I recommend you to add a column which contains some hash value of this long string. This must allow you to perform fast compare (in subquery/cte), Then, in outer query, you may perform complete and detailed compare. From my practice - the comparing of 64kb strings (rather than sorting/grouping) does not meet with any problem. – Akina Oct 04 '21 at 08:57

0 Answers0