
My schema:

./geomesa-accumulo describe-schema -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
INFO  Describing attributes of feature 'SignalBuilder'
geo           | Point   (Spatio-temporally indexed) (Spatially indexed)
time          | Date    (Spatio-temporally indexed) (Attribute indexed)
cam           | String  (Attribute indexed) (Attribute indexed)
imei          | String  
dir           | Double  
alt           | Double  
vlc           | Double  
sl            | Integer 
ds            | Integer 
dir_y         | Double  
poi_azimuth_x | Double  
poi_azimuth_y | Double  

User data:
  geomesa.attr.splits     | 4
  geomesa.feature.expiry  | time(30 days)
  geomesa.index.dtg       | time
  geomesa.indices         | z3:7:3:geo:time,z2:5:3:geo,attr:8:3:time,attr:8:3:cam,attr:8:3:cam:time
  geomesa.stats.enable    | true
  geomesa.table.partition | time
  geomesa.z.splits        | 4
  geomesa.z3.interval     | week

When I try to get the count from the cached stats, it returns 11:

./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'"           
Estimated count: 11

But without the cache it returns 1436:

./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'" --no-cache
INFO  Running stat query...
Count: 1436

Why do the stats commands not work properly, returning only an estimated value that is far from the exact count?

With Redis everything works fine. The problem occurs only with Accumulo.

Question update:

I tried to recalculate the statistics:

 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-analyze -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder                                                
INFO  Running stat analysis for feature type SignalBuilder...
INFO  Stats analyzed:
  Total features: 11527
  Bounds for geo: [ 37.598007, 55.736623, 38.661036, 56.9189592 ] cardinality: 10634
  Bounds for time: [ 2022-01-30T15:13:58.706Z to 2022-02-09T14:16:03.000Z ] cardinality: 3779
  Bounds for cam: [ 3fe961e1-91dd-4931-b82e-d04fcaf24c3e to f767f0fa-dac5-4571-aa47-1ea6bf6e2c82 ] cardinality: 6
INFO  Use 'stats-histogram', 'stats-top-k' or 'stats-count' commands for more details
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'"           
Estimated count: 14
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='3fe961e1-91dd-4931-b82e-d04fcaf24c3e'"
Estimated count: 0
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='3fe961e1-91dd-4931-b82e-d04fcaf24c3e'" --no-cache
INFO  Running stat query...
Count: 2675
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-analyze -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder                                              
INFO  Running stat analysis for feature type SignalBuilder...
INFO  Stats analyzed:
  Total features: 11767
  Bounds for geo: [ 37.598007, 55.736623, 38.661036, 56.9189592 ] cardinality: 10942
  Bounds for time: [ 2022-01-30T15:13:58.706Z to 2022-02-09T14:17:41.000Z ] cardinality: 3841
  Bounds for cam: [ 3fe961e1-91dd-4931-b82e-d04fcaf24c3e to f767f0fa-dac5-4571-aa47-1ea6bf6e2c82 ] cardinality: 6
INFO  Use 'stats-histogram', 'stats-top-k' or 'stats-count' commands for more details
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "1=1"                                  
Estimated count: Unknown
Re-run with --no-cache to get an exact count
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "1=1" --no-cache
INFO  Running stat query...
Count: 11872

But that did not help. Geo-events continue to arrive in GeoMesa, but the stats still do not work.

Maybe I'm not using stats-count properly. stats-top-k does show the gathered statistics:

 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam like '3fe961e1-91dd-4931-b82e-d04fcaf24c3e'"
Estimated count: 0
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-top-k -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder                                               
Top values for 'geo':
  unavailable
Top values for 'time':
  unavailable
Top values for 'cam':
  7c0cf8bc-e7e3-4023-8a00-a5f17bda3001 (2925)
  9f471340-dd70-4eca-a8dc-14553a4e708a (2924)
  f767f0fa-dac5-4571-aa47-1ea6bf6e2c82 (2922)
  bfe55ad1-5b0a-405d-9ca9-3bed6aca9313 (2921)
  3fe961e1-91dd-4931-b82e-d04fcaf24c3e (2920)
  5798a065-d51e-47a1-b04b-ab48df9f1324 (2)
Top values for 'imei':
  unavailable
Top values for 'dir':
  unavailable
Top values for 'alt':
  unavailable
Top values for 'vlc':
  unavailable
Top values for 'sl':
  unavailable
Top values for 'ds':
  unavailable
Top values for 'dir_y':
  unavailable
Top values for 'poi_azimuth_x':
  unavailable
Top values for 'poi_azimuth_y':
  unavailable

Or maybe the cause is in Accumulo. When I try to scan the Accumulo table, it returns:

root@accumulo> scan -t myNamespace.geomesa_SignalBuilder_z3_geo_time_v7_02717
2022-02-09 17:55:12,909 [commands.ShellPluginConfigurationCommand] ERROR: Error: Could not determine the type of file "hdfs://10.200.217.27:9000/accumulo/classpath/myNamespace/[^.].*.jar".
2022-02-09 17:55:12,909 [shell.Shell] ERROR: Could not load the specified formatter. Using the DefaultFormatter
2022-02-09 17:55:12,929 [commands.ShellPluginConfigurationCommand] ERROR: Error: Could not determine the type of file "hdfs://10.200.217.27:9000/accumulo/classpath/myNamespace/[^.].*.jar".
\x01\x0A\x9Dt\x19\x84\xEF\xDD\xAF "5798a065-d51e-47a1-b04b-ab48df9f1324-1643555638706 d: []    \x03\x00\x0C\x02\x00\x1E\x000\x008\x00\\\x00g\x00o\x00w\x00\x7F\x00\x83\x00\x87\x00\x87\x00\x87\x00\x87\x00\x00\x0E\x00\x01\x01@CT\x9C\xD3\xE0\xBDE@Lu\xA0t\x7F-\xDE\x00\x00\x01~\xAB\x8C\xCD\xB25798a065-d51e-47a1-b04b-ab48df9f132\xB43333333333\xB1@f@\x00\x00\x00\x00\x00?\xF3\xAE\x14z\xE1G\xAE\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x01
\x01\x0A\x9Dt\x19\x84\xEF\xDD\xBD!\x065798a065-d51e-47a1-b04b-ab48df9f1324-1643555648706 d: []    \x03\x00\x0C\x02\x00\x1E\x000\x008\x00\\\x00g\x00o\x00w\x00\x7F\x00\x83\x00\x87\x00\x87\x00\x87\x00\x87\x00\x00\x0E\x00\x01\x01@CT\x9C\xD3\xE0\xBDE@Lu\xA0t\x7F-\xDE\x00\x00\x01~\xAB\x8C\xF4\xC25798a065-d51e-47a1-b04b-ab48df9f132\xB43333333333\xB1@f@\x00\x00\x00\x00\x00?\xF3\xAE\x14z\xE1G\xAE\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x01
Christopher
r.gomboev

1 Answer


Stats are gathered during ingestion, but are only written on a "best effort" basis (for example, if your ingest dies, stats may not be written). There are also code paths that don't update stats, for example if you disable them via system property or if you ingest through a bulk map/reduce job. In your particular case, it's hard to say why your stats don't match your data without a detailed description of everything you did to ingest it. However, if you want to re-calculate the cached statistics, you can always run the stats-analyze CLI command.
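
To check whether the cached stats have drifted from the data, one option is to compare the output of stats-count with and without --no-cache. The following is only a minimal sketch: it relies solely on the "Estimated count: N" and "Count: N" output lines shown in the question, the helper names parse_count and compare_counts are hypothetical, and the 10% tolerance is an arbitrary assumption rather than anything GeoMesa defines.

```shell
#!/usr/bin/env bash

# Read stats-count output on stdin and print the first numeric count found.
# Prints nothing when the count is missing or not a number (e.g. "Unknown").
parse_count() {
  sed -n -E 's/^(Estimated count|Count): ([0-9]+)[[:space:]]*$/\2/p' | head -n 1
}

# Compare a cached estimate with an exact count.
# Prints "ok" if they differ by at most tolerance_pct percent of the exact
# count, "stale" if they differ by more, and "unknown" if either is empty.
compare_counts() {
  local estimated="$1" exact="$2" tolerance_pct="${3:-10}"
  if [ -z "$estimated" ] || [ -z "$exact" ]; then
    echo "unknown"
    return
  fi
  local diff=$(( exact > estimated ? exact - estimated : estimated - exact ))
  if [ $(( diff * 100 )) -le $(( exact * tolerance_pct )) ]; then
    echo "ok"
  else
    echo "stale"
  fi
}

# Example using the two outputs quoted in the question.
estimated=$(printf 'Estimated count: 11\n' | parse_count)
exact=$(printf 'INFO  Running stat query...\nCount: 1436\n' | parse_count)
compare_counts "$estimated" "$exact"   # prints: stale
```

In practice the two printf calls would be replaced by the real stats-count invocations from the question, one without and one with --no-cache. A "stale" result after running stats-analyze suggests the problem is not simply missed writes during ingest.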

If you can re-create the issue, please feel free to file a ticket in the GeoMesa JIRA with the steps to re-create.

Emilio Lahr-Vivaz
  • Please look at the question update. I tried to recalculate the stats, but the result is the same. Can you explain how to enable stats? Which system property can I toggle to turn stats on? – r.gomboev Feb 09 '22 at 14:24
  • The property is described here: https://www.geomesa.org/documentation/stable/user/datastores/runtime_config.html#geomesa-stats-generate – Emilio Lahr-Vivaz Feb 10 '22 at 00:48
  • It should be using a CountMinSketch for estimating that stat. Possibly having 'cam' indexed twice is causing an error. You might try deleting the 'cam' index and just keep the 'cam:time' index, as the 'cam' only index is redundant. – Emilio Lahr-Vivaz Feb 10 '22 at 00:50