
I am new to handling a lot of data.

Every 100 ms I currently write 4 JSON documents to a collection in my ArangoDB.

The content of the JSON looks something like this:

{
  "maintenence": {
    "holder_1": 1,
    "holder_2": 0,
    "holder_3": 0,
    "holder_4": 0,
    "holder_5": 0,
    "holder_6": 0
  },
  "error": 274,
  "pos": {
    "left": [
      21.45, // changing every 100ms
      38.36, // changing every 100ms
      10.53 // changing every 100ms
    ],
    "center": [
      0.25, // changing every 100ms
      0, // changing every 100ms
      2.42 // changing every 100ms
    ],
    "right": [
      0, // changing every 100ms
      0, // changing every 100ms
      0 // changing every 100ms
    ]
  },
  "sub": [
    {
      "type": 23,
      "name": "plate 01",
      "sensors": [
        {
          "type": 45,
          "name": "sensor 01",
          "state": {
            "open": 1,
            "close": 0,
            "middle": 0
          }
        },
        {
          "type": 34,
          "name": "sensor 02",
          "state": {
            "on": 1
          }
        }
      ]
    }
  ],
  "timestamp": "2018-02-18 01:56:08.423",
  "device": "12227225"
}

Every block is from another device.

In only 2 days there are ~6 million documents in the collection (4 documents every 100 ms is ~40 per second, i.e. roughly 3.5 million per day).

If I want to get the data to draw a line graph of "device 1 position left[0]"

with:

FOR d IN device
  FILTER d.timestamp >= "2018-02-18 04:30:00.000" && d.timestamp <= "2018-02-18 04:35:00.000"
  RETURN d.pos.left[0]

it takes a very long time to search through these ~6 million documents.

My question is: is this normal and can only more machine power fix this problem, or is my way of handling this data wrong?

I think ~6 million documents is not big data, but if I already fail at this scale, how will I handle it when I add 50 more devices and collect data not for 2 days but for 30?

  • I believe your search is taking a lot of time because you are using strings. You may want to convert it to a real timestamp to make the search much faster. If it is still slow, you can build a custom index containing the date, something like (20180218-id), and search by regular expression (which may be the ultimate speedster here). – Israel Zinc Feb 20 '18 at 07:57
  • Please run your query through the "Explain" feature on the frontend. Your performance is affected by a lot of different things. The main factor would be indexes and how you use the field timestamp. Is there a reason why you would like to have human-readable timestamps, as @israel.zinc suggests? – Kaveh Vahedipour Feb 20 '18 at 08:45
  • Did that work for you? – Kaveh Vahedipour Feb 22 '18 at 09:44
  • Sorry for the late answer. – mok liee Feb 22 '18 at 10:00
  • Now I write the timestamp as a Unix timestamp (number), and it is much faster than before, thank you. Currently I search a 2-minute window in a collection with ~3.5 million documents; the result is ~15,000 documents and comes back in 10 seconds. The server is a VM with 8 GB RAM and 8 cores at 2 GHz. My first query on ~1 million documents took 1 second. How can I prevent my query from slowing down when there are 10 million documents? Do I have to split or archive some data, or is there a technique for problems like mine? – mok liee Feb 22 '18 at 10:14

1 Answer


Converting the timestamps to Unix timestamps (numbers) helps a lot.
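
For reference, the adjusted query looks roughly like this when run from arangosh (a sketch, not the exact query I use): the device id is the one from the example document above, and the millisecond values are just an example window for 2018-02-18 04:30:00–04:35:00, assuming UTC.

// rough sketch in arangosh; "device" is the collection name from the question,
// and d.timestamp is now stored as a number (Unix time in milliseconds)
db._query(`
  FOR d IN device
    FILTER d.device == "12227225"
    FILTER d.timestamp >= 1518928200000 && d.timestamp <= 1518928500000
    RETURN d.pos.left[0]
`).toArray();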

I added a skiplist index over timestamp & device.
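
The index can be created in the web UI or from arangosh; a minimal sketch (the exact field order here is an assumption on my side):

// skiplist index over the two fields used in the FILTER
db.device.ensureIndex({ type: "skiplist", fields: ["timestamp", "device"] });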

Now, with 13 million documents, my query runs in 920 ms.

Thank you!

mok liee