I have a huge amount of data that I cannot hold in memory all at once, so I keep getting out-of-memory errors. One obvious solution would be streaming in Node.js, but as far as I know streaming is not possible with sorting, which is one of the operations I apply to my data. Is there an algorithm, perhaps a divide-and-conquer one, that I can use to combine streaming with sorting?

user385729
  • Depends on the data you're going to show. Is it displayed in a list? Why not use pagination? – Kirill Slatin Jun 15 '15 at 02:14
  • @KirillSlatin It is an array of objects! As I said, I cannot hold all of the data in memory, and for sorting (at least with the naive algorithm) you need to have all of the data in memory! It does not matter whether you use pagination or not; in other words, sorting needs all of the data and you cannot sort with just portions of it. That's my question: is it possible? Is there any algorithm that sorts portions of the data, either by streaming or with paginated results? – user385729 Jun 15 '15 at 02:31
  • Where are you getting this excessive amount of data from? A database? I guess you don't really understand the concept of pagination... The request to the server should contain the sort fields, their order, the page number and the page size. The server sorts and returns a fraction of the result – Kirill Slatin Jun 15 '15 at 02:41
  • Yes, a database, and sorting is not supported by the database I use; I handle it in the application as opposed to at the database level! – user385729 Jun 15 '15 at 02:42
  • If your DB doesn't support sorting and the server is not capable of reading all the data (which is obviously required for sorting), the only things you can do are 1. buy a bigger server (a bad idea if your DB grows) or 2. switch to a database that supports sorted queries – Kirill Slatin Jun 15 '15 at 02:44
  • Where is the data coming from and where is it going to? There are [dedicated sorting algorithms](https://en.wikipedia.org/wiki/Sorting_algorithm#Memory_usage_patterns_and_index_sorting) for data that doesn't fit in memory. – Bergi Jun 15 '15 at 02:44
  • @Pasargad: Please be more specific about the database you are using. What exactly does it not support? Also, this sounds like you have chosen the wrong database engine if large amounts of sorted data are crucial for your app. This should not be handled at the application layer. – Bergi Jun 15 '15 at 02:47
  • @Bergi, first of all it impacts performance greatly as stated in the linked article, and most likely the app being built is a business app and not a scientific one. So messing around with rather low-level algorithmic problems should be avoided and replaced with a correct technology choice – Kirill Slatin Jun 15 '15 at 02:50
  • Just to point out that there is such a thing as an out-of-memory, or on-disk, sort command. On Linux there is http://linux.about.com/library/cmd/blcmdl1_sort.htm. There is a command-line out-of-memory sort command on Windows, too, but I am less familiar with that. – mcdowella Jun 15 '15 at 04:42

1 Answer

You can stream the data using Kinesis and use the Kinesis Client Library, or subscribe a Lambda function to your Kinesis stream, and incrementally maintain sorted materialized views. Where you store your sorted materialized views and how you divide your data will depend on your application. If you cannot store the entire sorted materialized view, you could keep rolling views. If your data is a time series, or has some other natural order, you could divide the range of your ordered attribute into chunks. Then you could have, for example, 1-day or 1-hour sorted chunks of your data. In other words, choose a sorted subdivision small enough to keep the information you need in memory.
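Independently of Kinesis, the same chunking idea underlies the classic external merge sort: sort chunks that fit in memory, spill each sorted chunk to disk, then merge the chunks while holding only one record per chunk in memory. The following is a minimal Node.js sketch of that technique, not part of the original answer. The names `externalSort`, `writeSortedRuns`, `mergeRuns` and `readLines` are hypothetical, and it assumes records are JSON-serializable, arrive as a synchronous iterable, and that temporary disk space is available.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const { StringDecoder } = require("node:string_decoder");

// Lazily yield the non-empty lines of a file using one fixed-size buffer.
function* readLines(file, bufSize = 64 * 1024) {
  const fd = fs.openSync(file, "r");
  const buf = Buffer.alloc(bufSize);
  const decoder = new StringDecoder("utf8");
  let rest = "";
  try {
    let n;
    while ((n = fs.readSync(fd, buf, 0, bufSize, null)) > 0) {
      const parts = (rest + decoder.write(buf.subarray(0, n))).split("\n");
      rest = parts.pop(); // last piece may be an incomplete line
      for (const line of parts) if (line) yield line;
    }
    rest += decoder.end();
    if (rest) yield rest;
  } finally {
    fs.closeSync(fd);
  }
}

// Phase 1: buffer at most chunkSize records, sort them, write one file per chunk.
function writeSortedRuns(records, chunkSize, compare, dir) {
  const files = [];
  let chunk = [];
  const flush = () => {
    if (chunk.length === 0) return;
    chunk.sort(compare);
    const file = path.join(dir, `run-${files.length}.ndjson`);
    fs.writeFileSync(file, chunk.map((r) => JSON.stringify(r)).join("\n") + "\n");
    files.push(file);
    chunk = [];
  };
  for (const rec of records) {
    chunk.push(rec);
    if (chunk.length >= chunkSize) flush();
  }
  flush();
  return files;
}

// Phase 2: k-way merge, keeping only the current head record of each run in memory.
function* mergeRuns(files, compare) {
  const heads = [];
  try {
    for (const file of files) {
      const it = readLines(file);
      const first = it.next();
      if (!first.done) heads.push({ it, value: JSON.parse(first.value) });
    }
    while (heads.length > 0) {
      let min = 0; // linear scan; a heap would suit a large number of runs better
      for (let i = 1; i < heads.length; i++) {
        if (compare(heads[i].value, heads[min].value) < 0) min = i;
      }
      yield heads[min].value;
      const next = heads[min].it.next();
      if (next.done) heads.splice(min, 1);
      else heads[min].value = JSON.parse(next.value);
    }
  } finally {
    for (const h of heads) h.it.return(); // close files if the consumer stops early
  }
}

// Yield all records in sorted order; the temporary directory is removed afterwards.
function* externalSort(records, { chunkSize = 10000, compare }) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "extsort-"));
  try {
    yield* mergeRuns(writeSortedRuns(records, chunkSize, compare, dir), compare);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

Because `externalSort` is a generator, its output could be piped to an HTTP response with something like `stream.Readable.from(externalSort(rows, { compare }))`. The sketch uses synchronous file I/O for brevity, which blocks the event loop; a server handling concurrent requests would likely want an asynchronous variant of the same two phases.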

Alexander Patrikalakis
  • Thanks so much for introducing this cool Amazon service! The main problem is that I cannot hold these big sorted materialized views in memory, and there is no point in holding them in memory either. If I understood correctly, to use Kinesis I would still need to hold the sorted result in memory, and it would grow bigger and bigger; is that right? The desired solution would be to get the sorted result from a service like Kinesis and stream it to the response as it arrives, without holding it in memory... Your help is appreciated... – user385729 Jun 24 '15 at 17:21
  • I updated my answer to address the question of chunking up sorted materialized views. – Alexander Patrikalakis Jun 24 '15 at 20:36