Is there a faster way to create a view in CouchDB? My data looks something like this:

{"docs":[{
        "c_custkey": 1,
        "c_name": "Customer#000000001",
        "c_address": "IVhzIApeRb",
        "c_city": "MOROCCO  0",
        "c_nation": "MOROCCO",
        "c_region": "AFRICA",
        "lineorder": [{
                    "lo_orderkey": 164711,
                    "lo_linenumber": 1,
                    "lo_custkey": 1,
                    "lo_partkey": 82527,
                    "lo_suppkey": 1848,
                    "lo_quantity": 34,
                    "lo_extendedprice": 5132368,
                    "lo_revenue": 2816872,
                    "orderdate": [{
                        "d_datekey": 19920426,
                        "d_date": "April 26, 1992",
                        "d_dayofweek": "Monday",
                        "d_month": "April",
                        "d_year": 1992,
                        "d_yearmonthnum": 199204,
                    }],
                    "part": [{
                        "p_partkey": 82527,
                        "p_name": "steel tomato",
                        "p_mfgr": "MFGR#4",
                        "p_category": "MFGR#45",
                        "p_brand1": "MFGR#452",
                    }],
                    "supplier": [{
                        "s_city": "MOZAMBIQU8",
                        "s_nation": "MOZAMBIQUE",
                        "s_region": "AFRICA",
                    }]
                }, {
                    "lo_orderkey": 164711,
                    "lo_linenumber": 2,
                    "lo_custkey": 1,
                    "lo_partkey": 26184,
                    "lo_suppkey": 1046,
                    "lo_orderdate": 19920426,
                    "lo_quantity": 15,
                    "lo_extendedprice": 1665270,
                    "orderdate": [{
                        "d_datekey": 19920426,
                        "d_date": "April 26, 1992",
                        "d_dayofweek": "Monday",
                        "d_month": "April",
                        "d_year": 1992,
                        "d_yearmonthnum": 199204,
                    }],
                    "part": [{
                        "p_partkey": 26184,
                        "p_name": "chartreuse green",
                        "p_mfgr": "MFGR#2",
                        "p_category": "MFGR#23",
                        "p_brand1": "MFGR#2329",
                    }],
                    "supplier": [{
                        "s_suppkey": 1046,
                        "s_city": "SAUDI ARA2",
                        "s_nation": "SAUDI ARABIA",
                        "s_region": "MIDDLE EAST",
                    }]
                },...

I'm creating the view this way using Futon, but it takes 30 minutes:

Map function:

function(doc) {
  var c_city = doc.c_city;
  var c_nation = doc.c_nation;
  if (c_nation == "UNITED STATES") {
    for each (lineorder in doc.lineorder) {
      for each (supplier in lineorder.supplier) {
        var s_city = supplier.s_city;
        var s_nation = supplier.s_nation;
      }
      if (s_nation == "UNITED STATES") {
        for each (orderdate in lineorder.orderdate) {
          var d_year = orderdate.d_year;
        }
        if (d_year >= 1992 && d_year <= 1997) {
          emit({d_year: d_year, c_city: c_city, s_city: s_city}, lineorder.lo_revenue);
        }
      }
    }
  }
}

Reduce Function: "_sum"

My database has 2 GB of this kind of data.

Raphael
  • Building a view on a large set of data can take quite some time. Once the view is built, updates will be very quick. You shouldn't need to be building views for ad hoc queries. Also, is this really your map function? – Kerr Mar 08 '16 at 22:44
  • Thanks for your answer. I've noticed the time. What I really want to know is whether there's another way to write this map function so that the views are created faster. My map function works, but I don't know if there's a better one. – Raphael Mar 10 '16 at 01:02

1 Answer

This is obviously not your real view (lineorder vs. lineorders and there's no lo_revenue), so I won't waste time finessing it. Instead let me say that for a 2GB data set with who knows how many lineorders iterations per document, 30 minutes is not at all surprising.

smathy
  • I'm sorry about that. I deleted some fields to post a smaller sample of my data, and "lineorders" was a typing error. The data has just lineorder (both are correct now). Thanks for your answer. I've noticed the time was not surprising. I'm new to CouchDB, and I asked here to find out whether there's some other way to write the map function that decreases the view creation time. – Raphael Mar 10 '16 at 00:55
  • Not with the same functionality, no. Why do you care? You only create the view once. – smathy Mar 10 '16 at 01:29
  • Because I'm testing several systems for use in a data warehouse, and I'm trying to compare them correctly. – Raphael Mar 10 '16 at 01:39
  • Ok, so yes, this sounds about right. Couch trades off the initial view/index creation time for better insert/update/delete times down the track. – smathy Mar 10 '16 at 02:36
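
Since the initial build is the expensive step, it may help to check the map logic on one document before starting it. Below is a minimal sketch that runs the same filter as the question's map function outside CouchDB (for example in Node.js). It is not the asker's code and is not expected to shorten the build. The `emit` stub, the `rows` array, and `sampleDoc` with its values are hypothetical, and the sketch assumes each `supplier` and `orderdate` array holds a single entry, as in the sample data.

```javascript
// Local stand-in for CouchDB's emit(): collects rows so the map can run outside CouchDB.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same filter as the map function in the question, written with standard loops.
// Assumption: each "supplier" and "orderdate" array holds one entry, as in the sample.
function map(doc) {
  if (doc.c_nation !== "UNITED STATES") { return; }
  var lineorders = doc.lineorder || [];
  for (var i = 0; i < lineorders.length; i++) {
    var lo = lineorders[i];
    var supplier = (lo.supplier || [])[0];
    var orderdate = (lo.orderdate || [])[0];
    if (!supplier || !orderdate) { continue; }
    if (supplier.s_nation !== "UNITED STATES") { continue; }
    if (orderdate.d_year < 1992 || orderdate.d_year > 1997) { continue; }
    // Array key instead of the question's object key (a deliberate change).
    emit([orderdate.d_year, doc.c_city, supplier.s_city], lo.lo_revenue);
  }
}

// Hypothetical test document; the values are made up for illustration.
var sampleDoc = {
  c_city: "UNITED ST0",
  c_nation: "UNITED STATES",
  lineorder: [
    {
      lo_revenue: 100,
      orderdate: [{ d_year: 1993 }],
      supplier: [{ s_city: "UNITED ST3", s_nation: "UNITED STATES" }]
    },
    {
      lo_revenue: 250,
      orderdate: [{ d_year: 1993 }],
      supplier: [{ s_city: "CHINA    1", s_nation: "CHINA" }]
    }
  ]
};

map(sampleDoc);
console.log(JSON.stringify(rows));
```

The one functional change is the array key: with the `_sum` reduce, an array key lets a query use `group_level`, for instance `group_level=1` to total revenue per year. The `map` function would still need to be pasted into the design document as an anonymous `function(doc) { ... }` without the `emit` stub.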