
My Elasticsearch index has more than 1000 fields due to my SQL schema, and I get the exception below:

{'type': 'illegal_argument_exception', 'reason': 'Limit of total fields [1000] in index [audit-index] has been exceeded'}

And my bulk insert looks like this:

import json

from elasticsearch import Elasticsearch, helpers
from elasticsearch.helpers import BulkIndexError

# Setup not shown in the original snippet, added so it runs:
es = Elasticsearch()
BATCHSIZE = 1000

batch = []
tempdict = {}
counter = 0
insertedrecords = 0

with open('audit1.txt') as file:
    for line in file:
        doc = {}
        columns = line.split('||')
        doc['TimeStamp'] = columns[0].strip("'")
        doc['BusinessTimeStamp'] = columns[1].strip("'")
        doc['RuntimeMicroflowID'] = columns[2].strip("'")
        doc['MicroflowID'] = columns[3].strip("'")
        doc['UserId'] = columns[4].strip("'")
        doc['ClientId'] = columns[5].strip("'")
        doc['Userlocation'] = columns[6].strip("'")
        doc['Transactionid'] = columns[7].strip("'")
        doc['Catagorie'] = columns[8].strip("'")
        doc['EventType'] = columns[9].strip("'")
        doc['Operation'] = columns[10].strip("'")
        doc['PrimaryData'] = columns[11].strip("'")
        doc['SecondayData'] = columns[12].strip("'")
        # The rest of the line is a repeating key||oldvalue||newvalue
        # sequence; each key becomes its own top-level field.
        i = 13
        while i + 2 < len(columns):
            key = columns[i].strip("'")
            tempdict['BFOLDVALUE'] = columns[i + 1].strip("'")
            tempdict['BFNEWVALUE'] = columns[i + 2].strip("'")
            if key:  # strip() never returns None, so test for an empty string
                doc[key] = tempdict.copy()
            i += 3
            tempdict.clear()
        # Append a fresh dict per line; appending a shared dict and then
        # clearing it would empty every document already in the batch.
        batch.append(doc)
        counter += 1
        if counter == BATCHSIZE:
            try:
                helpers.bulk(es, batch, index='audit-index', doc_type='audit')
                insertedrecords += counter
                counter = 0
                batch.clear()
                print(insertedrecords, " records have been inserted")
            except BulkIndexError as e:
                print("Error occurred -- continuing")
                print(json.dumps(doc, indent=4))
                print(e)
                batch.clear()
                break

So, I am assuming I am indexing this wrongly... is there a better way of indexing this kind of format in Elasticsearch? Note that I am using ELK version 7.5.

Here is a sample line from the file I am parsing into Elasticsearch:

2018.07.17/15:41:53.735||2018.07.17/15:41:53.735||'0164a8424fbbp84h%2139165'||'BT_TTB_CashDep_PRC'||'eskedarz'||'UXP'||'00001039'||'0164a842e519pJpA'||'Persistence'||''||'CREATE'||'DailyTxns'||'0164a842e4eapJnu'||'CurrentThread'||'WebContainer : 15'||''||'ParentThread'||'system'||''||'TCPWorkerThreadID'||'WebContainer : 15'||''||'f_POSTINGDT'||'2018-07-17'||''||'versionNum'||'0'||''||'f_TXNAMTDR'||'0'||''||'f_ACCOUNTID'||'013XXXXXXXXX0'||''||'f_VALUEDTTM'||'2018-07-17 15:41:53.0'||''||'f_POSTINGDTTM'||'2018-07-17 15:41:53.692'||''||'f_TXNCLBAL'||'25551.610000'||''||'f_TXNREF'||'0000103917071815410685326'||''||'f_PIEVENTTYPE'||'N'||''||'f_TXNAMT'||'5000.00'||''||'f_TRANSACTIONID'||'0164a842e4e9pJng'||''||'f_TYPE'||'N'||''||'f_USERID'||'xxxarz'||''||'f_SRNO'||'1'||''||'f_TXNBASEEQ'||'5000.00'||''||'f_TXNSRCBRANCH'||'0000X039'||''||'f_TXNCODE'||'T08'||''||'f_CHANNELID'||'BranchTeller'||''||'f_TXNAMTCR'||'5000.00'||''||'f_TXNNARRATION'||'SELF                                                                                      '||''||'f_ISACCRUALPENDING'||'false'||''||'f_TXNDTTM'||'2018-07-17 15:41:53.689'||''
  • Can you share the document that you are trying to index? One of the items in your `batch` list. At first look you don't seem to have that many fields to reach the field limit. How many different items do you have in the key column of table 2? – leandrojmp Apr 25 '20 at 16:34
  • Since it's an application audit log it might pass 1000 fields, but about 95% of it is a repeating column with KEY|oldvalue|newvalue – Serak Shiferaw Apr 27 '20 at 07:20
  • @OpsterElasticsearchNinja I did not get the answer I am looking for; maybe Elasticsearch is not that good as a document store. MongoDB works fine for my requirement, but I have to index every column and that's just silly. I even tried nesting, but got the same error. Maybe I should stick to a column-store relational database, so I can store the KV vertically instead of horizontally (rows instead of columns). – Serak Shiferaw May 06 '20 at 11:06
  • @SerakShiferaw, but your initial question was about the `Limit of total fields` error, which should have been resolved by the steps given in the answer. Regarding your comment: yeah, in your case you have far too many fields, which is not normal in search and analytics use cases, and I guess a `columnstore` DB like Cassandra would be a better fit for this use case – Amit May 06 '20 at 13:04
  • @OpsterElasticsearchNinja I have tried that too; now I am getting `'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Limit of total fields [1500] in index [audit-index] has been exceeded'}` – Serak Shiferaw May 06 '20 at 14:28
  • @SerakShiferaw, did you increase it to 1500? – Amit May 06 '20 at 14:39

2 Answers


If you look carefully at this part of the error message, the cause becomes clear:

Limit of total fields [1000] in index

1000 is the default limit on the total number of fields in an Elasticsearch index, as shown in the Elasticsearch source code:

public static final Setting<Long> INDEX_MAPPING_TOTAL_FIELDS_LIMIT_SETTING =
        Setting.longSetting("index.mapping.total_fields.limit", 1000L, 0, Property.Dynamic, Property.IndexScope);

Please note this is a dynamic setting, so it can be changed on a given index by updating the index settings:

PUT test_index/_settings
{
  "index.mapping.total_fields.limit": 1500
}

Set the limit to whatever is suitable for your index.

More info on this issue can be found here and here.
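For completeness, the same dynamic setting can be applied from Python. Below is a minimal sketch using the elasticsearch-py client; the client construction and the limit value of 1500 are assumptions, adjust them to your cluster and index:

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local cluster; adjust hosts as needed

# Raise the dynamic field-count limit on the existing index
es.indices.put_settings(
    index='audit-index',
    body={'index.mapping.total_fields.limit': 1500},
)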

  • Is there a problem if I set it to 500,000? Because I think 1500 is not enough; it failed with `'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Limit of total fields [1500] in index [audit-index] has been exceeded'}` – Serak Shiferaw May 06 '20 at 14:30
  • @SerakShiferaw yeah, that's a huge number and would cause severe unexpected issues; please see my comment on the question. I guess you'd better ask a separate design question with more detail; this one was specific to the ES error, so let's focus on that to make it useful for the community – Amit May 06 '20 at 14:41
  • @SerakShiferaw, I also modified the question heading and context so that people reach this post when they search for the exception you are facing. – Amit May 06 '20 at 14:44
  • It's more like a design issue... I need to nest those key values somewhere deep down, and updating the depth limit might help – Serak Shiferaw May 06 '20 at 18:43

A better way to handle such an exploding index is to normalize it, the way you would in an RDBMS: store the repeating key/value combinations in a nested structure.

For example, a record like

{"keyA":"ValueA","keyB":"ValueB","keyC":"ValueC"...} - record to 

{"keyA":"ValueA","Keyvalue":{"keyB":"ValueB"   
                 "keyC":"ValueC"}} - record 

Searching then becomes a query on the pair, e.g. KeyValue.Key == "keyB" and KeyValue.Value == "ValueB".
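A minimal sketch of this approach with the elasticsearch-py client follows; the index name kv-audit-index and the field names KeyValue, Key, and Value are illustrative, not taken from the question:

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local cluster

# Map the repeating pairs as one nested field, so the mapping size
# stays constant no matter how many distinct keys the data contains.
es.indices.create(
    index='kv-audit-index',
    body={
        'mappings': {
            'properties': {
                'KeyValue': {
                    'type': 'nested',
                    'properties': {
                        'Key': {'type': 'keyword'},
                        'Value': {'type': 'keyword'},
                    },
                },
            },
        },
    },
)

# A nested query matches Key and Value within the same pair
res = es.search(
    index='kv-audit-index',
    body={
        'query': {
            'nested': {
                'path': 'KeyValue',
                'query': {
                    'bool': {
                        'must': [
                            {'term': {'KeyValue.Key': 'keyB'}},
                            {'term': {'KeyValue.Value': 'ValueB'}},
                        ],
                    },
                },
            },
        },
    },
)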
