I am streaming CSV data from each URL asynchronously and writing it to a separate file per URL, one URL after another, as shown below.
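import aiofiles
import backoff
import httpx

# Runs inside an async function; URLs are processed one after another.
async with httpx.AsyncClient(headers={"Authorization": 'Token token="sometoken"'}) as session:
    for url in some_urls_list:
        await download_data(url, session)

# Retry with exponential backoff, up to 7 tries (httpx.SomeException is a placeholder).
@backoff.on_exception(backoff.expo, exception=(httpx.SomeException,), max_tries=7)
async def download_data(url, session):
    async with session.stream("GET", url) as csv_stream:
        csv_stream.raise_for_status()
        # Write the response body to disk chunk by chunk as it arrives.
        async with aiofiles.open("someuniquepath", "wb") as f:
            async for data in csv_stream.aiter_bytes():
                await f.write(data)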
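A note on the writer above: the chunks yielded by aiter_bytes() are byte ranges, not CSV rows, so while a download is in progress the file on disk can end in the middle of a row. The sketch below simulates this with an in-memory payload. The helper name iter_chunks and the chunk size of 7 are made up for illustration, and this is not a claim about how Splunk reads the file.

```python
def iter_chunks(payload: bytes, size: int):
    """Yield fixed-size slices, standing in for network chunks (hypothetical helper)."""
    for start in range(0, len(payload), size):
        yield payload[start:start + size]


csv_payload = b"A,B,C\n1,2,3\n4,5,6\n"

on_disk = b""        # what the output file would contain after each write
ends_mid_row = []    # True when the file currently ends inside a row
for chunk in iter_chunks(csv_payload, 7):
    on_disk += chunk
    ends_mid_row.append(not on_disk.endswith(b"\n"))

print(ends_mid_row)
```

Only after the last write does the simulated file end on a row boundary; any reader that looks at it earlier sees a trailing partial row.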
I am ingesting this data into Splunk with the inputs.conf and props.conf settings below (the monitor stanza is from inputs.conf and the [xx:xx] stanza is from props.conf).
[monitor:///my_main_dir_path]
disabled = 0
index = xx
sourcetype = xx:xx
[xx:xx]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
CHARSET = UTF-8
INDEXED_EXTRACTIONS = csv
TIMESTAMP_FIELDS = xx
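To make the LINE_BREAKER pattern above concrete, here is a small Python sketch of what the regex itself matches: every run of carriage returns and line feeds is treated as a boundary, and the text of the capture group is dropped. The function name split_events and the sample string are hypothetical, and the sketch only illustrates the regex; how Splunk combines it with INDEXED_EXTRACTIONS is not modeled here.

```python
import re

# Same pattern as in props.conf; the single capture group is the delimiter.
LINE_BREAKER = re.compile(r"([\r\n]+)")


def split_events(raw: str) -> list[str]:
    """Split raw text at each delimiter match and discard the delimiter text."""
    parts = LINE_BREAKER.split(raw)
    # With one capture group, re.split alternates text and delimiter,
    # so the even indexes hold the text between delimiters.
    return [text for text in parts[0::2] if text]


raw = 'A,B\r\n1,2\n\n3,"x\ny"\n'
print(split_events(raw))
```

The regex has no notion of CSV quoting, so the quoted field containing a newline in this made-up input ends up in two pieces. The sample data in this question does not show such a field; this is only what the pattern would do if one appeared.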
I am seeing the following issues:
- Some files are not indexed at all.
- For some files, only some of the rows are indexed.
- Some rows are abruptly split into two events in Splunk.
What can be changed on the Splunk configuration side to solve these issues without causing duplicate data to be indexed?
Sample data (the first line is the header):
A,B B,C D,E,F,G H?,I J K,L M?,N/O P,Q R S,T U V (w x),Y Z,AA BB,CC DD,EE FF,GG HH,II JJ KK,some timestamp field,LL,MM,NN-OO,PP?,QQ RR ss TT UU,VV,WW,XX,YY,ZZ,AAA BBB,CCC,DDD-EEE,FFF GGG,HHH,III JJJ,KKK LLL,MMM MMM,NNN OOO,PPP QQQ,RRR SSS 1,TTT UUU 2,VVV WWW 3,XX YYY,ZZZ AAAA,BBBB CCCC
adata@adata.adata,"bbdata, bbdata",ccdata ccdata,eedata eedata - eedata,ffdata - ffdata - 725 ffdata ffdata,No,,No,,,,,unknown,unknown,unknown,2.0.0,"Sep 26 22:40:18 iidata-iidata-12cb65d081f745a2b iidata/iidata[4783]: iidata: to=<iidata@iidata.iidata>, iidata=iidata.iidata.iidata.iidata[111.111.11.11]:25, iidata=0.35, iidata=0.08/0/0.07/0.2, iidata=2.0.0, iidata=iidata (250 2.0.0 OK 1569537618 iidata.325 - iidata)",9/26/2019 22:40,,,,,,,wwdata,xxdata,5,"zzdata, zzdata",aaadata aaadata aaadata,cccdata - cccdata,ddddata - ddddata,fffdata,hhhdata,25/06/2010,6,2010,"nnndata nnndata nnndata, nnndata.",(pppdata'pppdata) pppdata pppdata,,,,303185,,
Sample Broken Event:
adata@adata.adata,"bbdata, bbdata",ccdata ccdata,eedata eedata - eedata,ffdata - ffdata - 725 ffdata ffdata,No,,No,,,,,unknown,un
known,unknown,2.0.0,"Sep 26 22:40:18 iidata-iidata-12cb65d081f745a2b iidata/iidata[4783]: iidata: to=<iidata@iidata.iidata>, iidata=iidata.iidata.iidata.iidata[111.111.11.11]:25, iidata=0.35, iidata=0.08/0/0.07/0.2, iidata=2.0.0, iidata=iidata (250 2.0.0 OK 1569537618 iidata.325 - iidata)",9/26/2019 22:40,,,,,,,wwdata,xxdata,5,"zzdata, zzdata",aaadata aaadata aaadata,cccdata - cccdata,ddddata - ddddata,fffdata,hhhdata,25/06/2010,6,2010,"nnndata nnndata nnndata, nnndata.",(pppdata'pppdata) pppdata pppdata,,,,303185,,
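To check whether a split like the one above is already present in a file on disk (rather than introduced at indexing time), one option is to compare each row's field count with the header's. This is a heuristic sketch using Python's csv module; the function name find_short_rows and the shortened five-column sample are assumptions made for illustration.

```python
import csv
import io


def find_short_rows(csv_text: str) -> list[int]:
    """Return line numbers of rows whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # reader.line_num is the number of source lines read so far,
    # i.e. the line on which the current row ended.
    return [reader.line_num for row in reader if len(row) != len(header)]


intact = 'A,B,C,D,E\nadata,"bbdata, bbdata",unknown,unknown,2.0.0\n'
broken = 'A,B,C,D,E\nadata,"bbdata, bbdata",unknown,un\nknown,2.0.0\n'

print(find_short_rows(intact))
print(find_short_rows(broken))
```

The csv module keeps the quoted "bbdata, bbdata" value as one field, so the intact row matches the header, while both fragments of the split row come up short and are reported.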