In Spark 2.0.1 with Hadoop 2.6.0, I have many files whose records are delimited with '!@!\r' rather than the usual newline \n, for example:
=========================================
2001810086 rongq 2001 810!@!
2001810087 hauaa 2001 810!@!
2001820081 hello 2001 820!@!
2001820082 jaccy 2001 820!@!
2002810081 cindy 2002 810!@!
=========================================
I tried to extract the data by setting textinputformat.record.delimiter in Spark:
set textinputformat.record.delimiter='!@!\r';
or
set textinputformat.record.delimiter='!@!\n';
but I still cannot extract the data.
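Since it matters whether each '!@!' is really followed by CR, LF, or CRLF, here is a minimal Python sketch for checking the raw bytes of a file. The helper name classify_terminators and the inline sample are hypothetical and only for illustration; the real file would be read in binary mode.

```python
from collections import Counter

MARKER = b"!@!"

def classify_terminators(data: bytes) -> Counter:
    """Count which bytes follow each '!@!' marker in the raw data."""
    kinds = Counter()
    pos = data.find(MARKER)
    while pos != -1:
        # Look at up to two bytes right after the marker.
        tail = data[pos + len(MARKER): pos + len(MARKER) + 2]
        if tail.startswith(b"\r\n"):
            kinds["CRLF"] += 1
        elif tail.startswith(b"\r"):
            kinds["CR"] += 1
        elif tail.startswith(b"\n"):
            kinds["LF"] += 1
        else:
            kinds["other"] += 1
        pos = data.find(MARKER, pos + len(MARKER))
    return kinds

# Hypothetical sample; for a real file use: data = open("/data.txt", "rb").read()
sample = b"2001810086\trongq\t2001\t810!@!\n2001810087\thauaa\t2001\t810!@!\n"
print(classify_terminators(sample))
```

If every marker is reported as LF, for instance, a delimiter of '!@!\r' can never match.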
In spark-sql, I do this:
=========================================
create table ceshi(id int,name string, year string, major string)
row format delimited
fields terminated by '\t';
load data local inpath '/data.txt' overwrite into table ceshi;
select count(*) from ceshi;
=========================================
The result is 5. But when I set textinputformat.record.delimiter='!@!\r'; and then run select count(*) from ceshi; again, the result is 1, so the delimiter does not work as expected.
I also checked the Hadoop 2.6.0 source. In the record-reader method of TextInputFormat.java, I noticed that textinputformat.record.delimiter defaults to null, in which case LineReader.java uses the method readDefaultLine to read a line terminated by one of CR, LF, or CRLF (CR = '\r', LF = '\n').
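To make the two code paths concrete, here is a rough Python sketch of how the record counts above could come about. It is not the Hadoop implementation: read_records is a hypothetical helper, and the sample data assumes that each record on disk ends with '!@!' followed by LF, which I have not confirmed for my files.

```python
import re

def read_records(data, delimiter=None):
    """Split data into records.

    delimiter=None mimics the default path (CR, LF, or CRLF ends a line);
    otherwise only the exact delimiter string ends a record.
    """
    if delimiter is None:
        parts = re.split(r"\r\n|\r|\n", data)
    else:
        parts = data.split(delimiter)
    # A terminator at the very end of the data does not start a new record.
    if parts and parts[-1] == "":
        parts.pop()
    return parts

rows = [
    "2001810086\trongq\t2001\t810",
    "2001810087\thauaa\t2001\t810",
    "2001820081\thello\t2001\t820",
    "2001820082\tjaccy\t2001\t820",
    "2002810081\tcindy\t2002\t810",
]
# Assumption: each record on disk ends with "!@!" followed by LF.
data = "".join(row + "!@!\n" for row in rows)

print(len(read_records(data)))            # default line splitting
print(len(read_records(data, "!@!\n")))   # delimiter matches the data
print(len(read_records(data, "!@!\r")))   # delimiter never occurs in the data
```

Under that assumption the default path and the '!@!\n' delimiter both give 5 records, while '!@!\r' gives 1 because it never occurs and the whole file becomes a single record. The same would happen if the escape sequence in the set command were passed through as a literal backslash followed by r.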