
Column names in input file:

UserID UserName DateCreated TimeCreated SessionID SessionName SessionAddress SessionStatus Host DeviceID DeviceName DeviceVersion DataUsed DataSent DeviceStatus   

Content of the file input.txt:

string 08:John Doe 2016-05-31 23:55:45.678 0e:9999999999.999 string 0f:123.456.789.99 06:active 0f:123.456.789.111 0a:1234567890 samsung 0a:AA 00.12.1 0022a 0022b 06:active
string 09:Blah Blah 2016-05-31 23:57:05.248 0e:5628176599.999 string 0f:123.001.507.031  0f:123.456.789.111 0a:1234567890   0022c 0022d
string 0a:David Blah 2016-02-01 14:07:12.135 0e:3760973177.404   active 0f:123.456.789.111   0b:ABCD 34.5.1 0022a 0022b 06:active

There are spaces where fields are missing. For better visibility, here is the same data with each separating space replaced by ^ (spaces inside a field are kept):

string^08:John Doe^2016-05-31^23:55:45.678^0e:9999999999.999^string^0f:123.456.789.99^06:active^0f:123.456.789.111^0a:1234567890^samsung^0a:AA 00.12.1^0022a^0022b^06:active
string^09:Blah Blah^2016-05-31^23:57:05.248^0e:5628176599.999^string^0f:123.001.507.031^^0f:123.456.789.111^0a:1234567890^^^0022c^0022d^
string^0a:David Blah^2016-02-01^14:07:12.135^0e:3760973177.404^^^active^0f:123.456.789.111^^^0b:ABCD 34.5.1^0022a^0022b^06:active
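The ^ view also shows why simply splitting on the separator is not enough. A minimal Python illustration on a shortened, made-up line: splitting on a single space keeps the empty (missing) fields, but it also cuts a field such as `John Doe` in two, so the pieces no longer line up with the columns.

```python
# Shortened, hypothetical line: UserID, UserName (contains a space), an empty field, a plain field.
line = "string 08:John Doe  active"

# split(" ") keeps an empty string for each pair of consecutive separators ...
parts = line.split(" ")
print(parts)  # ['string', '08:John', 'Doe', '', 'active']

# ... while split() with no argument collapses runs of spaces and loses the empty field.
print(line.split())  # ['string', '08:John', 'Doe', 'active']
```

Either way `John Doe` is broken apart, which is why the character counts have to drive the parsing.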

Some fields, such as UserName and DeviceVersion, contain spaces. Wherever a field carries a character count (for example the 08: in 08:John Doe), the parser should use that count to find where the field ends and parse accordingly.
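One way to use those counts, sketched in Python under the assumption (consistent with the sample, where `0e:9999999999.999` holds 14 characters and 0x0e is 14) that the two characters before the colon are a hexadecimal length: match the prefix, then take exactly that many characters, spaces included. `read_prefixed` is a hypothetical helper name.

```python
import re

# Assumed prefix format: two hex digits followed by a colon, e.g. "08:" or "0f:".
PREFIX = re.compile(r"([0-9A-Fa-f]{2}):")


def read_prefixed(line: str, pos: int) -> tuple[str, int]:
    """Read a length-prefixed field starting at pos.

    Returns (value, new_pos), where new_pos points just past the value.
    Raises ValueError if there is no prefix or the line is too short.
    """
    m = PREFIX.match(line, pos)
    if m is None:
        raise ValueError(f"no length prefix at position {pos}")
    length = int(m.group(1), 16)          # "0a" -> 10
    start = m.end()
    value = line[start:start + length]    # may contain spaces
    if len(value) != length:
        raise ValueError(f"field at position {pos} is shorter than {length}")
    return value, start + length


value, end = read_prefixed("08:John Doe 2016-05-31", 0)
print(value, end)  # John Doe 11
```

A caveat for this format: a time such as `23:55:45.678` also starts with two hex digits and a colon, so the caller has to know which columns are allowed to carry a prefix before calling a helper like this.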

Which fields are missing is not known in advance; this input is only a sample. I tried awk with fixed (static) column numbers, but I need something that handles dynamic content. I'm also not sure how to read the file line by line.
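For the line-by-line part, a minimal Python sketch: iterate over the file object and remove only the line terminator. Calling `strip()` would also drop trailing spaces, and in this format a trailing space marks an empty last field (the ^ view of the second line ends in ^). `iter_records` is a hypothetical name; `input.txt` is the file from the question.

```python
import io
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each input line with only the line terminator removed."""
    for line in lines:
        # rstrip("\r\n") keeps trailing spaces, which mark empty trailing fields.
        yield line.rstrip("\r\n")


# With the real file:
#   with open("input.txt", encoding="utf-8") as f:
#       for record in iter_records(f): ...
sample = io.StringIO("a b \nc  d\n")
print(list(iter_records(sample)))  # ['a b ', 'c  d']
```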

Expected output:

string,John Doe,2016-05-31,23:55:45.678,9999999999.999,string,123.456.789.99,active,123.456.789.111,1234567890,samsung,AA 00.12.1,554,555,active
string,Blah Blah,2016-05-31,23:57:05.248,5628176599.999,string,123.001.507.031,,123.456.789.111,1234567890,,,556,557,
string,David Blah,2016-02-01,14:07:12.135,3760973177.404,,,active,123.456.789.111,,,ABCD 34.5.1,554,555,active

Edit

The DataUsed and DataSent columns are hex values and must be converted to decimal (for example, 0022a becomes 554, as in the expected output).
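Putting the pieces together, here is a minimal Python sketch of a schema-driven parser, not a definitive implementation. It assumes the 15 columns always appear in the order listed above, separated by exactly one space, and that only the columns in the hypothetical `PREFIXED` set (those that carry a prefix somewhere in the sample) may start with a hex length. That restriction matters because `23:55:45.678` in TimeCreated also looks like a `23:` prefix. DataUsed and DataSent are converted with `int(value, 16)`. One thing to check in the data: `0f:123.456.789.99` on the first sample line declares 15 characters but holds 14, so a strict length-based parser cannot read that line as shown; this sketch raises `ValueError` there rather than guessing.

```python
import re

COLUMNS = [
    "UserID", "UserName", "DateCreated", "TimeCreated", "SessionID",
    "SessionName", "SessionAddress", "SessionStatus", "Host", "DeviceID",
    "DeviceName", "DeviceVersion", "DataUsed", "DataSent", "DeviceStatus",
]
# Assumption: only these columns may carry a "<hex length>:" prefix.
PREFIXED = {
    "UserName", "SessionID", "SessionAddress", "SessionStatus",
    "Host", "DeviceID", "DeviceVersion", "DeviceStatus",
}
HEX_COLUMNS = {"DataUsed", "DataSent"}
PREFIX = re.compile(r"([0-9A-Fa-f]{2}):")


def parse_line(line: str) -> list[str]:
    """Split one record into len(COLUMNS) values; missing fields become ""."""
    values = []
    pos = 0
    for i, col in enumerate(COLUMNS):
        if i > 0:
            if pos >= len(line):          # line ended early: pad the rest
                values.append("")
                continue
            if line[pos] != " ":
                raise ValueError(f"expected a space before {col} at {pos}")
            pos += 1                      # consume the single separator
        m = PREFIX.match(line, pos) if col in PREFIXED else None
        if m:                             # length-prefixed: may contain spaces
            n = int(m.group(1), 16)
            value = line[m.end():m.end() + n]
            if len(value) != n:
                raise ValueError(f"{col} is shorter than its prefix says")
            pos = m.end() + n
        else:                             # plain: runs up to the next space
            end = line.find(" ", pos)
            end = len(line) if end == -1 else end
            value, pos = line[pos:end], end
        if col in HEX_COLUMNS and value:
            value = str(int(value, 16))   # e.g. "0022a" -> "554"
        values.append(value)
    if pos < len(line):
        raise ValueError(f"unexpected extra data at {pos}")
    return values


def convert_file(path: str) -> None:
    """Print each record of path as a comma-separated line."""
    with open(path, encoding="utf-8") as f:
        for raw in f:
            # Remove only the line terminator; trailing spaces mark empty fields.
            print(",".join(parse_line(raw.rstrip("\r\n"))))


# Example: convert_file("input.txt")
sample = ("string 0a:David Blah 2016-02-01 14:07:12.135 0e:3760973177.404"
          "   active 0f:123.456.789.111   0b:ABCD 34.5.1 0022a 0022b 06:active")
print(",".join(parse_line(sample)))
```

If a value could ever contain a comma itself, writing rows with Python's `csv.writer` instead of `",".join` would quote that value.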

intruder
  • @John1024 My bad, I forgot to mention it. These hex values should be replaced by their decimal equivalents – intruder May 31 '16 at 21:56
  • In the first line, the string `08:` is not converted to decimal, it is deleted. – John1024 May 31 '16 at 22:12
  • @John1024 Yes. The output should not contain any of the counts. Each count in the input file specifies the length of that field. As the input has no field separator other than a space, this count should be used when replacing the separating spaces with "," – intruder May 31 '16 at 22:16
  • There are some places in the output where the field separator is comma-space and some where it is comma without a space. – John1024 May 31 '16 at 22:22
  • @John1024 My bad, sorry for that. I posted it from my phone, so I didn't see it. Updated it now. Thanks! :) – intruder Jun 01 '16 at 00:49
  • This is basically the same as your question http://stackoverflow.com/questions/37449789/how-to-parse-contents-of-a-file-using-sed-awk – Michael Vehrs Jun 01 '16 at 06:27
  • @MichaelVehrs The number of columns is dynamic here. – intruder Jun 01 '16 at 11:57
