Why is my text file displaying all of my fields as being equal to only one field?

Question

I am using bash to file-carve a text file that (theoretically) has four fields: MD5, Timestamp, Hostname and Filepath. Each field sits directly above the next, as shown in the output below. When I run the following command, NF is 1 for every line:

awk '{print NF, "- " $1}' best_file.txt

Output:

1 - md5:XXXXXXXX
1 - timestamp:XXXXXXXX
1 - endpoint:XXXXXXXX
1 - filename:XXXXXXXX
1 - md5:XXXXXXXX
1 - timestamp:XXXXXXXX
1 - endpoint:XXXXXXXX
1 - filename:XXXXXXXX

I am trying to carve my file and organize it however I choose using those four fields. For example, with cut or awk I cannot specify which field to extract, because everything appears as a single field.

I would like the option to present MD5s and hostnames next to each other, or filenames and timestamps side by side. Any help understanding why all of my fields are treated as one field would be appreciated. Again, I would expect 4, but it is all showing up as one...

What do you mean when you say the file _theoretically_ has 4 fields? Does it or doesn't it? In order for anyone to answer your question, you will need to include a sample of the input file, best_file.txt. — EJK, Dec 31 '18 at 23:35
You say these fields sit above each other. Fields are counted within the same row of the file. — Barmar, Jan 01 '19 at 01:03

score 1 · Answer 1 · answered Jan 01 '19 at 01:36

I produced analogous output with best_file.txt containing

md5:XXXX
timestamp:XXXX
endpoint:XXXX
filename:XXXX

It is unclear whether those key names (md5:, timestamp:, and so on) are actually in your source file. For files of this sort, I would not recommend including them.

The awk documentation for NF notes that, by default, fields are separated by whitespace (runs of spaces or tabs). If you change the first line of that file to md5 XXXX (a space instead of the colon), the corresponding output is:

2 - md5

Here NF is the number of fields, $1 is the first field, and $2 is the second (XXXX in this case). Your file has only one whitespace-delimited field per line, and because awk runs its program once per line (record), it never sees the four values together as one record.
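A quick way to see this for yourself is to count fields with awk's default separator and then with -F':', which splits each line at the colons. This is a sketch using placeholder values in the question's layout; the helper names count_default and count_colon are hypothetical. Note that with -F':' a value that itself contains a colon (a timestamp, for example) would be split into extra fields.

```shell
#!/usr/bin/env bash
# Placeholder input in the same layout as the question: one key:value per line.
sample='md5:XXXX
timestamp:XXXX
endpoint:XXXX
filename:XXXX'

# Default FS splits on runs of blanks, so each of these lines is a single field.
count_default() { awk '{ print NF }'; }

# -F':' splits at every colon, so each line becomes key ($1) and value ($2).
count_colon() { awk -F':' '{ print NF, $1 }'; }

printf '%s\n' "$sample" | count_default   # prints 1 on each of the four lines
printf '%s\n' "$sample" | count_colon     # prints "2 md5", "2 timestamp", ...
```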

If each line were 'md5:XXXX timestamp:XXXX endpoint:XXXX filename:XXXX', you could run awk '{print NF ": " $1 " " $3}' best_file.txt to get

4: md5:XXXX endpoint:XXXX

Of course, that may not be under your control. You could:

1. Combine groups of lines into one

You can join every four lines into one (change the 4 for a different group size) with awk '!(NR%4){print p " " $0, p=""}(NR%4){p=p " " $0}' best_file.txt. It needs some adjustment to get rid of the leading and trailing spaces; the question "Joining two consecutive lines using awk or sed" explains the commands. That gives you a more useful input file, with all four fields on each line.

2. Hunt for the lines you want

By adding more precise conditions to the second block of that snippet, you can choose which lines (fields) to remember for printing.
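Both approaches can be sketched as small shell functions wrapping awk. This assumes, as in the question's output, that every record is exactly four lines in the order md5, timestamp, endpoint, filename; join_lines and md5_and_host are hypothetical names. join_lines uses a counter instead of NR % n, so each record starts without a leading space and the function also works for n = 1.

```shell
#!/usr/bin/env bash
# Approach 1: join every n consecutive lines into one space-separated record.
join_lines() {
  awk -v n="${1:-4}" '
    { rec = (c++ ? rec " " $0 : $0) }   # first line of a group starts a new record
    c == n { print rec; c = 0 }         # group complete: print it and reset
    END { if (c) print rec }            # do not silently drop a short final group
  '
}

# Approach 2: remember only the lines you want, by position within the group.
# Assumes the fixed order md5, timestamp, endpoint, filename.
md5_and_host() {
  awk '
    NR % 4 == 1 { md5 = $0 }        # 1st line of each group: md5
    NR % 4 == 3 { print md5, $0 }   # 3rd line: endpoint (hostname)
  '
}

sample='md5:a1
timestamp:t1
endpoint:h1
filename:f1
md5:a2
timestamp:t2
endpoint:h2
filename:f2'

printf '%s\n' "$sample" | join_lines 4    # md5:a1 timestamp:t1 endpoint:h1 filename:f1 ...
printf '%s\n' "$sample" | md5_and_host    # md5:a1 endpoint:h1 ...
```

For the fixed four-line case, the standard paste utility performs the same join: paste -d' ' - - - - < best_file.txt.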

Both approaches break if the source file is missing a line anywhere, because every following group shifts out of alignment. Actually parsing the keys requires a good deal more logic.
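A minimal sketch of key-based parsing, assuming every record ends with a filename: line and every key is followed by a colon. Each line is split at its first colon only, so values containing colons stay intact, and a record missing any line other than filename: prints an empty value instead of shifting every later record. The function pick_fields and the sample values are hypothetical.

```shell
#!/usr/bin/env bash
# Key-based parsing: print the values of two chosen keys for each record.
# Assumption: every record ends with a "filename:" line.
pick_fields() {
  awk -v a="$1" -v b="$2" '
    {
      i = index($0, ":")              # split at the FIRST colon only
      if (i == 0) next                # skip lines that have no key
      key = substr($0, 1, i - 1)
      rec[key] = substr($0, i + 1)    # value keeps any later colons
    }
    key == "filename" {
      print rec[a], rec[b]            # a missing key prints as an empty value
      split("", rec)                  # portable way to empty the array
    }
  '
}

sample='md5:a1
timestamp:12:30:00
endpoint:h1
filename:/tmp/x
md5:a2
endpoint:h2
filename:/tmp/y'

printf '%s\n' "$sample" | pick_fields md5 endpoint       # a1 h1, then a2 h2
printf '%s\n' "$sample" | pick_fields filename timestamp
```

The second sample record has no timestamp line, so pick_fields filename timestamp prints /tmp/y with an empty timestamp rather than borrowing one from a neighboring record.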
