formatting du -sh output

Question

I would like to customize the output from:

du -hs *

Example output:

23G    Test1
1.2M   Folder With Spaces
12G    Another Folder With Spaces

The problem is that I can capture the first column, but because the second column may contain spaces, my command only captures its first word. Is there a way to capture the whole second column, spaces included, or to return the remaining content of the line?

du -hs * | awk '{print $1 " " $2;}'

The above returns this:

23G Test1
1.2M Folder
12G Another

EDIT: The solution is to pass -F to awk and specify tab as the field delimiter:

du -hs * | awk -F'\t' '{print $1 " " $2;}'

Tabs are also valid characters in file and folder names, but in my case that will never be an issue.
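Once the tab is the field separator, the same awk call can also lay out the columns. The following is a minimal sketch, not part of the original post: the function name format_du and the 8-character width are arbitrary choices for illustration, and the input is simulated with printf so it runs without du. It treats everything after the first tab as the name, so a name that itself contains a tab is kept whole.

```shell
# format_du: read "SIZE<TAB>NAME" lines (the layout GNU du prints) and
# print the size right-aligned in an 8-character column, then the name.
format_du() {
    awk -F'\t' '{
        size = $1
        # Everything after the first tab is the name, embedded tabs included.
        name = substr($0, length(size) + 2)
        printf "%8s  %s\n", size, name
    }'
}

# Simulated du output, so the sketch does not depend on the filesystem.
printf '23G\tTest1\n1.2M\tFolder With Spaces\n' | format_du
# Prints:
#      23G  Test1
#     1.2M  Folder With Spaces
```

Real use would look like: du -hs -- * | format_du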

It's a good idea to avoid creating files and directories with spaces in their names. On the other hand, it's an even better idea to be able to cope with existing files and directories with spaces (and other funny characters) in their names. – Keith Thompson, Jul 30 '15 at 21:18
I disagree. You should always check for special cases unless you are 100% certain that the files and folders you are querying do not contain spaces. – mac2017, Jul 31 '15 at 22:39
This was silly on my part: I forgot to add the -F'\t' flag to tell awk to split on the tab delimiter. – mac2017, Jul 31 '15 at 22:42
Don't forget that tab is a valid character in file names, though it's not likely to occur. – Keith Thompson, Jul 31 '15 at 22:43
I was disagreeing with the first half of your comment, and the second half attempts to take back what you said in the first half. Be more concise with what you are trying to state. – mac2017, Jul 31 '15 at 22:45
I realize that tabs could also be in folder names, but in my case I don't care because I know the folders I am dealing with. – mac2017, Jul 31 '15 at 22:46
I believe my original statements were consistent. It's a good idea to be able to cope with spaces and other odd characters in file names (even newline, which means something like `du -0` is the only completely general solution). It's also a good idea IMHO to try to avoid creating files and directories with spaces in their names in the first place, just because the shell doesn't deal with them quite as well. – Keith Thompson, Jul 31 '15 at 23:51

score 3 · Answer 1 · answered Jul 30 '15 at 21:09

For my du (GNU coreutils), the size and file name are separated by a tab. So, the name can be retrieved by removing everything up to and including the first tab:

du -hs * | awk '{size=$1; name=$0; sub(/[^\t]*\t/, "", name); print name}'

NOTE: The above will fail if file names contain newline characters. Depending on your operating system, there may be ways around this limitation. For example, on Linux (GNU tools), du can produce NUL-separated records, which GNU awk (gawk) can read and interpret:

du -0hs * | awk -v RS='\0'  '{size=$1; name=$0; sub(/[^\t]*\t/, "", name); print "NAME="name}'
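If gawk or du -0 is not available, another way around the newline problem is to avoid parsing names out of du's output at all. The sketch below is an alternative to the commands above, not a variant of them. The helper name du_each is made up for illustration, and it assumes a du that accepts -s, -h and --, as GNU du does. It runs du once per entry, so it is slower when there are many entries.

```shell
# du_each: print "SIZE NAME" for every argument, one du call per entry.
# The name comes from the shell, not from du's output, so spaces, tabs and
# even newlines in it are preserved. Only the size column is parsed.
du_each() {
    for entry in "$@"; do
        # The first output line starts with the size; cut -f1 keeps the
        # text before the first tab. head guards against names with newlines.
        size=$(du -sh -- "$entry" | head -n 1 | cut -f1)
        printf '%s %s\n' "$size" "$entry"
    done
}

# Demo in a throwaway directory (sizes depend on the filesystem).
demo=$(mktemp -d)
mkdir "$demo/Test1" "$demo/Folder With Spaces"
(cd "$demo" && du_each *)
rm -rf "$demo"
```

From the directory of interest, the call would simply be: du_each *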

score 2 · Answer 2 · styko · 2015-07-31T04:35:43.823

Since du separates the size and the name with a tab, and your file names shouldn't contain tabs or newlines, you can simply use cut, whose default delimiter is the tab.

du -hs * | cut -f1  # First field
du -hs * | cut -f2  # Second field
du -hs * | cut -f2-  # All fields >= 2 (if there are tabs in the filename)

Unless you need awk for further processing, this should be enough.
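If the further processing happens in the shell rather than in awk, the same tab split can be done with read. This is a minimal sketch under the same assumption as the cut commands (no newlines in names). The function name report and the output wording are invented for illustration, and the input is simulated with printf.

```shell
# A literal tab character to use as the only field separator.
tab=$(printf '\t')

# report: split each "SIZE<TAB>NAME" line into two shell variables.
# The last variable (name) receives the rest of the line, spaces included.
report() {
    while IFS=$tab read -r size name; do
        printf '%s uses %s\n' "$name" "$size"
    done
}

# Simulated du output; real use would be: du -hs -- * | report
printf '23G\tTest1\n1.2M\tFolder With Spaces\n' | report
# Prints:
# Test1 uses 23G
# Folder With Spaces uses 1.2M
```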

edited Jul 31 '15 at 04:35

answered Jul 30 '15 at 22:41

styko


`-f2-` would be slightly better, since it covers the unlikely case where a filename includes one or more tab characters. (Newlines in filenames will break it, of course.) – rici Jul 31 '15 at 03:22
