
I have a file containing a list of files, and I would like to know their total size. Is there a command to do so?

My OS is a very basic Linux (QNAP TS-410).

EDIT:

A few lines from the file:

/share/archive/Bailey Test/BD006/0.tga
/share/archive/Bailey/BD007/1 version 1.tga
/share/archive/Bailey 2/BD007/example.tga

Nicolas

9 Answers


I believe something like this would work in busybox:

du `cat filelist.txt` | awk '{i+=$1} END {print i}'

I don't have the same environment as you, but if you encounter issues with spaces in filenames something like this would work too:

cat filelist.txt | while read file;do
  du "$file"
done | awk '{i+=$1} END {print i}'

Edit 1:
@stew is right in his answer below: du shows the disk usage, not the exact file size. To change this behavior busybox uses the -a flag, so try du -a "$file" for the exact file size and compare the output/behavior.

Mattias Ahnberg
  • Thanks for your input. The first command returns `/usr/bin/du: Argument list too long` (almost 80,000 lines in my file). Your second command just gives me a prompt once I hit enter, waiting for something more? – Nicolas Jan 20 '12 at 12:20
  • Hard to say with your environment. Is it the normal command prompt, or just a blinking prompt? If it's the latter it might just be slow waiting for the result; if it's an "input prompt" it might be that you missed some character. And if it's a normal prompt I don't know, I tested it quite thoroughly before I typed it. :( – Mattias Ahnberg Jan 20 '12 at 14:17
  • It's an "input prompt" when I do the following: `cat tgafiles.txt | while read file;do du "$file" done | awk '{i+=$1} END {print i}'`. Thanks Mattias. – Nicolas Jan 20 '12 at 17:37
  • 1
    Ah! If you put everything on one line you need another; like this: `cat tgafiles.txt | while read file;do du "$file";done | awk '{i+=$1} END {print i}'` (i.e. before done). – Mattias Ahnberg Jan 20 '12 at 22:47
  • Spot on! It worked perfectly, cheers! (although I could have figured out this mistake by myself) – Nicolas Jan 23 '12 at 17:53
du -c `cat filelist.txt` | tail -1 | cut -f 1

-c adds a final "total" line;
tail -1 keeps only that last line (the total);
cut -f 1 keeps the first field (the size), dropping the word "total".

olegzhermal
  • 1
    This fails with du - argument list too long. My filelist is large. The below answer with xargs seems to be the easiest solution. – Syclone0044 Feb 26 '19 at 06:12

I don't know if your linux tools are capable of this, but:

cat /tmp/filelist.txt | xargs -d \\n du -c

So, xargs will set the delimiter to a newline character, and du will produce a grand total for you.

Looking at http://busybox.net/downloads/BusyBox.html it seems that "busybox du" will support the grand total option, but the "busybox xargs" will not support custom delimiters.

Again, I'm not sure of your toolset.
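
As a comment below points out, if the list is long enough that xargs splits it into several du invocations, -c will print one "total" line per batch rather than a single grand total. A minimal sketch to sum those batch totals (this inherits the -d option above, which the OP's busybox xargs turned out not to support, so treat it as an illustration rather than a drop-in fix):

cat /tmp/filelist.txt | xargs -d \\n du -c | awk '$2 == "total" {sum += $1} END {print sum}'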

cjc
  • here's the result: `xargs: invalid option -- d` – Nicolas Jan 20 '12 at 12:21
  • Awesome: working with a NAS's busybox linux is like a McGuyver episode, trying to build a working airplane from some canvas, sticks and twine. – cjc Jan 20 '12 at 15:47
  • How about this, if you have the room for it on a different machine: copy all those files that you're interested in to some other, fully functional linux, and then run Stew's solution there. Doing that might be a lot easier than trying to figure out if busybox is capable of this sort of thing. – cjc Jan 20 '12 at 15:49
  • 1
    I think answer is the best. It's concise, and is much quicker than the other answers in this thread. – zymhan Nov 20 '14 at 18:54
  • Good answer. You may want to leave out `-c` since xargs will do multiple calls to `du` if the filelist is long enough, producing several `du` totals. – qwr Jul 18 '19 at 16:52
while read filename; do stat -c '%s' "$filename"; done < filelist.txt | awk '{total+=$1} END {print total}'

This is similar to Mattias Ahnberg's solution. Using "read" gets around problems with filenames/directories with spaces. I use stat instead of du to get the filesize. du is getting the amount of room it is using on disk instead of the filesize, which might be different. Depending on your filesystem, a 1 byte file will still occupy 4k on disk (or whatever the blocksize is). So for a 1 byte file, stat says 1 byte and du says 4k.
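
A quick way to see the difference for yourself (a rough sketch; the du figure depends entirely on your filesystem's block size):

printf x > tiny          # create a 1-byte file
stat -c '%s' tiny        # apparent size: 1
du -k tiny               # disk usage: typically 4 (KB) on a filesystem with 4 KB blocks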

stew

Here's another solution to the problem:

cat filelist.txt | tr '\n' '\0' | wc -c --files0-from=-
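
With more than one file in the list, GNU wc appends a final "total" line, so if only the grand total is wanted you could (assuming GNU wc; busybox's wc may not support --files0-from at all) tack on:

cat filelist.txt | tr '\n' '\0' | wc -c --files0-from=- | tail -1 | awk '{print $1}'
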
dsamarin

What about this other option:

tr \\n \\0 < filelist.txt | du -ch --files0-from=-

You can pipe the output through tail -1 to get only the total, and/or use du -b instead of du -h to get the output in bytes.

Roberto
  • I really like this answer. If only the total is required it can just be piped into tail -1, viz: `tr \\n \\0 < filelist.txt | du -ch --files0-from=- | tail -1` – Scooby-2 Feb 10 '23 at 16:09

Try something like this:

$ cat filelist.txt | xargs ls -l | awk '{x+=$5} END {print "total bytes: " x}' 

To deal properly with spaces in paths:

$ find /path/to/files -type f -print0 | xargs -0 ls -l | awk '{x+=$5} END {print "total bytes: " x}' 
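
If the NAS's stripped-down find has no -print0 (as the comments below suggest), -exec may be a workable substitute, assuming that busybox build accepts the + terminator (with \; it still works, just one ls call per file):

find /path/to/files -type f -exec ls -l {} + | awk '{x+=$5} END {print "total bytes: " x}'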
EEAA
  • Thanks for your input. Unfortunately I think there's an issue with the spaces in the directories within my file not being escaped with a "\", so it breaks while going through the file list. – Nicolas Jan 19 '12 at 17:46
  • Can you bypass the text file list, and just generate this off of the output of `find`? – EEAA Jan 19 '12 at 17:48
  • Unfortunately the list is too long; there are 79,159 lines of files (full paths), which is why I output it to a file. Maybe I can add an argument to escape the result of the find? – Nicolas Jan 19 '12 at 17:51
  • there's no "-print0" argument with the find on my linux system – Nicolas Jan 19 '12 at 17:54
  • @Nicolas - that's due to it using busybox's stripped-down `find` instead of the real `find` binary. – EEAA Jan 19 '12 at 17:55
  • Very true, I've actually already read something about this. I just need to use the full path of the find command, don't I? (thanks for your help) – Nicolas Jan 19 '12 at 17:57

This command checks the size of each file in the list and outputs the total in human readable format.

EDIT 07-02-2023: See the commands below. This (original) suggestion of mine parses the output from ls, which is prone to cause problems.

while read line; do ls -l "$line"; done < filelist.txt | awk '{total+=$5} END {print total}' | awk '{ split( "bytes KB MB GB TB PB" , v ); s=1; while( $1>1024 ){ $1/=1024; s++ } printf "Total: %.2f %s", $1, v[s] }'

It uses only ls, awk and printf, which are likely to exist on the most basic of systems. It avoids the need to use xargs as the read command takes its input one line at a time from the list of files.

It works as follows. It reads each line of the file containing the list (filelist.txt in my example here) and pipes the output into awk, which reads only field 5 ($5) and keeps adding to the total for each file. The ls command with the -l switch gives us the file size in bytes, so this is fed into a second awk, which is a function that repeatedly divides the bytes by 1024 to convert them to KB, MB, GB, TB and even PB if you have that much disk space! It will print whatever unit is appropriate, i.e. KB, GB etc., to 2 decimal places. If you want 3 decimal places then change the 2 to a 3 in the printf command at the end:

printf "Total: %.3f %s", $1, v[s]...

Asked almost 11 years ago, last activity 6 years ago! Well, if anyone finds this, it does meet the OP's requirements and maybe it will help someone.

EDIT 07-02-2023:

As pointed out by @doneal24, we should not be parsing the output from ls. So we can simply read each line of the list of files, filelist.txt, and run du -b against each one. We then use awk to add up the total bytes, and again to convert it into human-readable format. We do not need to pipe through xargs, because read presents the filenames to du one at a time.

while read line; do du -b "$line"; done < filelist.txt | awk '{i+=$1} END {print i}' | awk '{ split( "bytes KB MB GB TB PB" , v ); s=1; while( $1>1024 ){ $1/=1024; s++ } printf "Total: %.2f %s", $1, v[s] }'

I believe this is the answer the OP wanted - it provides the total size of the files in the list in human readable format.

Once again, thanks to @doneal24 for the input. It proves how, even answering a post over 10 years old, we continue to learn and help others. Teamwork!

Scooby-2
  • Parsing the output of `ls` is a really bad idea. Multiple answers on Unix&Linux SE. Answers using `find` are much more robust. – doneal24 Jan 20 '23 at 21:50
  • @doneal24 Thanks for your comment. Here, I am simply using `ls` to extract each path and filename from a text file which was created using a `find` command. It would not be difficult to pipe the output of that command directly into the `awk` commands I have given via `xargs`, negating the need for `ls` but I personally prefer to have a list of the files which were processed (though of course this could also be done without using `ls`). Would you provide information which explains why the use of `ls` is less robust? Thanks in advance. – Scooby-2 Feb 07 '23 at 13:58
  • OK, I found this [link](http://mywiki.wooledge.org/ParsingLs) – Scooby-2 Feb 07 '23 at 14:20
  • Another good link from Unix&Linux SE is [here](https://unix.stackexchange.com/questions/128985/why-not-parse-ls-and-what-to-do-instead). Do you _know_ that your file names don't contain whitespace or new lines? How about globbing characters like `*`? Backslashes? Non-printable characters or unicode? Any of these will break parsing `ls`. – doneal24 Feb 07 '23 at 15:50
  • I have now amended my response to include a chain of commands which does not use `ls`. I have left my original suggested solution so that readers can see the mistake I made! – Scooby-2 Feb 07 '23 at 20:20

cat docs.txt | xargs -d \\n du -sk | awk '{total+=$1} END{print total}'

Pradeep