sed / awk - remove space in file name

Question

I'm trying to replace the whitespace inside file names with `__`, while keeping the spaces that separate one name from the next.

Input:

echo "File Name1.xml File Name3 report.xml" | sed 's/[[:space:]]/__/g'

However, the output is:

File__Name1.xml__File__Name3__report.xml

Desired output:

File__Name1.xml File__Name3__report.xml

Where are the filenames coming from? Awk can be instructed to delimit on newlines, which can then more easily be matched in a pattern. — linden2015, Aug 19 '17 at 18:37
When you have filenames with spaces in them, making a space-delimited list of them is inherently ambiguous. You'd be much better off not putting them in space-delimited format in the first place, rather than trying to fix the problem after they're in space-delimited format. Depending on the situation, there's almost certainly a better way to do this. See [BashFAQ #20](http://mywiki.wooledge.org/BashFAQ/020) for some better ideas. — Gordon Davisson, Aug 19 '17 at 19:22
Note that nothing forbids a filename like this one: `file.xml .xml .xml` — Casimir et Hippolyte, Aug 19 '17 at 19:36
It's extremely likely that this is an XY Problem and you don't actually need to remove white space at all, you just aren't quoting your variables properly or have other fundamental errors. — Ed Morton, Aug 20 '17 at 00:33
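To illustrate the comments above, here is a small bash sketch (not taken from any of the answers). If the names are kept in an array and every expansion is quoted, each name stays one unit and the spaces inside it can be replaced directly. The names are hard-coded for illustration, and `${f//[[:space:]]/__}` is bash pattern substitution, so this assumes bash rather than plain sh.

```shell
names=("File Name1.xml" "File Name3 report.xml")   # hypothetical example names
out=()
for f in "${names[@]}"; do        # quoted: one iteration per name, inner spaces intact
  out+=("${f//[[:space:]]/__}")   # replace every whitespace character with "__"
done
printf '%s\n' "${out[*]}"         # join with the first char of IFS (a space), for display only
```

The names are joined with spaces only at the very end, when nothing needs to split them apart again.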

gboffi · Accepted Answer · 2017-08-20T13:16:10.483

You named awk in the title of the question, didn't you?

$ echo "File Name1.xml File Name3 report.xml" | \
> awk -F'.xml *' '{for(i=1;i<=NF;i++){gsub(" ","_",$i); printf i<NF?$i ".xml ":"\n" }}'
File_Name1.xml File_Name3_report.xml
$

-F'.xml *' instructs awk to split on a regex: the requested extension followed by zero or more spaces (strictly, the unescaped dot matches any character; -F'[.]xml *' would match only a literal dot)
the loop {for(i=1;i<=NF;i++) runs over all the fields the input line is split into; note that the last field is empty (it is whatever follows the last extension), but we are going to take that into account...
the body of the loop
- gsub(" ","_", $i) replaces every space with an underscore in the current field, as indexed by the loop variable i
- printf i<NF?$i ".xml ":"\n" outputs different things: if i<NF it's a regular field, so we append the extension and a space; otherwise i equals NF and we just terminate the output line with a newline.
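A quick way to see the field split described above is to print every field in brackets together with NF. This sketch uses -F'[.]xml *' so that the dot is literal, a small change from the answer's -F'.xml *'; on this input both split the same way.

```shell
line="File Name1.xml File Name3 report.xml"
# Print each field awk produces when splitting on ".xml" plus any trailing spaces.
echo "$line" | awk -F'[.]xml *' '{
  for (i = 1; i <= NF; i++) printf "field %d: [%s]\n", i, $i
  printf "NF = %d\n", NF
}'
# field 1: [File Name1]
# field 2: [File Name3 report]
# field 3: []
# NF = 3
```

The empty third field is the one that follows the last .xml, which is why the loop has to treat i == NF specially.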

It's not perfect: it appends a space after the last filename. I hope that's good enough...

▶ A D D E N D U M ◀

I'd like to address:

the little buglet of the trailing space...
some of the issues reported by Ed Morton
generalizing the extension passed to awk

To reach these goals, I've wrapped the scriptlet in a shell function named s2u (spaces to underscores):

$ s2u () { awk -F'\.'$1' *' -v ext=".$1" '{
> NF--;for(i=1;i<=NF;i++){gsub(" ","_",$i);printf "%s",$i ext (i<NF?" ":"\n")}}'
> }
$ echo "File Name1.xml File Name3 report.xml" | s2u xml
File_Name1.xml File_Name3_report.xml
$

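It's a bit different (better?) because it doesn't special-case the printing of the last field: instead, NF-- drops the empty last field and the ternary special-cases the delimiter appended to each field (a space, or a newline after the last one). The idea of splitting on the extension remains.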

That will cause a syntax error in some awks due to the unparenthesized ternary expression, and it will fail cryptically when a file name contains printf formatting characters, e.g. `big%slip.xml` - always use `printf "%s", $i` rather than `printf $i`. Rather than hard-coding the value you hope/assume the ORS will be, just literally use `ORS` instead of `"\n"` at the end of the printf. You might want to use `[[:space:]]+` instead of `" "` too in the -F and gsub regexps. — Ed Morton, Aug 20 '17 at 00:35
Some `awk`s have no regexp character classes... and I don't want to use the default `ORS`, I want a newline! That said, I've implemented your suggestions about parentheses in ternary expressions and the correct use of `printf`, and fixed the trailing-blank buglet while at it. Thanks — gboffi, Aug 20 '17 at 13:17
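Folding Ed Morton's remaining suggestions into the function could look like the sketch below. It is a hypothetical variant (s2u_posix is not part of the answer): it matches a literal dot with [.], splits on [[:space:]]*, never passes data as a printf format, ends each line with print (and therefore ORS), and takes the replacement string as an optional second argument so the question's `__` can be produced. It assumes every name ends in .EXT and that your awk supports POSIX character classes, which, as noted in the comment above, not every awk does.

```shell
# s2u_posix EXT [REPL]: hypothetical variant of s2u.
# EXT is the extension without the dot; REPL replaces each whitespace run (default "_").
s2u_posix() {
  awk -v ext="$1" -v repl="${2:-_}" '
  BEGIN { FS = "[.]" ext "[[:space:]]*" }   # literal dot + extension + trailing blanks
  {
    n = NF
    if ($n == "") n--                       # skip the empty field after the last extension
    out = ""
    for (i = 1; i <= n; i++) {
      name = $i
      gsub(/[[:space:]]+/, repl, name)      # edit a copy so $0 is not rebuilt
      out = out (i > 1 ? " " : "") name "." ext
    }
    print out                               # print terminates the line with ORS
  }'
}
```

For example, `echo "File Name1.xml File Name3 report.xml" | s2u_posix xml __` should print `File__Name1.xml File__Name3__report.xml`, the question's desired output. Note that a run of several spaces inside a name collapses into one REPL, unlike the sed in the question.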

score 0 · Answer 2 · answered Aug 19 '17 at 18:43

This seems a good start if the filenames aren't delineated:

((?:\S.*?)?\.\w{1,})\b

(        // start of captured group
(?:      // non-captured group
\S.*?    // a non-white-space character, then 0 or more of any character (lazily: as few as possible)
)?       // 0 or 1 times
\.       // a dot
\w{1,}   // 1 or more word characters
)        // end of captured group
\b       // a word boundary

You'll have to look up how to use a PCRE pattern from the shell, since sed and awk regexes have no lazy quantifiers such as *?. Alternatively, it can be run from a Python/Perl/PHP script.


Ulysse BN · Answer 3 · 2017-08-19T21:45:26.147


You could use rename:

rename --nows *.xml

This replaces the spaces in the names of all the .xml files in the current folder with _.

Some versions of rename come without the --nows option; in that case you can use a search-and-replace expression instead:

rename 's/[[:space:]]/__/g' *.xml

Optionally, add --dry-run if you only want to print how the files would be renamed, without actually renaming them.

edited Aug 19 '17 at 21:45

answered Aug 19 '17 at 18:46


Please read the question. The OP has not asked to rename the files. Maybe they'll change the question, but for now you answered another question. – gboffi Aug 19 '17 at 18:53

I don't know what the OP wants to do in the end (and I gave `--dry-run` for that purpose). But given the title `remove space in file name`, I assume my answer, even if not the accepted one, is still on topic... – Ulysse BN Aug 19 '17 at 18:55
The OP's question concerns a string containing filenames; how could the `--dry-run` option help them? In my opinion, your answer should be reformulated as a comment: _"Won't you by any chance be after renaming those files?"_ – gboffi Aug 19 '17 at 19:12

The OP is asking how to rename files; his example shows that clearly. I believe the echo was just a way to demonstrate the file names with spaces. If that's not the case, then it's a different question from the one he appears to be asking. – Lizardx Aug 19 '17 at 21:16
@gboffi yes, the `--dry-run` option handles the case where the OP would not actually want to rename files but only print the filenames. – Ulysse BN Aug 21 '17 at 15:06

Lizardx · Answer 4 · 2017-08-19T21:29:04.170

Assuming you are asking how to rename files, and not how to remove spaces in a list of file names that is being used for some other purpose, here are a long way and a short way. The long way uses sed; the short way uses rename. If you are not trying to rename files, your question is quite unclear and should be revised.

If the goal is to simply get a list of xml file names and change them with sed, the bottom example is how to do that.

directory contents:

ls -w 2
bob is over there.xml
fred is here.xml
greg is there.xml

cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do 
   echo "${a_glob[i]}";
done
shopt -u nullglob
# output
bob is over there.xml
fred is here.xml
greg is there.xml

# then rename them
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do 
   # I prefer 'rename' for such things
   # rename 's/[[:space:]]/_/g' "${a_glob[i]}";
   # but sed works, can't see any reason to use it for this purpose though
   # quote both names, and use -- so a name starting with "-" isn't taken as an option
   mv -- "${a_glob[i]}" "$(sed 's/[[:space:]]/_/g' <<< "${a_glob[i]}")"
done
shopt -u nullglob

result:

ls -w 2
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml

Globbing is what you want here because of the spaces in the names: each array element holds one complete file name, so nothing gets split on the spaces.
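If you'd rather not run sed once per file, bash's own pattern substitution ${f//[[:space:]]/_} can compute the new name. A sketch, with rename_nows as a hypothetical helper name: it takes the directory as its first argument, mimics rename's --dry-run with an optional second argument, and refuses to overwrite an existing file. It assumes bash (for shopt and ${var//pattern/replacement}).

```shell
# rename_nows DIR [--dry-run]: hypothetical helper, not a standard command.
# The ( ... ) body runs in a subshell, so cd and shopt don't leak into the caller.
rename_nows() (
  cd -- "$1" || exit 1
  shopt -s nullglob                    # no *.xml files -> no iterations, not a literal "*.xml"
  for f in *.xml; do
    new=${f//[[:space:]]/_}            # replace every whitespace character with "_"
    [ "$f" = "$new" ] && continue      # name has no whitespace: nothing to do
    if [ -e "$new" ]; then             # never clobber an existing file
      printf 'skipping %s: %s already exists\n' "$f" "$new" >&2
      continue
    fi
    if [ "$2" = --dry-run ]; then
      printf '%s -> %s\n' "$f" "$new"  # only report what would happen
    else
      mv -- "$f" "$new"
    fi
  done
)
```

Running `rename_nows . --dry-run` first shows the planned renames; dropping --dry-run performs them.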

However, this is a rather complicated solution, when all you actually need to do is:

cd [your space containing directory]
rename 's/[[:space:]]/_/g' *.xml

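and that's it, you're done. (This is the syntax of the Perl-based rename; the util-linux rename shipped on some distributions takes a plain from/to string instead of a substitution expression.)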

If, on the other hand, you are trying to create a list of file names, you'd certainly want the globbing method too: just modify the statement inside the loop so that sed changes the file name being output instead of renaming the file.

If your goal is to change the filenames for output purposes, and not rename the actual files:

cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do 
   echo "${a_glob[i]}" | sed 's/[[:space:]]/_/g';
done
shopt -u nullglob
# output:
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml

