2

I have a large log file from which I need to extract file names.

The file looks like this:

/path/to/loremIpsumDolor.sit /more/text/here/notAlways/theSame/here
/path/to/anotherFile.ext /more/text/here/differentText/here
.... about 10 million times

I need to extract the file names like this:

loremIpsumDolor.sit
anotherFile.ext

I figure my first strategy is to find/replace all /path/to/ with ''. But I'm stuck on how to remove all characters after the space.

Can you help?

Ryan

4 Answers

6
sed 's/ .*//' file

It doesn't take any more than that: the command deletes everything from the first space to the end of each line. The transformed output appears on standard output, of course.
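Since the question also wants the directory part gone, the same idea can be extended with a second substitution. This is only a minimal sketch, assuming no file name contains a space; the function name `extract_names` and the sample lines are illustrative, not part of the original answer:

```shell
# Hypothetical helper: read log lines on stdin, print bare file names.
extract_names() {
    # 1st expression: delete everything from the first space onward.
    # 2nd expression: delete everything up to and including the last slash.
    sed -e 's/ .*//' -e 's|.*/||'
}

# Sample input modelled on the lines shown in the question.
printf '%s\n' \
    '/path/to/loremIpsumDolor.sit /more/text/here/notAlways/theSame/here' \
    '/path/to/anotherFile.ext /more/text/here/differentText/here' |
    extract_names
```

Using `|` as the delimiter in the second expression avoids having to escape the slash inside the pattern.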

Jonathan Leffler
  • umm... regex for stripping after the first space? Wouldn't expect that from you ;-) – Michael Krelin - hacker Nov 15 '12 at 19:44
  • Brute force `sed` action; I like it. It is a shame that Windows does not provide such powerful text manipulation tools like sed, grep, awk, etc. by default. These are the bread-n-butter tools for a sys admin (IMHO). – Will Nov 15 '12 at 19:51
  • 1
    I dislike 'cut' because the standard ([POSIX](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cut.html)) versions of it don't handle one-or-more separators between fields; GNU `cut` has the necessary `-i` option, but I can't always rely on having GNU `cut` available. Granted, not an issue with this particular task, but if you don't use a tool because it doesn't always work, you don't use it. I find `sed` easier to use, but there are multiple tools for the job (`awk`, `perl`, `python` could all be used very easily, but they're more complex than necessary. – Jonathan Leffler Nov 15 '12 at 19:52
  • @JonathanLeffler, I find `sed` more complex *for this particular task*. That's why I didn't expect that from you. (and no tool *always work*). That said, expected or not, I do not find anything severely wrong with this solution ;-) – Michael Krelin - hacker Nov 15 '12 at 20:38
2

Pass it to cut:

cut -d ' ' -f1 yourfile
Michael Krelin - hacker
2

In theory, you could also use awk to grab the filename from each line as:

awk '{ print $1 }' input_file.log

That, of course, assumes that there are no spaces in any of the filenames. By default, awk splits each line into fields on whitespace, so the above snippet takes the first "field" of each line in your log file (the path ending in your filename) and prints it.
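To drop the leading directories as well, the chosen field can be split on slashes inside awk. The following is a sketch under the same no-spaces assumption; the function name `field_basename` and its optional field-number argument are hypothetical, added only because the path may not always be the first field:

```shell
# Hypothetical helper: print the base name of one whitespace-separated field.
# Usage: field_basename [FIELD_NUMBER]   (defaults to field 1)
field_basename() {
    awk -v col="${1:-1}" '{
        n = split($col, parts, "/")   # break the chosen field on slashes
        print parts[n]                # the last piece is the file name
    }'
}

# Path in the first field, as in the question.
printf '%s\n' '/path/to/anotherFile.ext /more/text/here' | field_basename

# Path in the second field, preceded by some other text and spaces.
printf '%s\n' 'textHere   /path/to/file.ext /more/text/here' | field_basename 2
```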

Will
  • Ah, but there are actually spaces before in my real log file. But I like this direction. In reality it's more like `textHere thenSpaces /path/to/file.ext /more/text/here`. I didn't mention it because I figured I'd have to sed find/replace the first part anyway (since it's always the same). – Ryan Nov 15 '12 at 19:57
  • @Ryan: no sweat; you would just use `print $2` instead, since it would then be the second field. `awk` is a handy tool for things just such as this, and it is worth getting reasonably good at using it. – Will Nov 15 '12 at 19:59
0

a bash-only solution:

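# -r keeps backslashes literal; the first word goes to path, the rest to otherstuff
while read -r path otherstuff; do
    printf '%s\n' "${path##*/}"   # strip everything through the last slash
done < filename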
glenn jackman