I've searched multiple sources for grep and regex patterns that would select every image name in a massive collection of garbled code and text. The closest I've come is "How to Use grep to find '../images/'", which didn't work for me.

I need to select the first occurrence of all image names (or copy all image names to a separate file) in my source file, so that, for example:

/Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_ABanner.gif

would select only

someurl.com_images_ABanner.gif

Here's a sample of the text that I am attempting to search through:

[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/banners/ABanner.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_banners_ABanner.gif : Not Found
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/randy.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_randy.jpg : Not Found
[fg-joomla-to-wordpress] Can't copy http://www.differenturl.com/images-body0/logo2.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/www.differenturl.com_images-body0_logo2.gif : Not Found
[fg-joomla-to-wordpress] Can't copy /images/DiffImage.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DiffImage.jpg : A valid URL was not provided.
[fg-joomla-to-wordpress] Can't copy /images/DSCN0248.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DSCN0248.jpg : A valid URL was not provided.

I recognize that the first occurrence on each line usually contains /images/, with some exceptions (for example /images-body0/imagename.jpg), while the target does not. That should simplify the pattern, but I just can't get it to work.

KillerDesigner
  • Please show your desired output for that sample text. – John1024 Dec 03 '15 at 21:29
  • Hey John. I did. Out of all text, I want to select only the image name, indicated above by "someurl.com_images_ABanner.gif" – KillerDesigner Dec 04 '15 at 18:28
  • Your sample input has two image names per line. In the text, you say that you "need to select the first occurrence." Is that correct? Or, are you looking for the second image file name? – John1024 Dec 04 '15 at 19:30
  • I believe the first occurrence (the source) is the same as the second occurrence (the target), which is why I think we only need the first occurrence. The sample code indicates (for the most part) that the source can't be copied to the target. I believe, but am not 100% positive (it's a very large file) that every line is an error indicating that the source can't be copied to the target. Does that help? – KillerDesigner Dec 04 '15 at 20:16
  • The first and the second _do differ_. Take line 1 for example: the first is `ABanner.gif` and the second is `someurl.com_images_banners_ABanner.gif`. – John1024 Dec 04 '15 at 20:25
  • Ooohh.. good observation John. Hmmm... have to re-evaluate solution results. Thanks – KillerDesigner Dec 05 '15 at 01:39

3 Answers

Using awk

If I understand correctly, what you are looking for in your sample text is the last path element of the fourth field. In that case:

$ awk '{n=split($4,a,"/"); print a[n]}' file
ABanner.gif
randy.jpg
logo2.gif
DiffImage.jpg
DSCN0248.jpg
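
Since the comments point out that the source name and the target name differ, it may help to print both side by side. Here is a minimal sketch, assuming every line has the shape "... copy SOURCE to TARGET : MESSAGE" and that neither " to " nor " : " occurs inside a path. The helper name both_names is made up for this example, and the two sample lines are copied from the question.

```shell
# Two lines copied from the sample log in the question.
sample="[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/banners/ABanner.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_banners_ABanner.gif : Not Found
[fg-joomla-to-wordpress] Can't copy /images/DiffImage.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DiffImage.jpg : A valid URL was not provided."

# both_names is a hypothetical helper: it reads log lines on stdin and
# prints "source-name<TAB>target-name" for each line.
both_names() {
  awk -F' to | : ' '{
    ns = split($1, src, "/")   # $1 ends with the source URL or path
    nt = split($2, dst, "/")   # $2 is the whole target path, spaces included
    print src[ns] "\t" dst[nt]
  }'
}

printf '%s\n' "$sample" | both_names
```

Splitting on " to " and " : " instead of on whitespace means the space in "Data Drive" no longer matters. To run it on the real log, something like both_names < file should do.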

Using sed

To obtain the last element of the file name that exists between copy and to:

$ sed -E 's|.* copy .*/(.*) to .*|\1|' file
ABanner.gif
randy.jpg
logo2.gif
DiffImage.jpg
DSCN0248.jpg
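
The question also asks about copying the names to a separate file and keeping only the first occurrence of each. One possible way, sketched under the assumption that the same image can show up on more than one line: pipe the sed output through awk '!seen[$0]++', which keeps the first copy of each name in its original order. The helper name unique_source_names is hypothetical, and the third sample line is a deliberate repeat of the first, added only to show the de-duplication.

```shell
# First two lines are from the question; the third repeats the first on purpose.
sample="[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/banners/ABanner.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_banners_ABanner.gif : Not Found
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/randy.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_randy.jpg : Not Found
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/banners/ABanner.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_banners_ABanner.gif : Not Found"

# unique_source_names is a hypothetical helper: it extracts the source file
# name from each line on stdin and drops names that were already printed.
unique_source_names() {
  sed -E 's|.* copy .*/(.*) to .*|\1|' | awk '!seen[$0]++'
}

# Write the result to a scratch file; use any path you like here.
out_file=$(mktemp)
printf '%s\n' "$sample" | unique_source_names > "$out_file"
cat "$out_file"
```

If the original order does not matter, sort -u in place of the awk step would give the same set of names, sorted.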
John1024

How's this, with sed's extended (-E) regular expressions? I'm selecting for all images (jpg, gif, png) occurring before the : at the end of the line in your input.

$ sed -nE 's,^.*/([^/]*(jpg|gif|png)) : .*$,\1,p' file
someurl.com_images_banners_ABanner.gif
someurl.com_images_randy.jpg
www.differenturl.com_images-body0_logo2.gif
images_DiffImage.jpg
images_DSCN0248.jpg
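
Since the question mentions grep, here is a rough equivalent using grep -o, which prints only the matched part of each line. This is a sketch that assumes a grep supporting -o (GNU and BSD grep do; it is not in POSIX) and that the target name is always followed by " : ". The helper name target_names_grep is invented for the example, and the sample lines are copied from the question.

```shell
# Two lines copied from the sample log in the question.
sample="[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/banners/ABanner.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_banners_ABanner.gif : Not Found
[fg-joomla-to-wordpress] Can't copy /images/DiffImage.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DiffImage.jpg : A valid URL was not provided."

# target_names_grep is a hypothetical helper: it reads log lines on stdin and
# prints the target file name that sits right before " : ".
target_names_grep() {
  # [^/]+ cannot cross a slash, so the match starts after the last "/"
  # of the target path; the second step trims the trailing " : ".
  grep -oE '[^/]+\.(jpe?g|gif|png) : ' | sed 's/ : $//'
}

printf '%s\n' "$sample" | target_names_grep
```

Adding -i to the grep options should also catch uppercase extensions such as .JPG, if your log contains any.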
Ewan Mellor

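If every line in your file follows the same pattern as your sample, you can simply extract the 7th space-separated field of each line (the space in "Data Drive" is what puts the target file name in field 7) and then strip the leading directories: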

$ cat file
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/banners/ABanner.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_banners_ABanner.gif : Not Found
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/randy.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_randy.jpg : Not Found
[fg-joomla-to-wordpress] Can't copy http://www.differenturl.com/images-body0/logo2.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/www.differenturl.com_images-body0_logo2.gif : Not Found
[fg-joomla-to-wordpress] Can't copy /images/DiffImage.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DiffImage.jpg : A valid URL was not provided.
[fg-joomla-to-wordpress] Can't copy /images/DSCN0248.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DSCN0248.jpg : A valid URL was not provided.

$ cut -d' ' -f7 file | sed '/images/ s#.*/\([^/]*\)#\1#'
someurl.com_images_banners_ABanner.gif
someurl.com_images_randy.jpg
www.differenturl.com_images-body0_logo2.gif
images_DiffImage.jpg
images_DSCN0248.jpg
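
The field number above depends on the upload path containing exactly one space. If that might vary, a variant that does not count fields could look like the sketch below. It assumes only that the target path is followed by " : " and that " : " appears nowhere earlier on the line. The helper name target_names_sed is hypothetical, and the sample lines are copied from the question.

```shell
# Two lines copied from the sample log in the question.
sample="[fg-joomla-to-wordpress] Can't copy http://www.differenturl.com/images-body0/logo2.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/www.differenturl.com_images-body0_logo2.gif : Not Found
[fg-joomla-to-wordpress] Can't copy /images/DSCN0248.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DSCN0248.jpg : A valid URL was not provided."

# target_names_sed is a hypothetical helper: it reads log lines on stdin and
# prints the last path element of the target.
target_names_sed() {
  # First expression drops " : <message>"; the second drops everything
  # up to and including the last "/".
  sed -e 's/ : .*$//' -e 's#.*/##'
}

printf '%s\n' "$sample" | target_names_sed
```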
Alfwed