I have a Capybara script that, among other things, downloads images from absolute image links.

When trying to write those images to disk I receive an error:

File name too long

The output also includes a long list of all the image URLs in the array. I think a gsub would solve this, but I'm not sure which pattern to use or exactly how to implement it.

Here are a few sample image URLs that are part of the link array. A suitable substitute name would be g0377p-xl-3-24c1.jpg or g0371b-m-4-6896.jpg in these examples:

http://www.example.com/media/catalog/product/cache/1/image/560x560/ced77cb19565515451b3578a3bc0ea5e/g/0/g0377p-xl-3-24c1.jpg
http://www.example.com/media/catalog/product/cache/1/image/560x560/ced77cb19565515451b3578a3bc0ea5e/g/0/g0371b-m-4-6896.jpg

This is the code:

require "capybara/dsl"
require "spreadsheet"
require 'fileutils'
require 'open-uri'

   def initialize
     @excel = Spreadsheet::Workbook.new
     @work_list = @excel.create_worksheet
     @row = 0
   end

       imagelink = info.all("//*[@rel='lightbox[rotation]']")
       @work_list[@row, 6] = imagelink.map { |link| link['href'] }.join(', ')
       image = imagelink.map { |link| link['href'] }
       File.basename("#{image}", "w") do |f|
         f.write(open(image).read)
       end
the Tin Man
jcuwaz
  • Do you care what the image name actually is? If not, just make up your own name. If it’s too long you just have to make it shorter. That simple. Unless you tell us what a suitable substitute name is we can’t help—and you would already know the answer. – Andrew Marshall Dec 07 '13 at 02:47
  • A suitable substitute name would be g0377p-xl-3-24c1.jpg or g0371b-m-4-6896.jpg in the examples above, so basically everything after the final forward slash. I guess my question is how and where to use the gsub to shorten these on the fly before they are saved. – jcuwaz Dec 07 '13 at 03:20

2 Answers

1

You can use File.basename to get just the filename:

uri = 'http://www.example.com/media/catalog/product/cache/1/image/560x560/ced77cb19565515451b3578a3bc0ea5e/g/0/g0377p-xl-3-24c1.jpg'
File.basename uri  #=> "g0377p-xl-3-24c1.jpg"
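
As a minimal sketch of how this could feed the spreadsheet cell, the snippet below maps a list of URLs to their base names and joins them. The `urls`, `names` and `cell_value` variables are illustrative and not part of the original script, and going through `URI.parse(...).path` is an added assumption so that a query string, if a URL ever has one, does not end up in the name.

```ruby
require 'uri'

# Sample URLs taken from the question.
urls = [
  'http://www.example.com/media/catalog/product/cache/1/image/560x560/ced77cb19565515451b3578a3bc0ea5e/g/0/g0377p-xl-3-24c1.jpg',
  'http://www.example.com/media/catalog/product/cache/1/image/560x560/ced77cb19565515451b3578a3bc0ea5e/g/0/g0371b-m-4-6896.jpg'
]

# Take only the path part of each URL, then keep what follows the last slash.
names = urls.map { |url| File.basename(URI.parse(url).path) }

# One comma-separated string, suitable for a single spreadsheet cell.
cell_value = names.join(', ')
puts cell_value  #=> g0377p-xl-3-24c1.jpg, g0371b-m-4-6896.jpg
```

In the question's script, `urls` would presumably come from `imagelink.map { |link| link['href'] }`, and `cell_value` could be assigned to `@work_list[@row, 6]`.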
Andrew Marshall
  • File.basename.open creates wrong number of arguments (0 for 1..2) (ArgumentError) ; here is a test url in question: http://www.tomtop.com/clothing-accessories/fashion-ol-women-chiffon-shirt-pleated-front-long-sleeve-button-blouse-tops-g0377.html ; I'm trying to visit and grab all the absolute image urls on that page defined by the selector I have chosen using capybara. – jcuwaz Dec 07 '13 at 12:37
  • Ultimately I want to download the images, shorten their name to the base and reference the shortened name in my spreadsheet document. – jcuwaz Dec 07 '13 at 12:47
  • @jcuwaz Why are you calling `open`? Nowhere in my answer do I do that. This is just to get the filename, not actually write it. – Andrew Marshall Dec 07 '13 at 17:28
  • Ok that works well; How can I reference that shortened basename in the spreadsheet element that I'm creating in this step: @work_list[@row, 6] = imagelink.map { |link| link['href'] }.join(', '). – jcuwaz Dec 07 '13 at 17:35
0

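There is a real problem with how the filename is created.

imagelink = info.all("//*[@rel='lightbox[rotation]']")

This returns an array-like collection of nodes, not a single node.

From that you get each href value using map and save the resulting array in image.

Then you try to use that whole array as the name of the file. Interpolating an array with "#{image}" produces the string form of the entire array, so every URL ends up in one name, which is consistent with both the "File name too long" error and the list of URLs in the output. Handle each URL on its own instead.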
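
A rough sketch of the per-URL approach, not a drop-in fix: `local_name` and `save_images` are hypothetical helper names, the target directory is assumed to exist already, and `URI.open` from open-uri assumes Ruby 2.5 or later (older versions used a bare `open`). The `fetch` argument is only there so the download step can be swapped out, for example when trying the code without network access.

```ruby
require 'open-uri'
require 'uri'

# Hypothetical helper: derive a short local file name from an image URL.
def local_name(url)
  File.basename(URI.parse(url).path)
end

# Hypothetical helper: save every URL under its own short name inside dir.
# fetch is any callable that returns the image bytes for a URL; the default
# downloads through open-uri and therefore needs network access.
def save_images(urls, dir, fetch = ->(url) { URI.open(url, &:read) })
  urls.map do |url|
    path = File.join(dir, local_name(url))
    # 'wb' writes the bytes unchanged, which matters for images on Windows.
    File.open(path, 'wb') { |f| f.write(fetch.call(url)) }
    path
  end
end
```

In the question's script this could be called as `save_images(imagelink.map { |link| link['href'] }, 'images')`, assuming an `images` directory has been created first. The returned paths show which file each URL was written to.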

the Tin Man