1

I use the following script to do so:

for line in $1
do
 grep -F ".js" $1 | awk '{print $7}' | sort -u 
done 

The output is almost there:

/blog/wp-includes/js/swfobject.js?ver=2.2
/fla/AC_RunActiveContent.js
/include/jquery.js
/include/jquery.jshowoff2.js
/include/jquery.jshowoff.min.js
/include/js/jquery.lightbox-0.5.js
/scripts/ac_runactivecontent.js

I tried piping to cut -d "/" -f5 instead of awk, but parts of the script name are cut off as well:

ac_runactivecontent.js HTTP
AC_RunActiveContent.js HTTP
jquery.jshowoff2.js HTTP
jquery.jshowoff.min.js HTTP
jquery.js HTTP
js
wp-includes

How would I go about extracting from the pattern .js to the delimiter "/" so that I only get the script file name:

swfobject.js
AC_RunActiveContent.js
jquery.js
jquery.jshowoff2.js
jquery.jshowoff.min.js
jquery.lightbox-0.5.js
ac_runactivecontent.js

4 Answers

1

It is probably more efficient to replace the current for/grep/awk/sort pipeline with a single awk (and an optional sort).

Setup:

$ cat filename.js
1 2 3 4 5 6 /blog/wp-includes/js/swfobject.js?ver=2.2 8 9 10
ignore this line
1 2 3 4 5 6 /fla/AC_RunActiveContent.js 8 9 10
1 2 3 4 5 6 /include/jquery.js 8 9 10
ignore this line
1 2 3 4 5 6 /include/jquery.jshowoff2.js 8 9 10
1 2 3 4 5 6 /include/jquery.jshowoff.min.js 8 9 10
ignore this line
1 2 3 4 5 6 /include/js/jquery.lightbox-0.5.js 8 9 10
1 2 3 4 5 6 /scripts/ac_runactivecontent.js 8 9 10

One awk idea:

awk '
/\.js/ { n=split($7,a,"[/?]")         # split field #7 on either "/" or "?", putting the substrings into array a[]
        for (i=n;i>=1;i--)            # the desired string is toward the end of $7, so work backward through the array
        if (a[i] ~ /\.js/) {          # if we find a match then ...
           print a[i]                 # print it and break out of the loop ...
           next                       # by going to the next input record
        }
      }
' filename.js

# or as a single line:

awk '/\.js/ {n=split($7,a,"[/?]"); for (i=n;i>=1;i--) if (a[i] ~ /\.js/) { print a[i]; next}}' filename.js

This generates:

swfobject.js
AC_RunActiveContent.js
jquery.js
jquery.jshowoff2.js
jquery.jshowoff.min.js
jquery.lightbox-0.5.js
ac_runactivecontent.js

NOTE: OP can pipe the results to sort if desired
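
As a rough sketch of that last step, the single awk can be wrapped in a function and piped through sort -u to drop duplicates. The function name list_js_names is hypothetical, and the sketch assumes the log file path is passed as the first argument, as in the question's use of $1:

```shell
# Hypothetical helper: list unique .js file names found in field 7 of a log file.
list_js_names() {
    awk '/\.js/ {
        n = split($7, a, "[/?]")          # break the path on "/" and "?"
        for (i = n; i >= 1; i--)          # scan from the end of the path
            if (a[i] ~ /\.js/) { print a[i]; next }
    }' "$1" | sort -u                     # sort and drop duplicate names
}
```

It would be called as list_js_names filename.js.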

markp-fuso
0

Since you are already using awk, the answer provided by @markp-fuso is probably your best option. If you are open to other options, you can combine grep and basename. (Note that this will likely be less efficient, since it pipes the grep output to basename.)

Using the sample file from the answer provided by @markp-fuso, the following:

grep -o ' /.*\.js' tt.dat | xargs -n 1 basename

Produces the following output:

swfobject.js
AC_RunActiveContent.js
jquery.js
jquery.jshowoff2.js
jquery.jshowoff.min.js
jquery.lightbox-0.5.js
ac_runactivecontent.js
j_b
0

Using awk you could print the match for the filename from the 7th column.

The pattern [^/]+\.js matches one or more characters other than /, followed by a literal .js

For example, with a file as input:

awk '
match($7, /[^/]+\.js/) {
  print substr($7, RSTART, RLENGTH)
}
' file

Output

swfobject.js
AC_RunActiveContent.js
jquery.js
jquery.jshowoff2.js
jquery.jshowoff.min.js
jquery.lightbox-0.5.js
ac_runactivecontent.js
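
To see what match() reports, a small sketch can print RSTART and RLENGTH next to the extracted substring for a single path. The helper name show_match is hypothetical, and the pattern is written as a string with [.] in place of \. only to sidestep escaping differences between awk implementations:

```shell
# Hypothetical helper: show where match() finds the file name in one path.
show_match() {
    printf '%s\n' "$1" | awk '{
        if (match($0, "[^/]+[.]js"))      # on success, awk sets RSTART and RLENGTH
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    }'
}

show_match '/blog/wp-includes/js/swfobject.js?ver=2.2'
```

Because [^/]+ cannot cross a slash, the match can only start after the last / that precedes the file name, and the query string is left out because the match has to end in .js.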
The fourth bird
-1

Try the basename command, and see man basename for details.
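
A minimal sketch of how basename could be applied to one of the paths from the question. basename only strips the directory part, so a query string such as ?ver=2.2 has to be removed separately; here that is done with shell parameter expansion. The function name js_basename is hypothetical:

```shell
# Hypothetical helper: print the file name of a URL path, without any query string.
js_basename() {
    # ${1%%[?]*} removes everything from the first "?" onward;
    # basename then strips the leading directories.
    basename "${1%%[?]*}"
}

js_basename '/blog/wp-includes/js/swfobject.js?ver=2.2'
js_basename '/include/js/jquery.lightbox-0.5.js'
```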

Petr Matousu