1

I want to parse some URLs that have the following format:

var url ="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a" 

The domain name and the other parts will not necessarily be the same for every URL; they can vary, so I am looking for a general solution.

Basically I want to strip off all the other things and get only the part:

/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p

I thought I would parse this using JavaScript and a regular expression.

This is what I am doing:

var mapObj = {"/^(http:\/\/)?.*?\//":"","(&mycracker.+)":"","(&ref.+)":""};
var re = new RegExp(Object.keys(mapObj).join("|"),"gi");
url = url.replace(re, function(matched){
  return mapObj[matched];
}); 

But it's returning this:

http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43pundefined

Where am I going wrong? Or is there another approach with an even easier solution?

Sujith PS
John Doe

4 Answers

2

You can use:

/(?:https?:\/\/[^\/]*)(\/.*?)(?=\&mycracker)/

Code:

var s="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
var ss=/(?:https?:\/\/[^\/]*)(\/.*?)(?=\&mycracker)/;
console.log(s.match(ss)[1]);
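To make the pattern easier to follow, here is a rough sketch that builds the same regex from commented pieces and wraps it in a small helper. The names `pattern` and `extractPath` are made up for illustration, and the `null` return for URLs that do not contain `&mycracker` is an assumption about what you would want in that case (the one-liner above would throw there, because `match` returns `null`).

```javascript
// The same pattern as above, split into commented pieces.
var pattern = new RegExp(
  "(?:https?:\\/\\/[^\\/]*)" + // scheme and host, matched but not captured
  "(\\/.*?)" +                 // group 1: path and query, matched lazily
  "(?=&mycracker)"             // lookahead: stop right before "&mycracker"
);

// extractPath is a hypothetical helper name, not part of any library.
function extractPath(url) {
  var m = url.match(pattern);
  // match() returns null when the URL has no "&mycracker" after the host
  return m ? m[1] : null;
}
```

Note that this still depends on `&mycracker` being the first parameter you want to drop; a URL where `&ref` comes first would keep `&ref` in the result.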


Sujith PS
1

Why don't you just map a split array?

You don't really need a regex for the URL, but in the general case you will have to run an if statement inside the loop to remove specific GET params. In this particular case (key word: particular) you only have to take the substring up to the indexOf "&mycracker".

var url ="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a" 
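// Pieces 0-2 of the split are "http:", "" and the host; keep everything after them.
var path = "/" + url.split("/").slice(3).join("/");
// Cut at "&mycracker" only when it is present (indexOf returns -1 otherwise).
var cut = path.indexOf("&mycracker");
if (cut !== -1) path = path.substring(0, cut);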
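For the general case, the "if statement inside the loop" idea could look roughly like the sketch below. The function name `stripParams` and its `unwanted` argument are hypothetical, and it assumes the query string itself contains no unencoded "/" and that parameter names are compared exactly as they appear in the URL (no decoding).

```javascript
// stripParams and its parameters are hypothetical names used for illustration.
function stripParams(url, unwanted) {
  // Pieces 0-2 of the split are "http:", "" and the host; keep the rest.
  var path = "/" + url.split("/").slice(3).join("/");
  var q = path.indexOf("?");
  if (q === -1) return path; // no query string, nothing to filter

  var kept = [];
  var pairs = path.substring(q + 1).split("&");
  for (var i = 0; i < pairs.length; i++) {
    var name = pairs[i].split("=")[0];
    // keep the pair only if its name is not in the unwanted list
    if (unwanted.indexOf(name) === -1) kept.push(pairs[i]);
  }
  var base = path.substring(0, q);
  return kept.length ? base + "?" + kept.join("&") : base;
}
```

Called as `stripParams(url, ["mycracker", "ref"])`, this removes those two parameters wherever they appear in the query, not just at the end.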
Sterling Archer
  • But I still want to get rid of &mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a, and the URL above is just an indicative URL; I would like to use the same script in all scenarios. – John Doe Jan 24 '14 at 05:20
  • Updated to truncate the URL at the index of "&mycracker" and the URL matched your desired URL. – Sterling Archer Jan 24 '14 at 05:21
1

Change the following code a little bit and you can retrieve any parameter:

var url = "http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a"
var re = /http:\/\/[^?]+/; // everything before the "?"
var part1 = url.match(re)[0];
var remain = url.replace(re, '');
//alert('Part1: ' + part1);
var rf = remain.split('&');
// alert('Part2: ' + rf);
var part2 = '';
for (var i = 0; i < rf.length; i++) 
    if (rf[i].match(/(p%5B%5D|sid)=/))
        part2 += rf[i] + '&';
part2 = part2.replace(/&$/, '');
//alert(part2)
url = part1 + part2;
alert(url);
Johnny
0
var url ="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
var newAddr = url.substr(22,url.length);
// newAddr == "/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a"

22 is the index at which to start slicing the string.

url.length is the maximum number of characters to include.

This only works as long as the domain name stays the same for every link.
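If the domain can vary, the starting index can be looked up instead of hard-coded. The following is only a sketch: `pathFrom` is a made-up name, and it assumes the path starts at the first "/" after the "://" separator (or at the first "/" if there is no scheme).

```javascript
// pathFrom is a hypothetical helper name used for illustration.
function pathFrom(url) {
  var schemeEnd = url.indexOf("://");
  // skip past "://" when a scheme is present, otherwise start at 0
  var hostStart = schemeEnd === -1 ? 0 : schemeEnd + 3;
  // the path begins at the first "/" after the host
  var pathStart = url.indexOf("/", hostStart);
  return pathStart === -1 ? "/" : url.slice(pathStart);
}
```

Like the fixed-offset version, this keeps the whole query string, so unwanted parameters such as `&mycracker` still have to be removed separately.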

Deryck
  • But I still want to get rid of &mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a, and the URL above is just an indicative URL; I would like to use the same script in all scenarios. – John Doe Jan 24 '14 at 05:13