How to parse a URL using JavaScript and regular expressions?

Question

I want to parse some URLs which have the following format:

var url ="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a"

The domain name and other parts will not necessarily be the same for every URL; they can vary, so I am looking for a general solution.

Basically I want to strip off all the other things and get only the part:

/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p

I thought I would parse this using JavaScript and a regular expression.

This is what I am doing:

var mapObj = {"/^(http:\/\/)?.*?\//":"","(&mycracker.+)":"","(&ref.+)":""};
var re = new RegExp(Object.keys(mapObj).join("|"),"gi");
url = url.replace(re, function(matched){
  return mapObj[matched];
});

But it's returning this:

http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43pundefined

What am I doing wrong? Or is there another approach with an even easier solution?

Sujith PS · Accepted Answer · 2014-01-24T06:18:13.450

2

You can use:

/(?:https?:\/\/[^\/]*)(\/.*?)(?=\&mycracker)/

Code:

var s="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
var ss=/(?:https?:\/\/[^\/]*)(\/.*?)(?=\&mycracker)/;
console.log(s.match(ss)[1]);
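
In case some URLs do not contain &mycracker at all, here is a minimal sketch of one way to wrap the pattern above in a function. The name extractPath is made up for illustration, and the added `|$` alternative is an assumption about the desired behaviour: when the parameter is missing, the match simply runs to the end of the URL.

```javascript
// Hypothetical helper: returns the path and query up to "&mycracker",
// or up to the end of the URL when that parameter is absent.
function extractPath(url) {
  var m = url.match(/^https?:\/\/[^\/]*(\/.*?)(?=&mycracker|$)/);
  return m ? m[1] : null; // null when nothing matches (e.g. no path after the host)
}

var sample = "http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
console.log(extractPath(sample));
// "/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p"
console.log(extractPath("https://example.org/a/b?x=1"));
// "/a/b?x=1"
```

Returning null instead of indexing the match directly avoids a TypeError when the input is not an http(s) URL.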


edited Jan 24 '14 at 06:18

answered Jan 24 '14 at 05:15


What would be the solution if the word cook is not there? I mean, I constructed this URL just to give a glimpse of the problem I am facing. – John Doe Jan 24 '14 at 05:25
Yes this seems to be nice. – John Doe Jan 24 '14 at 05:45
Yeah Jason, I also liked it. – John Doe Jan 24 '14 at 06:26

Sterling Archer · Answer 2 · 2014-01-24T05:21:32.447

1

Why don't you just map a split array?

You don't really need a regex for the URL, but you will have to run an if statement inside the loop to remove specific GET params. In this particular case (key word: particular) you just have to take the substring up to the indexOf "&mycracker".

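var url ="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
// Splitting on "/" gives ["http:", "", "www.example.com", "cooks", ...],
// so the path starts at index 3.
var x = url.split("/");
var y = x.filter(function(data,index) { return index >= 3; });
var path = "/"+y.join("/");
// indexOf returns -1 when "&mycracker" is absent; only truncate when it is found.
var cut = path.indexOf("&mycracker");
if (cut !== -1) path = path.substring(0,cut);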
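
For the more general case mentioned above (removing specific GET params wherever they appear), a rough sketch could look like the following. The function name dropParams and its unwanted list are hypothetical, and it compares keys exactly as they appear in the URL, without decoding them.

```javascript
// Hypothetical helper: removes the named query parameters from a path + query string.
function dropParams(pathAndQuery, unwanted) {
  var qIndex = pathAndQuery.indexOf("?");
  if (qIndex === -1) return pathAndQuery; // no query string, nothing to remove
  var pathOnly = pathAndQuery.substring(0, qIndex);
  var kept = pathAndQuery.substring(qIndex + 1).split("&").filter(function (pair) {
    var key = pair.split("=")[0];
    return unwanted.indexOf(key) === -1; // keep pairs whose key is not listed
  });
  return kept.length ? pathOnly + "?" + kept.join("&") : pathOnly;
}

var example = "/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
console.log(dropParams(example, ["mycracker", "ref"]));
// "/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p"
```

Unlike truncating at "&mycracker", this does not depend on the unwanted parameters coming last.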

edited Jan 24 '14 at 05:21

answered Jan 24 '14 at 05:14


But I still want to get rid of &mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a, and the URL above is just an indicative one; I would like to use the same script in all scenarios. – John Doe Jan 24 '14 at 05:20
Updated to truncate the URL at the index of "&mycracker" and the URL matched your desired URL. – Sterling Archer Jan 24 '14 at 05:21

Johnny · Answer 3 · 2014-01-24T06:13:44.797

Change the following code a little bit and you can retrieve any parameter:

var url = "http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
// Match everything up to (but not including) the "?" that starts the query string.
var re = /http:\/\/[^?]+/;
var part1 = url.match(re)[0];
// What remains is the query string, still starting with "?".
var remain = url.replace(re, '');
var rf = remain.split('&');
var part2 = '';
// Keep only the wanted parameters (p[] and sid here); edit this pattern to keep others.
for (var i = 0; i < rf.length; i++) {
    if (rf[i].match(/(p%5B%5D|sid)=/)) {
        part2 += rf[i] + '&';
    }
}
// Drop the trailing "&" left by the loop.
part2 = part2.replace(/&$/, '');
url = part1 + part2;
alert(url);
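
Where the standard URL and URLSearchParams objects are available (current browsers and Node.js), the same keep-only-these-parameters idea can be sketched without a regex. This is only an illustration: keepParams and its wanted list are made-up names, it returns the path and query rather than the full URL, and because URLSearchParams decodes and re-encodes values, the output may be encoded differently from the input for some URLs (for example, spaces come back as "+").

```javascript
// Hypothetical helper: keeps only the allow-listed query parameters
// and returns the path plus the filtered query string.
function keepParams(rawUrl, wanted) {
  var parsed = new URL(rawUrl);
  var kept = new URLSearchParams();
  // Keys arrive decoded here, so "p%5B%5D" is seen as "p[]".
  parsed.searchParams.forEach(function (value, key) {
    if (wanted.indexOf(key) !== -1) kept.append(key, value);
  });
  var query = kept.toString();
  return parsed.pathname + (query ? "?" + query : "");
}

var full = "http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
console.log(keepParams(full, ["p[]", "sid"]));
// "/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p"
```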

score 0 · Answer 4 · answered Jan 24 '14 at 05:10

var url ="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
var newAddr = url.substr(22,url.length);
// newAddr == "/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a"

22 is where to start slicing up the string.

url.length is how much of it to include.

This works as long as the domain name remains the same on the links.
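
If the domain name does vary, one possible variation is to look for the first "/" after the "://" instead of hard-coding 22. This is only a sketch; stripOrigin is a hypothetical name, and it still leaves the &mycracker and &ref parameters in place.

```javascript
// Hypothetical helper: returns everything from the first "/" after the host.
function stripOrigin(url) {
  var schemeEnd = url.indexOf("://");
  var hostStart = schemeEnd === -1 ? 0 : schemeEnd + 3; // skip past "://" when present
  var pathStart = url.indexOf("/", hostStart);
  return pathStart === -1 ? "" : url.substring(pathStart); // "" when there is no path
}

console.log(stripOrigin("http://www.example.com/cooks/pr?sid=bks%2C43p"));
// "/cooks/pr?sid=bks%2C43p"
console.log(stripOrigin("https://shop.example.org/a"));
// "/a"
```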

But I still want to get rid of &mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a, and the URL above is just an indicative one; I would like to use the same script in all scenarios. – John Doe Jan 24 '14 at 05:13
