In the program below, I am trying to extract values from URL strings, namely the values that come after a=, symbol=, uid=, cid=, and o=.

What is the best way to extract these values from the sample URLs in the array declared in the program?

I want the parsing time reported by the program's final output statement to be as small as possible.

package com.xyz.urlagent;

import java.util.Date;
import java.util.Random;

public class UrlExtract {

public static String[] urlArray = {"https://example.com/grid/p/login?cid=testcidcombo4&uid=testuidcombo4&a=testadcodecombo4&o=testoffercodecombo4",
                    "https://example.com/grid/p/site#r=jPage/https://research-example.com/grid/wwws/research/stocks/earnings?c_name=hvfkhfk_VENDOR&symbol=IBM",
                    "https://example.com/grid/p/login?a=testadcode3",
                    "https://example.com/grid/p/site#r=jPage/https://research-example.com/grid/wwws/fixedIncome/bondTicker.asp?c_name=_jhcjhfhyjkh_VENDOR&Extra=",
                    "https://example.com/grid/p/site#r=jPage/https://example.com/grid/wwws/ideas/overview/overview.asp?YYY600_4TasO+9+jFhYnkq2U5YXohiZ9qsMKu/jUh6HR8N5EWKAOlRWVhC18/dapBTvnqGaqgNGUMvWP3EfysyWRfCNYsqUFBc1pxuB8/ho+4G2BBo=&c_name=khhfjkuk_VENDOR",
                    "https://example.com/grid/p/site#r=jPage/https://research-example.com/grid/wwws/research/stocks/earnings?symbol=AAPL&c_name=jkvkjgljlj_VENDOR",
                    "https://example.com/grid/p/login?CID=testcid1"};
public static int numurl = 2000;
public static Random rand = new Random(System.currentTimeMillis());

public static void main(String[] args) {
    Date StartDate= new Date();
    for(int i=0; i<numurl;i++){    
           String SampleURL = urlArray[rand.nextInt(urlArray.length)];

           ////////////############ CODE To Extract symbol Values from URL(value after symbol=)

           ////////////############ CODE To Extract UID Values from URL(value after uid=)

           ////////////############ CODE To Extract CID Values from URL(value after cid=)

           ////////////############ CODE To Extract O Values from URL(value after o=)

           ////////////############ CODE To Extract A Values from URL(value after a=)

           System.out.println("Values extracted from Sample URL: "+ "(Extracted Values are printed HERE)");                
        }   
    Date EndDate= new Date();
    // Elapsed wall-clock time in milliseconds; the original divisor (1000%60) evaluates to 40, not 1000.
    long diff = EndDate.getTime()-StartDate.getTime();
    System.out.println("Time taken to parse "+numurl+ " url's is: "+diff+ " milliseconds.");
    }

}
RAJESH
  • Frankly, I would prefer correctness here over efficiency. With all the encoding rules for URL parameters this is a non-trivial task to do correctly. Use a good library. – Henry Aug 01 '16 at 18:38
  • You can look here: http://stackoverflow.com/a/31600846/1475228 – Pritam Banerjee Aug 01 '16 at 18:42
  • Thanks Henry, that is most important. I am doing this on a high volume of data per second, so I am also looking for the quickest possible solution. – RAJESH Aug 01 '16 at 18:42
  • Hi Pritam, the HttpRequestParser shown in the link is something different. – RAJESH Aug 01 '16 at 18:56

1 Answer

The URI class and URLDecoder class are designed to do what you want:

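import java.net.URI;
import java.net.URLDecoder;

URI uri = URI.create(sampleURL);
String query = uri.getRawQuery();   // null when the URL has no query component
if (query != null) {
    for (String nameValuePair : query.split("&")) {
        // Split on the first "=" only, so a value may itself contain "=".
        String[] nameAndValue = nameValuePair.split("=", 2);
        // URLDecoder.decode(String, String) declares UnsupportedEncodingException.
        String name = URLDecoder.decode(nameAndValue[0], "UTF-8");
        String value = nameAndValue.length > 1
            ? URLDecoder.decode(nameAndValue[1], "UTF-8") : "";

        System.out.printf("Found query parameter \"%s\" with value \"%s\"%n",
            name, value);
    }
}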
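
If you only need a handful of named parameters (a, symbol, uid, cid, o), it may be convenient to collect the pairs into a map first. The following is a minimal sketch, not a definitive implementation. The class name QueryParams and its parse method are hypothetical. Lower-casing the names is an assumption made so that CID= and cid= are treated alike, and when a name repeats only the first value is kept.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class QueryParams {

    /** Parses a raw (still percent-encoded) query string into lower-cased names and decoded values. */
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;                      // no query component at all
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;                       // tolerate "a=1&&b=2"
            }
            String[] nameAndValue = pair.split("=", 2);
            // Assumption: parameter names are matched case-insensitively.
            String name = decode(nameAndValue[0]).toLowerCase(Locale.ROOT);
            String value = nameAndValue.length > 1 ? decode(nameAndValue[1]) : "";
            params.putIfAbsent(name, value);    // keep the first occurrence of a name
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://example.com/grid/p/login?cid=testcidcombo4"
                + "&uid=testuidcombo4&a=testadcodecombo4&o=testoffercodecombo4");
        Map<String, String> params = parse(uri.getRawQuery());
        System.out.println("cid=" + params.get("cid") + ", uid=" + params.get("uid")
                + ", a=" + params.get("a") + ", o=" + params.get("o")
                + ", symbol=" + params.get("symbol"));
    }
}
```

A parameter that is absent from the URL simply comes back as null from the map, so the caller can decide how to report it.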

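Be aware that some of your example strings have no query component at all, because the #r comes before the query separator, ?. Everything after the # is the fragment, so getRawQuery() returns null for those strings and the parameters would have to be pulled out of the fragment instead. The structure of a URI is documented in the URI class documentation and in the RFC that defines the structure of a URI, RFC 3986.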
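
For the strings where #r precedes the ?, java.net.URI places the whole remainder in the fragment. One possible workaround, sketched below, is to fall back to the text after the first ? inside the raw fragment. The class FragmentQuery and its method are hypothetical names, and treating the fragment's tail as a query string is an assumption about how your site encodes these links, not something the URI specification defines.

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
final class FragmentQuery {

    /**
     * Returns the raw query if the URI has one; otherwise the text after the
     * first '?' in the raw fragment; otherwise null.
     */
    static String rawQueryOrFragmentQuery(URI uri) {
        String query = uri.getRawQuery();
        if (query != null) {
            return query;
        }
        String fragment = uri.getRawFragment();
        if (fragment == null) {
            return null;
        }
        int questionMark = fragment.indexOf('?');
        // Assumption: anything after '?' in the fragment is a query string.
        return questionMark >= 0 ? fragment.substring(questionMark + 1) : null;
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://example.com/grid/p/site#r=jPage/"
                + "https://research-example.com/grid/wwws/research/stocks/earnings"
                + "?symbol=AAPL&c_name=jkvkjgljlj_VENDOR");
        System.out.println("query:    " + uri.getRawQuery());
        System.out.println("fallback: " + rawQueryOrFragmentQuery(uri));
    }
}
```

The returned string can then be split on & and = in the same way as an ordinary query string.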

VGR