
Is there any way to show another page inside my own page? I cannot use frames, because a frame opens that page directly. I want to copy the whole page, save it to a new file, and then show that new file to the user. I think it is better to do this with simple URL encryption, because I don't want to show the real page address. For example, I want to use the URL below instead of yahoo.com: www.myDomain.com/Open.aspx?url=zbipp_dpn ... I know how to read, encrypt, and decrypt the URL, but I don't know how to copy that page into a new page or how to show it.

EDIT: I don't even know how to start researching or what I should be looking for. That is why I am asking my question here, to the experts. I need a keyword to start my research!

padfoot
  • What exactly are you trying to accomplish? You think you need to "show another page in the page" but maybe if you tell us _why_ we can show you a better way. – John Saunders Oct 16 '14 at 03:28
  • @JohnSaunders Here, we don't have access to many web pages, e.g. all blogspot and blogger weblogs. (They have their own reason for doing this: because there are many weblogs on blogger and blogspot that we don't want you to see, we blocked all of them.) These pages are recognized by URL, even when the address appears in the middle of another URL like: `http://webcache.googleusercontent.com/search?q=cache:labnol.blogspot.com` So we cannot use a cache, a URL shortener, ... The only way is to encrypt the URL and open it through our own website, which is hosted in another country. – padfoot Oct 16 '14 at 03:47
  • Still have no idea what you're talking about – John Saunders Oct 16 '14 at 05:06
  • @JohnSaunders I'm going to make a web-proxy. I had success in some parts and think I'm still far from the end. I'll put my codes and progress in this page. – padfoot Oct 17 '14 at 19:05

2 Answers


It sounds like you are trying to set up a proxy.

You could do the following:

  • Listen for requests using an HTTP handler. This can be an MVC controller, a web form (ASPX), an instance of IHttpHandler, even a raw TCP server.

  • Use the encrypted URL to determine the target website.

  • Make a request from your website to the other website. There are multiple ways to do this in .NET, including the HttpClient class.

  • Convert the response to a string.

  • (Optional) parse links in the content to point to your proxy website address instead of the real address.

  • Return the string to the caller as the body of the response. As far as their browser knows, it is the page they requested.
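
The steps above can be sketched roughly as follows. This is only a minimal sketch, not a finished proxy: `ProxySketch`, `DecodeTarget`, and `FetchAsync` are hypothetical names, and plain Base64 stands in for whatever encryption scheme you actually use.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class ProxySketch
{
    // One shared HttpClient for all outgoing requests.
    private static readonly HttpClient Client = new HttpClient();

    // Step 2: turn the opaque token from the query string back into the target URL.
    // Assumption: the token is plain Base64 here; a real proxy would decrypt instead.
    public static string DecodeTarget(string token)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(token));
    }

    // Steps 3 and 4: request the target page and read the response body as a string.
    public static async Task<string> FetchAsync(string targetUrl)
    {
        using (HttpResponseMessage reply = await Client.GetAsync(targetUrl))
        {
            reply.EnsureSuccessStatusCode();
            return await reply.Content.ReadAsStringAsync();
        }
    }
}
```

Inside whichever handler you choose, the flow would then be something like `string html = await ProxySketch.FetchAsync(ProxySketch.DecodeTarget(token));`, followed by the optional link rewriting and writing `html` to the response.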

Disclaimer: While proxies are commonly used, there are potential implications (beyond my non-legal knowledge and advice) to presenting someone else's content under a different URL. In addition, there may be other (perhaps serious) ramifications to circumventing filtered content. Proxied content even with the modified URL may still trigger a filter.

Tim M.
  • OK, thank you. I am just starting my research and will let you know about my progress for further assistance! – padfoot Oct 16 '14 at 13:48
  • I just added my steps. Please review them. Thanks – padfoot Oct 17 '14 at 20:30
  • Thank you for your suggestion, I asked my question here: [link](http://codereview.stackexchange.com/questions/67061/make-a-web-proxy-step-by-step). Which 3rd party software? Please recommend the best one! – padfoot Oct 17 '14 at 22:25
  • http://htmlagilitypack.codeplex.com/ is very popular and has been around for years. – Tim M. Oct 17 '14 at 23:11
  • I've already used that pack in my program to find links. Would you please tell me how to replace link addresses and picture locations with this library? Any ideas about copying pictures from the remote page to my host? Thank you – padfoot Oct 18 '14 at 01:08
  • Sorry, I saw all the manual string parsing you are doing and didn't notice that you were already using a library for the HTML. For images, you will need to find the URLs, make requests, store the results as a file on your own server, and replace the original URL with one that points to your server. – Tim M. Oct 18 '14 at 01:18
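
The image steps described in the last comment could look roughly like this. It is a minimal sketch under assumptions: `ImageMirror`, `LocalFileName`, and `SaveAsync` are hypothetical names, and naming files by a hash of the remote URL is just one possible choice.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

public static class ImageMirror
{
    private static readonly HttpClient Client = new HttpClient();

    // Derive a stable, filesystem-safe name from the remote image URL:
    // the SHA-256 hash of the URL (hex) plus the original file extension.
    public static string LocalFileName(Uri imageUrl)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(imageUrl.AbsoluteUri));
            string name = BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
            return name + Path.GetExtension(imageUrl.AbsolutePath);
        }
    }

    // Download the image and store it under localFolder. Returns the file name
    // to use when rewriting the src attribute to point at your own server.
    public static async Task<string> SaveAsync(Uri imageUrl, string localFolder)
    {
        byte[] bytes = await Client.GetByteArrayAsync(imageUrl);
        string fileName = LocalFileName(imageUrl);
        Directory.CreateDirectory(localFolder);
        File.WriteAllBytes(Path.Combine(localFolder, fileName), bytes);
        return fileName;
    }
}
```

The returned file name is what would replace the original image URL in the page source, prefixed with the path where your site serves that folder.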

Well, finally I started creating a web proxy.

I decided to explain my work here for two reasons: 1) For everyone who wants to start a similar project. 2) Most of this code is copied from Stack Overflow pages; I've just collected the pieces. ;)

I need experts to correct my mistakes and help me to continue.

Here is what I did:


ASP (Default.aspx):

I added a textbox named "txtURL" where the user enters the web address.

I added a button named "btnRun" to start processing.

For now, these components are enough!


C#:

Clicking "btnRun" redirects the page to "www.domain.com/default.aspx?URL=(xxx)", where xxx is the web page address encrypted by a function.

This is the code for btnRun_Click:

    protected void btnRun_Click(object sender, EventArgs e)
    {
        if (txtURL.Text.Length == 0) return;
        // Assume plain HTTP when the user did not type a scheme.
        if (!(txtURL.Text.ToLower().StartsWith("http://") || txtURL.Text.ToLower().StartsWith("https://")))
            txtURL.Text = "http://" + txtURL.Text;

        try
        {
            // Reload this page with the encrypted address wrapped in parentheses.
            Response.Redirect("Default.aspx?URL=(" + Encrypt(txtURL.Text, mainKey) + ")", false);
        }
        catch (Exception ex)
        {
            ShowPopUpMsg(ex.Message);
        }
    }

I'll explain the "Encrypt" and "ShowPopUpMsg" functions later.

Clicking "btnRun" reloads this page with the encrypted URL included in the address.

Now, in "Page_Load", we read the encrypted URL (with a condition to detect a postback):

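    protected void Page_Load(object sender, EventArgs e)
    {
        // Capture whatever sits between "(" and ")" in the requested address.
        string url = Regex.Match(HttpContext.Current.Request.Url.AbsoluteUri, @"\(([^)]*)\)").Groups[1].Value;
        // Nothing to proxy on the first visit or on a postback.
        if (url.Length == 0 || Page.IsPostBack) return;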

From now on, all the code is added to "Page_Load", one piece after another.

Decrypt the URL and read the remote web page's source code:

        try
        {
            txtURL.Text = Server.UrlDecode(Decrypt(url, mainKey));
            string TheUrl = txtURL.Text;
            string response = GetHtmlPage(TheUrl);

I'll explain "Decrypt" and "GetHtmlPage" later.

Now we have the source code in "response".

The next step is to find the links in this source code. Each link has the form "href="xxx"", where xxx is the link address. We must replace them with links that go through our proxy:

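            // Normalize whitespace before "=" so the later string replacements match.
            response = response.Replace("href =", "href=");
            response = response.Replace("href\n=", "href=");
            response = response.Replace("href\t=", "href=");

            // Parse the HTML we already downloaded instead of fetching the page a second time.
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(response);
            // SelectNodes returns null when the page has no links.
            HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]") ?? new HtmlNodeCollection(doc.DocumentNode);
            foreach (HtmlNode link in links)
            {
                char[] c = { ' ', '\"' };
                string s = link.OuterHtml;
                int from = s.IndexOf("href=");
                int to = SearchString(s, from, '\"');
                if (from < 0 || to < 0) continue; // no double-quoted href to rewrite

                s = s.Substring(from + 5, to - from - 5);
                s = s.TrimStart(c); // TrimStart returns a new string, so the result must be assigned
                if (s.StartsWith("\"")) s = s.Remove(0, 1);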

"SearchString" is a function to return the closing quotation mark of "href". I'll explain this later.

There are two kinds of links:

  1. Links that refer to another domain name. These links begin with "http://" or "https://". We find them and replace the address:

                string corrected = "href=\"" + "Default.aspx?URL=(" + Encrypt(s, mainKey) + ")" + "\"";
                if ((s.ToLower().StartsWith("http://") || s.ToLower().StartsWith("https://")))
                    response = response.Replace("href=\"" + s + "\"", corrected);
    
  2. Links that refer to the current domain name. These links begin with "/". To replace them, we first find the domain name and then build the whole address:

                else
                {
                    // Relative link: prepend the scheme and host of the page being proxied.
                    var uri = new Uri(txtURL.Text);
                    string domain = uri.GetComponents(UriComponents.Host, UriFormat.SafeUnescaped);
                    // uri.Scheme is "http" or "https", so one line covers both cases.
                    corrected = "href=\"Default.aspx?URL=(" + Encrypt(uri.Scheme + "://" + domain + s, mainKey) + ")\"";
                    response = response.Replace("href=\"" + s + "\"", corrected);
                }
    

Now everything is done (as far as I currently know), so we show the page with the new links and finish "Page_Load":

            }
            Response.Write(response);                
        }
        catch (Exception ex)
        {
            ShowPopUpMsg(ex.Message);
        }
    }
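
The two link cases above only cover addresses that start with a scheme or with "/". As an alternative, `System.Uri` can resolve any href against the address of the page it came from. The following is a minimal sketch of that idea; `LinkResolver` and `ToAbsolute` are hypothetical names and are not part of the code above.

```csharp
using System;

public static class LinkResolver
{
    // Returns the absolute http(s) address for an href found on pageUrl,
    // or null when the link should be left untouched (mailto:, javascript:, malformed).
    public static string ToAbsolute(string pageUrl, string href)
    {
        Uri resolved;
        if (!Uri.TryCreate(new Uri(pageUrl), href, out resolved)) return null;
        if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps) return null;
        return resolved.AbsoluteUri;
    }
}
```

With this, absolute links, links starting with "/", and links such as "next.html" all go through the same path, and the result is what would be passed to "Encrypt".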

Function to search in a string:

    // Returns the index of the second occurrence of charToFind at or after startLocation
    // (for an href, the closing quotation mark), or -1 if there is none.
    private int SearchString(string mainString, int startLocation, char charToFind)
    {
        if (startLocation < 0 || startLocation >= mainString.Length) return -1;
        int first = mainString.IndexOf(charToFind, startLocation);
        if (first < 0) return -1;
        return mainString.IndexOf(charToFind, first + 1);
    }

Function to read source-code:

    private string GetHtmlPage(string URL)
    {
        WebRequest objRequest = WebRequest.Create(URL);
        // Dispose the response and the reader even if reading fails.
        using (WebResponse objResponse = objRequest.GetResponse())
        using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
        {
            return sr.ReadToEnd();
        }
    }

Function to show a popup message:

    private void ShowPopUpMsg(string msg)
    {
        // JavaScriptStringEncode escapes quotes, backslashes and line breaks in one step.
        string script = "alert('" + HttpUtility.JavaScriptStringEncode(msg) + "');";
        ScriptManager.RegisterStartupScript(this.Page, this.GetType(), "showalert", script, true);
    }

Function to decrypt a string:

    private string Decrypt(string s, string key)
    {
        try
        {
            byte[] toDecryptArray = Convert.FromBase64String(s);
            byte[] keyArray;
            // Derive the 16-byte Triple DES key from the MD5 hash of the key string.
            using (MD5CryptoServiceProvider hashmd5 = new MD5CryptoServiceProvider())
            {
                keyArray = hashmd5.ComputeHash(Encoding.UTF8.GetBytes(key));
            }
            using (TripleDESCryptoServiceProvider tdes = new TripleDESCryptoServiceProvider())
            {
                tdes.Key = keyArray;
                tdes.Mode = CipherMode.ECB;
                tdes.Padding = PaddingMode.PKCS7;
                using (ICryptoTransform cTransform = tdes.CreateDecryptor())
                {
                    byte[] resultArray = cTransform.TransformFinalBlock(toDecryptArray, 0, toDecryptArray.Length);
                    return Encoding.UTF8.GetString(resultArray);
                }
            }
        }
        catch { return null; } // e.g. invalid Base64 or bad padding
    }

Function to encrypt a string:

    private string Encrypt(string s, string key)
    {
        try
        {
            byte[] encryptArray = Encoding.UTF8.GetBytes(s);
            byte[] keyArray;
            // Derive the 16-byte Triple DES key from the MD5 hash of the key string.
            using (MD5CryptoServiceProvider hashmd5 = new MD5CryptoServiceProvider())
            {
                keyArray = hashmd5.ComputeHash(Encoding.UTF8.GetBytes(key));
            }
            using (TripleDESCryptoServiceProvider tdes = new TripleDESCryptoServiceProvider())
            {
                tdes.Key = keyArray;
                tdes.Mode = CipherMode.ECB;
                tdes.Padding = PaddingMode.PKCS7;
                using (ICryptoTransform cTransform = tdes.CreateEncryptor())
                {
                    byte[] resultArray = cTransform.TransformFinalBlock(encryptArray, 0, encryptArray.Length);
                    return Convert.ToBase64String(resultArray, 0, resultArray.Length);
                }
            }
        }
        catch { return null; }
    }
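
One thing to watch with "Encrypt": its Base64 output can contain "+", "/" and "=", which have special meaning inside a URL. A possible way to deal with that is to percent-encode the value before putting it into the address and to undo that before calling "Decrypt". This is a minimal sketch; `UrlToken`, `ToQueryValue`, and `FromQueryValue` are hypothetical names and are not part of the code above.

```csharp
using System;

public static class UrlToken
{
    // Percent-encode the Base64 text so '+', '/' and '=' survive inside a URL.
    public static string ToQueryValue(string base64)
    {
        return Uri.EscapeDataString(base64);
    }

    // Reverse of ToQueryValue; call this before decrypting.
    public static string FromQueryValue(string value)
    {
        return Uri.UnescapeDataString(value);
    }
}
```

For example, `UrlToken.ToQueryValue("ab+/c==")` gives `ab%2B%2Fc%3D%3D`, and `FromQueryValue` turns it back into the original text. If you adopt this, both the redirect and the code that reads the value back have to change together.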
padfoot
  • My problem with links is solved. The next challenge is to copy the pictures on the remote page to my own host and then replace the links in the source code. Any ideas? Thank you – padfoot Oct 17 '14 at 20:28
  • At a high level, it looks fine. The biggest suggestion I have is to use a 3rd party HTML-parsing library to replace links. It will be more robust and help you handle other cases like image URLs. – Tim M. Oct 17 '14 at 20:34
  • However, StackOverflow isn't really intended for code review. Now that you have written up your solution, it would probably be a good question over at http://codereview.stackexchange.com. You'll get feedback on things like exception handling, naming conventions, etc. – Tim M. Oct 17 '14 at 20:36