1

I am facing a URI encoding problem with non-ASCII characters such as Chinese and Japanese. If I use characters like "隐私权政策", the result does not seem to be a proper URI. Any pointers would be helpful. Here is a code snippet:

String path ="c:\隐私权政策.txt";

File f = new File(path);

URI uri = f.toURI();

System.out.println(uri);

uri = new URI("file", null, uri.getPath(), null, null);

System.out.println(uri);

Am I missing something here? Thanks for your help.

mark

3 Answers

2

I believe your compiler is attempting to treat \隐 as an escape sequence. It's not a valid escape, of course.
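
To illustrate the escape issue, here is a minimal sketch, assuming the intent was a literal backslash in the path. The class and method names are made up for illustration, and the Unicode escapes spell 隐私权政策 so that the example does not depend on the source file's encoding.

```java
import java.io.File;

public class PathLiteralDemo {
    // The five escapes \u9690\u79C1\u6743\u653F\u7B56 are the characters from the question.
    static final String NAME = "\u9690\u79C1\u6743\u653F\u7B56.txt";

    // A doubled backslash in the source yields exactly one backslash in the string.
    static String withEscapedBackslash() {
        return "c:\\" + NAME;
    }

    // A forward slash needs no escaping and is also accepted by java.io.File on Windows.
    static String withForwardSlash() {
        return "c:/" + NAME;
    }

    public static void main(String[] args) {
        // The printed URIs depend on the operating system and working directory.
        System.out.println(new File(withEscapedBackslash()).toURI());
        System.out.println(new File(withForwardSlash()).toURI());
    }
}
```

Either form compiles; which one reads better is a matter of taste.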

In fact, the backslash character isn't quite legal for a URI, per section 2.4.3 of RFC 2396.

Other characters are excluded because gateways and other transport agents are known to sometimes modify such characters, or they are used as delimiters.

unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

However, some systems (e.g. IIS) convert backslashes into forward slashes silently.
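
As a rough check of how `java.net.URI` treats a backslash, the sketch below compares the single-argument constructor (which parses strictly) with the multi-argument constructor (which quotes illegal characters). The class, helper names, and the `policy.txt` path are hypothetical and chosen only for the demonstration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BackslashInUriDemo {
    // Returns true if the single-argument constructor accepts the text unchanged.
    static boolean parses(String text) {
        try {
            new URI(text);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds a file URI with the multi-argument constructor, which percent-quotes
    // characters that are not legal in a path.
    static String quotedFileUri(String path) {
        try {
            return new URI("file", null, path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("file:/c:\\policy.txt"));    // false: backslash is rejected
        System.out.println(parses("file:/c:/policy.txt"));     // true
        System.out.println(quotedFileUri("/c:\\policy.txt"));  // file:/c:%5Cpolicy.txt
    }
}
```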

I tried to run your code but ran into a number of errors that ultimately crashed MyEclipse, so this may not be the only issue.

Pops
0

The problem is that a solution based on removing "/", "\", or "file" by hand is not universal. Here is the machine-independent solution
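
The linked solution is not reproduced in this thread. As an assumption about what a machine-independent approach could look like, the sketch below lets `java.io.File` do both conversions instead of editing the URI string by hand. The class and helper names are hypothetical.

```java
import java.io.File;
import java.net.URI;

public class FileUriRoundTrip {
    // File -> URI: the JDK adds the scheme and converts separators for the current platform.
    static URI toUri(File file) {
        return file.getAbsoluteFile().toURI();
    }

    // URI -> File: no manual stripping of "file:" or slashes is needed.
    static File toFile(URI uri) {
        return new File(uri);
    }

    public static void main(String[] args) {
        // The escapes below spell the file name from the question.
        File original = new File(System.getProperty("java.io.tmpdir"),
                "\u9690\u79C1\u6743\u653F\u7B56.txt").getAbsoluteFile();
        URI uri = toUri(original);
        System.out.println(uri);
        System.out.println(toFile(uri).equals(original));
    }
}
```

The file does not have to exist for the round trip to work, since neither conversion opens it.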

Community
Gangnus
0

@Lord Torgamus is right. So I switched the backslash for a forward slash, surrounded the statement with a try/catch (as suggested by NetBeans), and then it worked.

try {
    uri = new URI("file", null, uri.getPath(), null, null);
} catch (URISyntaxException ex) {
    Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
}

Console output:

run:
file:/c:/隐私权政策.txt
file:/c:/隐私权政策.txt
BUILD SUCCESSFUL (total time: 0 seconds)
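
The output above shows the Chinese characters unescaped, which is how `URI.toString()` renders them. If a percent-encoded form is what is needed, `URI.toASCIIString()` may be the method to look at. A minimal sketch, with a hypothetical class and helper name and the file name written as Unicode escapes:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class AsciiUriDemo {
    // Wraps the checked exception so callers can use the helper in one line.
    static URI fileUri(String path) {
        try {
            return new URI("file", null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = fileUri("/c:/\u9690\u79C1\u6743\u653F\u7B56.txt");
        // toString keeps non-ASCII letters as they are.
        System.out.println(uri);
        // toASCIIString percent-encodes them as UTF-8 octets:
        // file:/c:/%E9%9A%90%E7%A7%81%E6%9D%83%E6%94%BF%E7%AD%96.txt
        System.out.println(uri.toASCIIString());
        // Parsing the encoded form and calling getPath decodes it again.
        System.out.println(URI.create(uri.toASCIIString()).getPath());
    }
}
```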
Roger
  • @Lord Torgamus and Roger, thanks for the help. I just tried the characters 标题 and I am again getting extra characters in the URI, even when using the backward slash (in this case I get an extra %C2%A0). I checked, and the byte values themselves are wrong. Any idea? – mark Mar 16 '11 at 21:30
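
One possible explanation, offered as an assumption rather than a diagnosis: %C2%A0 is the UTF-8 encoding of U+00A0 (a non-breaking space), so the string may contain that character, for example from copy and paste. The sketch below reproduces the symptom and shows it disappearing once the character is removed. The class and helper names are hypothetical, and the escapes U+6807 and U+9898 spell 标题.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class NbspInUriDemo {
    // Builds a file URI string; space characters, including U+00A0, get percent-quoted.
    static String fileUri(String path) {
        try {
            return new URI("file", null, path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    // Removes non-breaking spaces from the text.
    static String stripNbsp(String text) {
        return text.replace("\u00A0", "");
    }

    public static void main(String[] args) {
        // A path with a non-breaking space hidden after the two Chinese characters.
        String path = "/c:/\u6807\u9898\u00A0.txt";
        System.out.println(fileUri(path));             // contains %C2%A0
        System.out.println(fileUri(stripNbsp(path)));  // no percent escape left
    }
}
```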