
I'm getting a file path from a string inside a CSV file. For some reason, the program that generates the CSV strips the colon from the string, so I end up with a path that doesn't work in Java. The typical output is /x/Rest/Of/Path, where x is the drive letter, but occasionally it starts with x/ instead of /x/. I need to add a colon after the drive letter if there isn't one already, changing either /x/ or x/ to x:/. I'm sure this can be done with a regex, but I'm still learning the basics, so I'm not sure how to write it. Thanks in advance for any help.

DGolberg

1 Answer


Here, try this, and study it to learn how it works:

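import java.util.regex.Matcher;
import java.util.regex.Pattern;

String path = "/C/Rest/Of/Path";
Pattern p = Pattern.compile("^(/?[CDEFGH])/"); // optional slash + drive letter, captured as group 1
Matcher m = p.matcher(path);
String pathWithColon = m.replaceAll("$1:/"); // "/C:/Rest/Of/Path"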

Here's a guide:

  1. The ^ is known as an anchor. It matches the very beginning of the string. Without it, this regex would also match /foo/C/Rest/Of/Path, and we don't want that.
  2. The ? can mean various things depending on where it appears. When it doesn't immediately follow an opening parenthesis ( or another quantifier (*, +, ?, {n}, {m,n}), isn't inside a character class [], and isn't escaped as \?, it is a quantifier meaning "0 or 1 of the previous entity", in this case the /. Think of it as the "optional" operator.
  3. The [CDEFGH] is known as a character class. It means, "Any one of these characters." You can negate a character class like so: [^CDEFGH]; this would mean, "Any one character but not these." If you would like to accept any capital letter, then you could use a range: [A-Z]. If you would like to accept any letter, then: [a-zA-Z].
  4. The parentheses surrounding most of the regex are known as a capturing group or capture group. It "saves" whatever is "caught" in between.
  5. During replacement, you can refer to "saved" (captured) groups by $1, $2, $3, and so on. (So you can capture more than one group; each capturing group is numbered by the order of its opening parenthesis.) In the above example, note that I captured the /? as well, so if the slash existed, it exists in the output too (/C/Rest/Of/Path becomes /C:/Rest/Of/Path), and if not, then not. To drop the leading slash instead, move the /? outside the parentheses: ^/?([CDEFGH])/.
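To see those pieces working together, here is a minimal sketch. It assumes the goal is the x:/ form from the question (leading slash dropped) and that any ASCII letter can be a drive letter; the names DriveColon and addColon are made up for illustration. The /? sits outside the capturing group here, so the slash is matched but not copied into the output.

```java
import java.util.regex.Pattern;

public class DriveColon {
    // ^          anchor: start of the string
    // /?         optional leading slash, outside the group so it is dropped
    // ([A-Za-z]) capture group 1: one drive letter (assumption: any ASCII letter)
    // /          the slash that follows the drive letter
    private static final Pattern DRIVE = Pattern.compile("^/?([A-Za-z])/");

    static String addColon(String path) {
        // $1 refers to the captured drive letter
        return DRIVE.matcher(path).replaceFirst("$1:/");
    }

    public static void main(String[] args) {
        System.out.println(addColon("/C/Rest/Of/Path"));  // leading slash form
        System.out.println(addColon("C/Rest/Of/Path"));   // no leading slash
        System.out.println(addColon("C:/Rest/Of/Path"));  // colon already present: no match
        System.out.println(addColon("/foo/C/Rest"));      // not at the start: no match
    }
}
```

One caveat with accepting any letter: a relative path that begins with a single-letter directory, such as a/b, would also be rewritten, so a narrower character class may be safer if the set of drives is known.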

Happy learning!

EDIT

I should have shown a simpler approach first. My apologies. This will do as well:

String path = "/C/Rest/Of/Path";
path = path.replaceAll("^(/?[CDEFGH])/", "$1:/");

The compiled pattern in the first version only adds efficiency. For example, if you were going to fix an array of 10,000 paths, you'd compile the pattern once, then create a matcher per path in a loop. (String.replaceAll compiles the pattern from scratch on every call.)
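A rough sketch of that compile-once loop, using the same pattern as above (so a leading slash is kept in the output). The class name BatchDriveColon, the method fixAll, and the sample paths are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class BatchDriveColon {
    // Compiled once and reused for every path
    private static final Pattern DRIVE = Pattern.compile("^(/?[CDEFGH])/");

    static List<String> fixAll(List<String> paths) {
        List<String> fixed = new ArrayList<>();
        for (String path : paths) {
            // A new Matcher per path is cheap; the pattern itself is not re-parsed
            fixed.add(DRIVE.matcher(path).replaceAll("$1:/"));
        }
        return fixed;
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<String> paths = Arrays.asList("/C/Projects/A", "D/Projects/B");
        System.out.println(fixAll(paths));
    }
}
```

For a handful of paths the difference is negligible, so the one-line replaceAll version is fine there.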

Andrew Cheong
  • Wow, glad I asked. Forget regex, I wasn't even familiar with the `Matcher` part yet. I do have a better understanding of the regex now, just not sure what the () in it do. I know ^ specifies the beginning of the string and the /? means "if the first / exists" right? then the [] are for finding one of the contained characters before the other /. – DGolberg Apr 30 '13 at 20:30
  • @DGolberg - I edited my answer with an explanation of the parts. Please feel free to comment again if anything is unclear. – Andrew Cheong Apr 30 '13 at 20:31
  • Nice, thanks for the notes! They were much more helpful than many of the other sites I've visited regarding regex! I had originally tried to do something using `.replace()` but quickly realized I would have to use something else, such as `.replaceAll()`, which I was unfamiliar with. My experience up until now was primarily with replacing a file format tag at the end of a file name or checking for a specific word within a string (`.contains()` basically). – DGolberg Apr 30 '13 at 20:38
  • Based on your last edit, the original answer is probably more correct for my situation anyway, but the second instance will come in handy as well. I'm basically loading a list of completed project root folders to an array, then cycling through them to get the list of their files for upload. – DGolberg Apr 30 '13 at 20:41
  • @DGolberg - I see; sounds like you were on the right path. Regex looks easy in hindsight but can be daunting before the first step. Good luck with the rest! – Andrew Cheong Apr 30 '13 at 20:42
  • Certainly is! Thanks again for the help! – DGolberg Apr 30 '13 at 20:44