
I have a string with markdown syntax in it, and I want to be able to find the markdown syntax for headings, i.e. h1 = #, h2 = ##, and so on.

I know that whenever I find a heading, it is at the start of the line. I also know there can only be one heading per line. So for example, "###This is a heading" would match true for my h3 pattern, but not for my h2 or h1 patterns. This is my code so far:

h1 = Pattern.compile("(?<!\\#)^\\#(\\b)*");
h2 = Pattern.compile("(?<!\\#)^\\#{2}(\\b)*");
h3 = Pattern.compile("(?<!\\#)^\\#{3}(\\b)*");
h4 = Pattern.compile("(?<!\\#)^\\#{4}(\\b)*");
h5 = Pattern.compile("(?<!\\#)^\\#{5}(\\b)*");
h6 = Pattern.compile("(?<!\\#)^\\#{6}(\\b)*");

Whenever I use \\#, my IDE (IntelliJ) warns: "Redundant character escape". As far as I know, # is not a special character in regex, so escaping it with two backslashes should still let me use it.

When I find a match, I want to surround the entire match with bold HTML tags, like this: "<b>###Heading</b>", but for some reason it's not working.

//check for heading 6
Matcher match = h6.matcher(tmp);
StringBuffer sb = new StringBuffer();
while (match.find()) {
    match.appendReplacement(sb, "<b>" + match.group(0) + "</b>");
}
match.appendTail(sb);
tmp = sb.toString();

EDIT

So I have to look at each heading level separately; I can't match headings 1-6 with the same pattern (this has to do with other parts of my program that use the same patterns). What I know so far:

  • If there is a heading in the string, it is at the start.
  • If it starts with a heading, the entire string that follows is considered a heading, until the user presses Enter.
  • If I have "## This a heading", then it must match true for h2, but false for h1.
  • When I find my match, this "## This a heading" becomes this "<b>## This a heading</b>".
halfer
Kaffemakarn
  • You do not have to escape `#`. You do not even need `Matcher#appendReplacement` here. You may use `"(?<!#)#{6}\\b"`, and then a simple `tmp = tmp.replaceAll("(?<!#)#{6}\\b", "<b>$0</b>")` – Wiktor Stribiżew May 22 '17 at 08:59
  • @WiktorStribiżew I tried your solution, but the problem is that the match only returns the #:s, and not the text that follows after – Kaffemakarn May 22 '17 at 09:17
  • If you need to match **lines starting with those `#` sequences**, see my updated answer. Always add new details to the question itself, and not to just comments. – Wiktor Stribiżew May 22 '17 at 09:51
  • @WiktorStribiżew Sorry, kinda new to this. Taking a look at your answer now. Also, question has been updated :) – Kaffemakarn May 22 '17 at 09:55
  • Good, I upvoted it because it is a good question showing effort. And now, it is really much clearer. – Wiktor Stribiżew May 22 '17 at 09:56

2 Answers


There is no need to escape # since it is not a special regex metacharacter. Also, the ^ is the string start anchor, so all the lookbehinds in your patterns are redundant as they always return true (since there is no character before the beginning of a string).
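
Both points can be checked with a small sketch. The class name `EscapeAndAnchorDemo` and the helper `found` are made up for illustration; the patterns are the ones from the question. Note that the last two checks also show why the question's h1 pattern fires on an h3 line: the lookbehind before `^` does nothing to limit the number of `#` characters.

```java
import java.util.regex.Pattern;

class EscapeAndAnchorDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "###Heading";

        // Escaped and unescaped '#' behave the same way.
        System.out.println(found("^\\#{3}", input));   // prints true
        System.out.println(found("^#{3}", input));     // prints true

        // The lookbehind before ^ changes nothing: both h1-style
        // patterns match a line that starts with three '#'.
        System.out.println(found("(?<!#)^#", input));  // prints true
        System.out.println(found("^#", input));        // prints true
    }
}
```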

You seem to want to match lines that start with exactly a given number of # characters and capture the text that follows. Use

String s = "###### Heading6 Something here\r\n" +
           "###### More text \r\n" +
           "###Heading 3 text";
Matcher m = Pattern.compile("(?m)^#{6}(?!#)(.*)").matcher(s);
String result = m.replaceAll("<b>$1</b>");
System.out.println(result);

See the Java demo

Result:

<b> Heading6 Something here</b>
<b> More text </b>
###Heading 3 text

Details:

  • (?m) - now, ^ matches start of a line
  • ^ - start of a line
  • #{6}(?!#) - exactly 6 # symbols (the negative lookahead fails the match if a seventh # follows)
  • (.*) - Group 1: 0+ chars other than a line break up to the line end.

Thus, your regex definitions will look like

h1 = Pattern.compile("(?m)^#(?!#)(.*)");
h2 = Pattern.compile("(?m)^#{2}(?!#)(.*)");
h3 = Pattern.compile("(?m)^#{3}(?!#)(.*)");
h4 = Pattern.compile("(?m)^#{4}(?!#)(.*)");
h5 = Pattern.compile("(?m)^#{5}(?!#)(.*)");
h6 = Pattern.compile("(?m)^#{6}(?!#)(.*)");
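
As a rough sketch of how these per-level patterns behave on the example from the question, the hypothetical helpers `headingPattern` and `boldHeadings` below (names invented here, not part of the answer) build the pattern for one level and wrap matching lines in bold tags. The replacement uses `$0` instead of `$1`, on the assumption that the `#` run should stay inside the tags, as the question's expected output suggests.

```java
import java.util.regex.Pattern;

class HeadingLevelDemo {
    // Hypothetical helper: pattern for exactly `level` leading '#' (1 to 6).
    static Pattern headingPattern(int level) {
        return Pattern.compile("(?m)^#{" + level + "}(?!#)(.*)");
    }

    // Wraps each heading line of the given level in <b> tags.
    // $0 is the whole match, so the '#' characters are kept.
    static String boldHeadings(String text, int level) {
        return headingPattern(level).matcher(text).replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        String line = "## Some text here";
        System.out.println(headingPattern(1).matcher(line).find()); // prints false
        System.out.println(headingPattern(2).matcher(line).find()); // prints true
        System.out.println(boldHeadings(line, 2)); // prints <b>## Some text here</b>
    }
}
```

One caveat if the question's `appendReplacement` loop is kept instead: a replacement string built from matched text should go through `Matcher.quoteReplacement`, since `$` and `\` in the heading would otherwise be interpreted as group references or escapes.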
Graham
Wiktor Stribiżew

You can try this:

^(#{1,6}\s*[\S]+)

Since you mentioned that a heading only appears at the start of a line, you don't need a lookbehind.

UPDATE: If you want to bold the full line that starts with heading then you can try this:

^(#{1,6}.*)

And replace by:

<b>$1</b>

Regex Demo

Sample Java source:

final String regex = "^(#{1,6}\\s*[\\S]+)";
final String string = "#heading 1 \n"
     + "bla bla bla\n"
     + "### heading 3 djdjdj\n"
     + "bla bla bla\n"
     + "## heading 2 bal;kasddfas\n"
     + "fbla bla bla";
final String subst = "<b>$1</b>";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);

Run java source
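
The sample above uses the first regex, which stops after the first word. A minimal sketch of the whole-line variant from the update might look like this; the class name `WholeLineHeadingDemo`, the method `boldHeadingLines`, and the input text are made up for illustration. Note that this pattern does not tell heading levels apart, and because `.*` also matches `#`, a line with more than six `#` characters is wrapped as well.

```java
import java.util.regex.Pattern;

class WholeLineHeadingDemo {
    // Whole-line variant: 1 to 6 '#' at the start of a line, then the rest of the line.
    private static final Pattern HEADING_LINE =
            Pattern.compile("^(#{1,6}.*)", Pattern.MULTILINE);

    // Wraps every line that starts with '#' in <b> tags.
    static String boldHeadingLines(String text) {
        return HEADING_LINE.matcher(text).replaceAll("<b>$1</b>");
    }

    public static void main(String[] args) {
        String text = "# Title\nplain text\n### Sub heading here";
        System.out.println(boldHeadingLines(text));
        // prints:
        // <b># Title</b>
        // plain text
        // <b>### Sub heading here</b>
    }
}
```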

Mustofa Rizwan
  • Thank you! The only problem is that a heading could be more than one word. I guess I would want to check for #:s and then get all the text that follows, until the end. Do you have a suggestion for how to tweak your solution? I thought maybe \\b, but that only gave me the #:s – Kaffemakarn May 22 '17 at 09:24
  • so you want the full line ? – Mustofa Rizwan May 22 '17 at 09:27
  • Yes, if a # is written, then everything that follows will be included as the heading, until the user presses enter. – Kaffemakarn May 22 '17 at 09:31
  • Your solution works excellently. My problem is that I have to do a separate check for every heading (h1, h2...), because I use the pattern in other parts of my program, so it's easier that way. Right now, if I have "## Some text here", it matches true for both h1 and h2, but only h2 should be true. I'm building off of your solution, but haven't gotten it to work as I want yet. – Kaffemakarn May 22 '17 at 09:45
  • @Kaffemakarn: Is *If you want to bold the full line* true? Please add the details to the *question*. – Wiktor Stribiżew May 22 '17 at 09:47