
I'm searching for a PHP syntax highlighting engine that can be customized (i.e. I can provide my own tokenizers for new languages) and that can handle several languages simultaneously (i.e. on the same output page). This engine has to work well together with CSS classes, i.e. it should format the output by inserting <span> elements that are adorned with class attributes. Bonus points for an extensible schema.

I'm not looking for a client-side syntax highlighting script (JavaScript).

So far, I'm stuck with GeSHi. Unfortunately, GeSHi fails abysmally for several reasons. The main one is that the different language files define completely different, inconsistent styles. I've spent hours trying to refactor the language definitions down to a common denominator, but since most definition files are quite bad in themselves, I'd finally like to switch.

Ideally, I'd like to have an API similar to CodeRay, Pygments or the JavaScript dp.SyntaxHighlighter.

Clarification:

I'm looking for code highlighting software written in PHP, not for PHP (since I need to use it from inside PHP).
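For illustration, a minimal sketch of the kind of interface described above, assuming each language is defined as a list of regex rules mapped to CSS class names. The `Highlighter` class and its `define()`/`highlight()` methods are hypothetical, not the API of GeSHi or any library mentioned in this thread:

```php
<?php

// Hypothetical pluggable highlighter: a language is a list of
// [cssClass, regexFragment] rules, and output uses only <span class="...">.
final class Highlighter
{
    private array $languages = [];

    public function define(string $lang, array $rules): void
    {
        $this->languages[$lang] = $rules;
    }

    public function highlight(string $code, string $lang): string
    {
        $rules = $this->languages[$lang];

        // Join all rules into one alternation of named groups (g0, g1, ...):
        // at the leftmost matching position the first listed rule wins, and
        // matched text is never re-scanned. Fragments must not contain "~".
        $parts = [];
        foreach ($rules as $i => [, $regex]) {
            $parts[] = "(?<g$i>$regex)";
        }
        $pattern = '~' . implode('|', $parts) . '~s';
        preg_match_all($pattern, $code, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

        $out = '';
        $pos = 0;
        foreach ($matches as $m) {
            [$text, $offset] = $m[0];
            // Text between tokens is escaped but not wrapped.
            $out .= htmlspecialchars(substr($code, $pos, $offset - $pos), ENT_QUOTES);
            foreach ($rules as $i => [$class]) {
                // Groups that did not take part report offset -1, or are
                // missing entirely when they come after the one that matched.
                if (isset($m["g$i"]) && $m["g$i"][1] !== -1) {
                    $out .= '<span class="' . $class . '">'
                        . htmlspecialchars($text, ENT_QUOTES) . '</span>';
                    break;
                }
            }
            $pos = $offset + strlen($text);
        }
        return $out . htmlspecialchars(substr($code, $pos), ENT_QUOTES);
    }
}

$hl = new Highlighter();
$hl->define('php', [
    ['comment', '//[^\n]*'],
    ['string', "'(?:[^'\\\\]|\\\\.)*'"],
    ['keyword', '\b(?:echo|function|return)\b'],
    ['variable', '\$\w+'],
]);
$hl->define('sql', [
    ['keyword', '(?i)\b(?:select|from|where)\b'],
    ['number', '\b\d+\b'],
]);

// Two languages on one page, sharing one set of CSS classes.
echo $hl->highlight("echo 'hi'; // greet", 'php'), "\n";
echo $hl->highlight('SELECT * FROM t WHERE id = 1', 'sql'), "\n";
```

Combining the rules into one pattern means a keyword inside a string or comment is never highlighted twice, which is the usual pitfall of running one regex per token type in sequence.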

Konrad Rudolph
  • It would help if you listed which languages you needed the syntax highlighting to work for. – Kibbee May 04 '09 at 12:39
  • I explicitly didn't want to constrain this. The target languages shouldn't matter: any syntax highlighting engine worth its salt can be extended to handle (nearly) all languages sufficiently well. I'm not looking for specialized versions that only work on a tiny subset. – Konrad Rudolph May 04 '09 at 13:12
  • I wrote a PHP wrapper around the Pygments library, which supports **tons** of languages. I've used it on several websites and it works great; maybe somebody will find it useful: https://github.com/igorpan/PHPygmentizator – Feb 19 '13 at 22:52

10 Answers


Since no existing tool satisfied my needs, I wrote my own. Lo and behold:

Hyperlight

Usage is extremely easy: just use

 <?php hyperlight($code, 'php'); ?>

to highlight code. Writing new language definitions is relatively easy, too – using regular expressions and a powerful but simple state machine. By the way, I still need a lot of definitions, so feel free to contribute.
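The "regular expressions plus a state machine" idea can be sketched roughly as follows. This is not Hyperlight's code or API: `highlightWithStates()` and the `$states` layout are hypothetical. Keeping the states on a stack is what allows nested (recursive) constructs, shown here with a toy language whose block comments nest:

```php
<?php

// Hypothetical stack-based lexer: each state has a CSS class for its plain
// text and a list of rules [regex, cssClass, action], where action is null,
// 'push:<state>' or 'pop'. Pushing states is what allows nesting.
function highlightWithStates(string $code, array $states, string $start): string
{
    $stack = [$start];
    $out = '';
    $buffer = ''; // run of characters no rule matched, in the current state
    $pos = 0;

    // Escape a piece of text and wrap it in a span if it has a class.
    $emit = function (string $text, ?string $class) use (&$out): void {
        if ($text === '') {
            return;
        }
        $text = htmlspecialchars($text, ENT_QUOTES);
        $out .= $class === null ? $text : "<span class=\"$class\">$text</span>";
    };

    while ($pos < strlen($code)) {
        $state = $states[end($stack)];
        foreach ($state['rules'] as [$regex, $class, $action]) {
            // \G anchors the attempt at the current offset.
            if (preg_match('~\G(?:' . $regex . ')~', $code, $m, 0, $pos) && $m[0] !== '') {
                $emit($buffer, $state['class']);
                $buffer = '';
                $emit($m[0], $class);
                $pos += strlen($m[0]);
                if ($action === 'pop') {
                    if (count($stack) > 1) {
                        array_pop($stack); // never pop the start state
                    }
                } elseif ($action !== null) {
                    $stack[] = substr($action, strlen('push:'));
                }
                continue 2; // retry from the new position (and maybe new state)
            }
        }
        $buffer .= $code[$pos++]; // no rule matched: keep as plain text
    }
    $emit($buffer, $states[end($stack)]['class']);
    return $out;
}

// Toy language with nestable /* ... */ comments.
$states = [
    'code' => ['class' => null, 'rules' => [
        ['/\*', 'comment', 'push:comment'],
        ['\b(?:if|else|return)\b', 'keyword', null],
        ['\d+', 'number', null],
    ]],
    'comment' => ['class' => 'comment', 'rules' => [
        ['/\*', 'comment', 'push:comment'], // a nested comment opens
        ['\*/', 'comment', 'pop'],          // the innermost comment closes
    ]],
];

echo highlightWithStates('if /* a /* b */ c */ return 1', $states, 'code'), "\n";
```

Note that ` c ` is still marked as a comment after the inner `*/`; a single non-nesting pattern such as `/\*.*?\*/` would have ended the comment there.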

Manquer
Konrad Rudolph
  • This is nice. I wish it detected methods/constructors and also types in variable declarations. – NateS May 30 '14 at 10:29
  • It wasn't hard to add. Kudos! [Here](http://pastebin.com/A7A7zrED) is my generic code highlighter which handles many languages reasonably. Tested with: Java, C#, JavaScript, AS3, C, C++, Lua – NateS May 30 '14 at 11:23

[I marked this answer as Community Wiki because you're specifically not looking for Javascript]

http://softwaremaniacs.org/soft/highlight/ is a syntax highlighting library that supports the following languages (PHP included):

Python, Ruby, Perl, PHP, XML, HTML, CSS, Django, Javascript, VBScript, Delphi, Java, C++, C#, Lisp, RenderMan (RSL and RIB), Maya Embedded Language, SQL, SmallTalk, Axapta, 1C, Ini, Diff, DOS .bat, Bash

It uses <span class="keyword"> style markup.

It has also been integrated into the Dojo Toolkit (as a dojox project: dojox.lang.highlight).

Strictly speaking, Javascript is not only implemented on the client side: there are also server-side Javascript engine/platform combinations, though they are not the most popular way to run a web server.

micahwittman
  • PHP has a built-in function for this: '); ?> Ref: http://www.tamildic.com/online-php-syntax-highlighter (working testing tool) – Manikandan Aug 16 '19 at 11:43

I found this simple generic syntax highlighter written in PHP here and modified it a bit:

<?php

/**
 * Original => http://phoboslab.org/log/2007/08/generic-syntax-highlighting-with-regular-expressions
 * Usage => `echo SyntaxHighlight::process('source code here');`
 * The original used the /e modifier, which was removed in PHP 7.0, so the
 * comment/string pass below uses preg_replace_callback() instead.
 */

class SyntaxHighlight {
    public static function process($s) {
        // ENT_COMPAT: encode double quotes as &quot; but leave single quotes
        // alone, which is what the string patterns below expect
        // (since PHP 8.1 the default would also encode single quotes).
        $s = htmlspecialchars($s, ENT_COMPAT);

        // Workaround for escaped backslashes
        $s = str_replace('\\\\', '\\\\<e>', $s);

        $tokens = array(); // This array will be filled from the regexp-callback

        // Comments/Strings: swap each one for a placeholder first
        $s = preg_replace_callback('/(
                \/\*.*?\*\/|
                \/\/.*?\n|
                \#.[^a-fA-F0-9]+?\n|
                \&lt;\!\-\-[\s\S]+?\-\-\&gt;|
                (?<!\\\)&quot;.*?(?<!\\\)&quot;|
                (?<!\\\)\'(.*?)(?<!\\\)\'
            )/isx',
            function ($m) use (&$tokens) {
                return self::replaceId($tokens, $m[1]);
            },
            $s
        );

        $regexp = array(

            // Punctuations ("<" and ">" are left out on purpose: after
            // htmlspecialchars() they only occur in the <e> marker)
            '/([\-\!\%\^\*\(\)\+\|\~\=\`\{\}\[\]\:\"\'\?\,\.\/]+)/'
            => '<span class="P">$1</span>',

            // Numbers (also look for Hex)
            '/(?<!\w)(
                (0x|\#)[\da-f]+|
                \d+|
                \d+(px|em|cm|mm|rem|s|\%)
            )(?!\w)/ix'
            => '<span class="N">$1</span>',

            // Make the bold assumption that an
            // all uppercase word has a special meaning
            '/(?<!\w|>|\#)(
                [A-Z_0-9]{2,}
            )(?!\w)/x'
            => '<span class="D">$1</span>',

            // Keywords
            '/(?<!\w|\$|\%|\@|>)(
                and|or|xor|for|do|while|foreach|as|return|die|exit|if|then|else|
                elseif|new|delete|try|throw|catch|finally|class|function|string|
                array|object|resource|var|bool|boolean|int|integer|float|double|
                real|string|array|global|const|static|public|private|protected|
                published|extends|switch|true|false|null|void|this|self|struct|
                char|signed|unsigned|short|long
            )(?!\w|=")/ix'
            => '<span class="K">$1</span>',

            // PHP/Perl-Style Vars: $var, %var, @var
            '/(?<!\w)(
                (\$|\%|\@)(\-&gt;|\w)+
            )(?!\w)/ix'
            => '<span class="V">$1</span>'

        );

        $s = preg_replace(array_keys($regexp), array_values($regexp), $s);

        // Paste the comments and strings back in again
        $s = str_replace(array_keys($tokens), array_values($tokens), $s);

        // Delete the "Escaped Backslash Workaround Token" (TM)
        // and replace tabs with four spaces.
        $s = str_replace(array('<e>', "\t"), array('', '    '), $s);

        return '<pre><code>' . $s . '</code></pre>';
    }

    // Regexp-Callback to replace every comment or string with a uniqid and save
    // the matched text in an array
    // This way, strings and comments will be stripped out and won't be processed
    // by the other expressions searching for keywords etc.
    private static function replaceId(&$a, $match) {
        $id = "##r" . uniqid() . "##";

        // String or Comment? (shell-style comments start with a single "#")
        if (substr($match, 0, 2) == '//' || substr($match, 0, 2) == '/*' || substr($match, 0, 1) == '#' || substr($match, 0, 7) == '&lt;!--') {
            $a[$id] = '<span class="C">' . $match . '</span>';
        } else {
            $a[$id] = '<span class="S">' . $match . '</span>';
        }
        return $id;
    }
}

?>

Demo: http://phpfiddle.org/lite/code/1sf-htn


Update

I just created a PHP port of my own JavaScript generic syntax highlighter here → https://github.com/taufik-nurrohman/generic-syntax-highlighter/blob/master/generic-syntax-highlighter.php

How to use:

<?php require 'generic-syntax-highlighter.php'; ?>
<pre><code><?php echo SH('&lt;div class="foo"&gt;&lt;/div&gt;'); ?></code></pre>
Taufik Nurrohman

It might be worth looking at PEAR's Text_Highlighter (documentation).

By default it probably won't output HTML exactly the way you want, but it does provide extensive capabilities for customisation (i.e. you can create different renderers/parsers).

Tom Haigh

I had exactly the same problem, but as I was very short on time and needed really good language coverage, I decided to write a PHP wrapper around the Pygments library.

It's called PHPygmentizator. It's really simple to use, and I wrote a very basic manual. As PHP is primarily a web development language, I designed the structure around that fact and made it very easy to integrate into almost any kind of website.

It supports configuration files, and if that isn't enough and somebody needs to modify things during processing, it also fires events.

A demo of how it works can be found in basically any post on my blog that contains source code, this one for example.

With the default configuration, you can just pass it a string in this format:

Any text here.

[pygments=javascript]
var a = function(ar1, ar2) {
    return null;
}
[/pygments]

Any text.

It highlights code between the tags (the tags can be customized in the configuration file) and leaves the rest untouched.
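A rough sketch of how such tag-delimited blocks can be picked out of a text, assuming the default `[pygments=lang]...[/pygments]` syntax shown above. `highlightTaggedBlocks()` and its callback are hypothetical, not PHPygmentizator's API; in practice the callback would hand the code to Pygments (for example via the `pygmentize` command-line tool):

```php
<?php

// Sketch of the tag convention: only text between [pygments=LANG] and
// [/pygments] is passed to $highlight($code, $lang); the rest is untouched.
// $highlight is a hypothetical callback, e.g. one that runs Pygments.
function highlightTaggedBlocks(string $text, callable $highlight): string
{
    return preg_replace_callback(
        // \R? drops the line break right after the opening tag.
        '~\[pygments=([\w+#-]+)\]\R?(.*?)\[/pygments\]~s',
        fn (array $m): string => $highlight($m[2], $m[1]),
        $text
    );
}

$input = "Any text here.\n\n[pygments=javascript]\nvar a = 1;\n[/pygments]\n\nAny text.";

// Stand-in highlighter for the demo: just escape the code and tag the language.
$html = highlightTaggedBlocks($input, function (string $code, string $lang): string {
    return '<pre class="' . htmlspecialchars($lang) . '">' . htmlspecialchars($code) . '</pre>';
});
echo $html, "\n";
```

The lazy `(.*?)` keeps several blocks in one text separate instead of swallowing everything between the first opening and the last closing tag.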

Additionally, I have already made a syntax recognition library (its algorithm would probably be classified as Bayesian) that automatically recognizes which language a code block is written in, and it can easily be hooked into one of PHPygmentizator's events to provide automatic language recognition. I will probably make it public some time this week, since I need to clean up the structure a bit and write some basic documentation. If you supply it with enough "learning" data, it recognizes languages amazingly well: I tested it even on minified JavaScript and on languages with similar keywords and structures, and it has never made a mistake.
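The recognition library itself isn't shown here, so the following is only a generic sketch of a Bayesian classifier of this kind: multinomial naive Bayes over tokens with add-one (Laplace) smoothing. The class name, the tokenizer regex and the training snippets are all assumptions:

```php
<?php

// Generic multinomial naive Bayes with add-one (Laplace) smoothing, used to
// guess a snippet's language from its tokens. A sketch only, not the
// author's library.
final class LanguageGuesser
{
    private array $counts = []; // $counts[lang][token] = occurrences
    private array $totals = []; // $totals[lang] = number of tokens seen
    private array $docs = [];   // $docs[lang] = number of training samples
    private array $vocab = [];  // every distinct token seen (as keys)

    private static function tokens(string $code): array
    {
        // Identifiers/keywords, or runs of symbols such as "->" or "($".
        // Bare numbers match neither alternative and are ignored.
        preg_match_all('~[A-Za-z_]\w*|[^\sA-Za-z_0-9]+~', $code, $m);
        return $m[0];
    }

    public function train(string $lang, string $code): void
    {
        $this->docs[$lang] = ($this->docs[$lang] ?? 0) + 1;
        $this->totals[$lang] = $this->totals[$lang] ?? 0;
        foreach (self::tokens($code) as $t) {
            $this->counts[$lang][$t] = ($this->counts[$lang][$t] ?? 0) + 1;
            $this->totals[$lang]++;
            $this->vocab[$t] = true;
        }
    }

    public function guess(string $code): ?string
    {
        $vocabSize = count($this->vocab);
        $samples = array_sum($this->docs);
        $best = null;
        $bestScore = -INF;
        foreach ($this->docs as $lang => $docCount) {
            // log P(lang) + sum of log P(token | lang); logs avoid underflow.
            $score = log($docCount / $samples);
            foreach (self::tokens($code) as $t) {
                $count = $this->counts[$lang][$t] ?? 0;
                $score += log(($count + 1) / ($this->totals[$lang] + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = $lang;
            }
        }
        return $best; // null if nothing has been trained yet
    }
}

$g = new LanguageGuesser();
$g->train('php', '<?php function f($a) { echo $a; return $this->x; }');
$g->train('python', "def f(a):\n    print(a)\n    return self.x");
$g->train('sql', 'SELECT name FROM users WHERE id = 1 ORDER BY name');

echo $g->guess('function g($b) { echo $b; }'), "\n";
echo $g->guess('SELECT id FROM t WHERE x = 2'), "\n";
```

Three one-line samples are only enough to show the mechanics; as the answer notes, good results depend on supplying plenty of training data per language.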

  • Thanks, with automatic language recognition it will hopefully be even nicer. Building a wrapper around Pygments was a double-edged sword, though: it provides support for an amazing number of languages but can't be used on websites on shared hosting. – Feb 20 '13 at 03:02

Another option is to use the GPL Highlight GUI program by Andre Simon, which is available for most platforms. It converts PHP (and other languages) to HTML, RTF, XML, etc., which you can then cut and paste into the page you want. This way, the processing is only done once.

The HTML is also CSS based, so you can change the style as you please.

Personally, I use dp.SyntaxHighlighter, but that uses client-side Javascript, so it doesn't meet your needs. It does have a nice Windows Live plugin, though, which I find useful.

Rob Prouse

A little late to chime in here, but I've been working on my own PHP syntax highlighting library. It is still in its early stages, but I am using it for all code samples on my blog.

I just checked out Hyperlight. It looks pretty cool, but it does some pretty crazy stuff: nested loops, line-by-line processing, etc. The core class is over 1000 lines of code.

If you are interested in something simple and lightweight check out Nijikodo: http://www.craigiam.com/nijikodo

Craig
  • “The core class is over 1000 lines of code” Wait, what? Noo, I’d know that. It’s considerably shorter – I just tried to put all the core functionality (i.e. *several* classes) into one *file* to make it easier to distribute. A mistake, in hindsight. Furthermore, there’s no line-by-line processing. It’s basically a normal lexical analyzer (only it can also handle recursive token definitions). – That said, your code looks nice too. I’ll definitely have a look at it. – Konrad Rudolph Aug 16 '10 at 07:54

PHP Prettify works fine so far and offers more customization than highlight_string.

Ghostff

Why not use PHP's built-in syntax highlighter?

http://php.net/manual/en/function.highlight-string.php

Gabor de Mooij
  • Because that of course only highlights PHP code. Also, because this function *sucks*, since it’s not customisable at all. – Konrad Rudolph Jun 15 '11 at 14:04
  • oh, you can configure it via ini settings like `highlight.comment` :) – cweiske Nov 06 '13 at 20:31
  • Further, it outputs inline styles, not CSS classes, which is both hideous and not what the OP asked for. – Tom Auger Jan 23 '15 at 15:41
  • It doesn't really have what I would call "configuration." You can pick the colors. That's it. You have no control over things like line numbers, gutters, language being highlighted, alternate-line colors, highlighted lines, highlighted subsections, and other features that are common in many syntax highlighters. – Bob Ray Dec 05 '17 at 05:31
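Since the only knobs are the `highlight.*` colour settings, one workaround for the inline-style complaint is to set each colour to a unique dummy value and rewrite the resulting `style` attributes into classes afterwards. This is a sketch that assumes `highlight_string()` keeps emitting `style="color: ..."` attributes; the function name, dummy colours and class names are arbitrary choices, and it is of course still limited to PHP code:

```php
<?php

// Workaround sketch: give each highlight.* setting a unique dummy colour,
// then swap the inline styles that highlight_string() emits for CSS classes.
function highlightPhpWithClasses(string $code): string
{
    $map = [
        'highlight.comment' => ['#000001', 'comment'],
        'highlight.default' => ['#000002', 'default'],
        'highlight.html'    => ['#000003', 'html'],
        'highlight.keyword' => ['#000004', 'keyword'],
        'highlight.string'  => ['#000005', 'string'],
    ];
    $old = [];
    $replace = [];
    foreach ($map as $setting => [$colour, $class]) {
        $old[$setting] = ini_set($setting, $colour); // returns the previous value
        $replace['style="color: ' . $colour . '"'] = 'class="' . $class . '"';
    }

    $html = highlight_string($code, true); // true: return instead of print

    // Restore the previous colours so other callers are unaffected.
    foreach ($old as $setting => $value) {
        if ($value !== false) {
            ini_set($setting, $value);
        }
    }
    return strtr($html, $replace);
}

echo highlightPhpWithClasses("<?php echo 'hi'; // greet\n"), "\n";
```

This only addresses the class names; the other limitations mentioned above (no line numbers, no other languages) remain.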

Krijn Hoetmer's PHP Highlighter provides a completely customizable PHP class to highlight PHP syntax. The HTML it generates validates under a strict doctype and is completely stylable with CSS.
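For PHP code specifically, a similar class-based result can be sketched with the tokenizer extension's `token_get_all()` and `token_name()`. This is not Krijn Hoetmer's class, just a minimal illustration; the `t_variable`-style class names are an arbitrary convention, and the tokenizer extension must be enabled:

```php
<?php

// Class-based highlighting of PHP code via the tokenizer extension: each
// token becomes <span class="t_...">, named after its T_* constant.
function highlightPhpTokens(string $code): string
{
    $out = '';
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // [token id, token text, line number]
            [$id, $text] = $token;
            $text = htmlspecialchars($text, ENT_QUOTES);
            $out .= $id === T_WHITESPACE
                ? $text // leave whitespace unwrapped
                : '<span class="' . strtolower(token_name($id)) . '">' . $text . '</span>';
        } else {
            // Single-character tokens such as ";" or "=" come back as plain strings.
            $out .= htmlspecialchars($token, ENT_QUOTES);
        }
    }
    return $out;
}

echo highlightPhpTokens('<?php $x = 42;'), "\n";
```

Because PHP's own lexer does the tokenizing, the output stays in sync with the language version running the script, at the cost of handling only PHP.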

Mathias Bynens