Say I want to write a small compiler in C which generates PHP bytecode from a custom language of choice (typoscript). How would I do that? Does PHP offer an API? Or would I need to hack parts of the default PHP compiler?

Edit: On a more general note, instead of writing a parser in PHP, could I write it in C and produce PHP opcode (so it would be compatible with other PHP code)?

Edit 2: This could be done as an extension to PHP, so I will look into that.

Olle Härstedt
  • Is it worth investing a big effort for little outcome? The question is kind of unclear to me. Let PHP generate the opcodes natively and cache them (OPcache is bundled with PHP 5.5). Or look at HHVM/PHPNG etc.; HipHop's original compiler (HPHPc) translated PHP to C++. – Daniel W. Aug 22 '14 at 13:51
  • typoscript is parsed extensively in TYPO3, and its specification is small. I think it would be an interesting project to write the typoscript parser in C instead of PHP. – Olle Härstedt Aug 22 '14 at 13:53
  • TYPO3 has its own cache system, but often the cache cannot be used because of dynamic content. – Olle Härstedt Aug 22 '14 at 13:59
  • The PHP opcache only changes when the script itself changes. – Daniel W. Aug 22 '14 at 13:59
  • TYPO3 docs say that typoscript is PHP itself. Therefore, it gets "translated" to PHP at some point, gets lexed, executed and cached to OPCache (APC if you're into old stuff). Why would you need to write a C parser that does something that exists? – N.B. Aug 22 '14 at 14:02
  • @N.B. Because a parser in C would be 30 times faster than a parser written in PHP. – Olle Härstedt Aug 22 '14 at 14:34
  • 1
    So, we have this typoscript that's just a config language for TYPO3, which gets parsed into PHP. PHP gets parsed by a program written in C. The same C program executes opcodes that it parsed and caches them. The C "program" is known as ZendEngine more or less. Now we have the cached version of the entire thing. Tell me again, where does your C compiler get into play? And why would anyone waste time writing something 30 times faster if the "slow" thing executes it within a millisecond or so? Even if it took 20 minutes to do it, you can have the cached version anyway. – N.B. Aug 22 '14 at 15:10
  • @N.B. The same reason `xml_parser_create` and such are implemented in C and not PHP. Also, it's an interesting challenge. I guess it's better implemented as a PHP extension than a stand-alone program. – Olle Härstedt Aug 22 '14 at 16:49
  • Hm, it seems you're completely missing the point here. I guess it's better that you implement it the way you want to. There is an "API" if you will: a list of opcodes is available, and nothing prevents you from translating the typoscript to opcodes using a C extension / program. However, do re-read what's been written, it might save you *a lot* of time. Also, comparing the XML parser with what you want to do is like comparing a banana with Jupiter, both in terms of usefulness and usage. – N.B. Aug 22 '14 at 22:40
  • @N.B. I know what you mean, I just don't agree. I'm not after saving time, I want to learn. And the typoscript parser class is 1200 loc, so it's not huge. – Olle Härstedt Aug 23 '14 at 18:16
  • Well if you're after learning, I can recommend http://www.phpinternalsbook.com/ as a great resource, but the thing is - you can already get PHP to lex and opcode this for you. However, sometimes doing stuff like this is fun so I hope this exercise will be fun and productive for you. Good luck! – N.B. Aug 23 '14 at 18:36

2 Answers

Well, generating the bytecode is one thing; loading it afterwards is another.

For loading the generated bytecode, I suggest looking at APC (the opcode cache module). It can read opcodes and execute them, so it should be fairly simple to adapt it to read your bytecode instead. As for generating the bytecode, I don't think you'll find anything for that outside PHP's own source tree. (Sure, you can dump the opcodes from inside the running PHP engine, but that's not what you're asking for.)

But I have to ask what you expect to gain. Since only PHP understands PHP bytecode, the code still needs to be run inside PHP's virtual machine anyway. So you only save CPU time on the parsing and interpretation of the source. And by using APC for caching, even that "delay" is gone.
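To make the "generation" half a bit more concrete, here is a minimal sketch in C of what a front end could look like. It assumes a tiny subset loosely modelled on TypoScript lines (`path = value`, `path >` to unset, `#` comments) and emits a made-up instruction struct. The names `toy_insn`, `toy_opcode` and `toy_compile_line` are hypothetical; this is not the Zend opcode format, and turning such instructions into something the PHP VM can run is exactly the hard part described above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy instruction set; these are NOT Zend opcodes. */
typedef enum { TOY_OP_ASSIGN, TOY_OP_UNSET } toy_opcode;

typedef struct {
    toy_opcode op;
    char path[64];   /* object path, e.g. "page.10.value" */
    char value[128]; /* right-hand side; empty for TOY_OP_UNSET */
} toy_insn;

/* Copy [begin, end) into dst with surrounding whitespace removed.
 * Returns 0 on success, -1 if the trimmed text does not fit in dst. */
static int copy_trimmed(char *dst, size_t cap,
                        const char *begin, const char *end)
{
    while (begin < end && isspace((unsigned char)*begin)) begin++;
    while (end > begin && isspace((unsigned char)end[-1])) end--;
    size_t len = (size_t)(end - begin);
    if (len >= cap) return -1;
    memcpy(dst, begin, len);
    dst[len] = '\0';
    return 0;
}

/* Compile one source line into one toy instruction.
 * Returns 1 if an instruction was emitted, 0 for blank or comment
 * lines, and -1 on a syntax error or an over-long path/value. */
int toy_compile_line(const char *line, toy_insn *out)
{
    assert(line != NULL && out != NULL);

    const char *p = line;
    while (isspace((unsigned char)*p)) p++;
    if (*p == '\0' || *p == '#') return 0;

    const char *end = p + strlen(p);
    const char *op = strpbrk(p, "=>"); /* first '=' or '>' on the line */
    if (op == NULL) return -1;

    if (copy_trimmed(out->path, sizeof out->path, p, op) != 0) return -1;
    if (out->path[0] == '\0') return -1; /* operator without a path */

    if (*op == '=') {
        out->op = TOY_OP_ASSIGN;
        if (copy_trimmed(out->value, sizeof out->value, op + 1, end) != 0)
            return -1;
        return 1;
    }

    out->op = TOY_OP_UNSET;
    out->value[0] = '\0';
    return 1;
}
```

Calling `toy_compile_line("page.10.value = Hello world", &insn)` fills `insn` with an assign instruction for the path `page.10.value`. A real version would also need nesting with braces, the copy operator, conditions and so on, and an extension would then have to hand the result to PHP in a form it accepts.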

thelogix

You could use HipHop from Facebook (http://en.wikipedia.org/wiki/HipHop_for_PHP) to compile PHP code to binaries. However, I don't think that a converter from TypoScript to PHP currently exists.

Martin Müller