I've been looking into IR that is specified in SSA form, in particular generating LLVM IR in this form. However, I'm unsure whether this can be efficient for a type with non-trivial copy semantics. For example:

void f() {
    std::string s = "Too long for you, short string optimization!";
    std::string s1 = s + " Also, goodbye SSA.";
    some_other_function(s1);
}

Written in this single-assignment style, at least at the most obvious level, this results in a nasty mess of copies (even for C++). Can optimizers such as LLVM's actually optimize this case well? Is SSA viable even for types with non-trivial copy/assignment/etc. semantics?

Edit: The question is this: if I use an LLVM SSA register to represent a complex type (in this case `std::string`, written above in single-assignment style by hand), can LLVM automatically translate this into a mutating `+=` call in the underlying assembly in the general case and avoid a nasty copy?
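
To make the two forms being contrasted concrete, here is a minimal C++ sketch. The helper names `append_copy` and `append_in_place` are hypothetical and exist only for illustration. The first builds a fresh string with `operator+`; the second takes its argument by value and appends to it with `+=`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: the copying form. operator+ builds a fresh string
// and leaves `s` untouched, much like the single-assignment code above.
inline std::string append_copy(const std::string& s, const char* suffix) {
    return s + suffix;
}

// Hypothetical helper: the mutating form. The caller hands over a string
// by value (ideally via std::move), and the suffix is appended to it.
inline std::string append_in_place(std::string s, const char* suffix) {
    s += suffix;
    return s;
}
```

A caller that no longer needs `s` could write `append_in_place(std::move(s), " Also, goodbye SSA.")`. Both helpers produce the same text; which one is used is decided in the source or by the front end, before any IR exists.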

Puppy
  • Did you try to see if LLVM can optimize this? – rubenvb Jun 09 '12 at 13:42
  • @rubenvb: I don't have an LLVM-able compiler. – Puppy Jun 09 '12 at 13:43
  • I'm not sure I understand the question? Are you asking whether it would be efficient to write C++ code with single-assignment variables? Or whether LLVM code can be efficient since LLVM uses SSA? Note that in LLVM it's the registers that are single assignment, not memory locations. – sepp2k Jun 09 '12 at 13:43
  • There's an [online demo of Clang](http://llvm.org/demo/index.cgi). –  Jun 09 '12 at 13:55
  • I'm still not sure we're on the same page. So we're considering a case where we have an LLVM complex type to represent strings. So that type would contain an int for the size and a pointer for the contents, right? You make it sound as if copying a value of that type from one register to another would involve copying the memory that the pointer points to (like the copy constructor in C++ would). It wouldn't. Assigning the string to another register would only copy the int and the pointer. The pointed-to memory would only be copied if you copied it yourself. – sepp2k Jun 09 '12 at 14:05
  • @sepp2k: Which is exactly the requirement for mutating the string: the memory pointed to must be copied. Blindly copying the int and the pointer would only work for a reference to that value. – Puppy Jun 09 '12 at 14:14
  • @DeadMG I'm only trying to understand what the LLVM code you're asking about would look like. If your LLVM code contains a call to operator+, then operator+ will be called. LLVM won't optimize your call to + to a call to += because LLVM knows nothing about the semantics of + and +=. That kind of logical optimization would happen *before* the LLVM code is generated. – sepp2k Jun 09 '12 at 14:30
  • @sepp2k: Well, that can't be true because otherwise SSA code for any kind of type would be vastly too slow to use. – Puppy Jun 10 '12 at 18:23
  • @DeadMG I don't know why you think that and I think that's the reason why I still don't understand your question. Just because LLVM registers are single-assignment doesn't mean that your C++ code needs to use single assignment variables when using an LLVM based C++ compiler. Or that normal C++ code would somehow compile to LLVM code that behaves like the C++ code you posted. If you write C++ code that uses `+=` to concatenate strings, that code will be perfectly efficient when compiled to LLVM. If you write C++ code like the one in your post, it won't be unless the C++ compiler optimizes it. – sepp2k Jun 10 '12 at 18:39

1 Answer

SSA stands for static single assignment. It's a way of dealing with value semantics as applied to registers. Each register value is defined by exactly one instruction.

LLVM provides a generic "move" instruction, which is useful because there are many instructions across the spectrum of architectures that move 8, 32, N bytes. It also provides structured datatypes and arrays, because it is useful to hoist such things to registers, and they can be used to represent wacky high-level machine constructs. The intent is not to model OOP.
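
As a rough illustration of the register-versus-memory distinction, here is a minimal C++ sketch. `StringRep`, `shallow_copy`, and `deep_copy` are hypothetical names, and the two-field layout is an assumption for illustration, not the real layout of `std::string`. Copying the aggregate duplicates only the pointer and the size; the buffer is copied only when the code says so explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a string lowered to a two-field aggregate.
struct StringRep {
    char*       data;  // heap buffer, lives in memory
    std::size_t size;  // number of bytes in use
};

// Copying the aggregate value duplicates the pointer and the size only;
// both copies still refer to the same buffer.
inline StringRep shallow_copy(StringRep s) { return s; }

// A copy of the contents has to be spelled out: allocate a new buffer
// and copy the bytes. The caller owns the result and must delete[] it.
inline StringRep deep_copy(const StringRep& s) {
    char* buf = new char[s.size];
    std::memcpy(buf, s.data, s.size);
    return StringRep{buf, s.size};
}
```

In this sketch, single assignment constrains only the `StringRep` values themselves. The bytes behind `data` are ordinary memory and can be overwritten in place.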

Potatoswatter