I have to switch from Python to C/C++.
Do you know of a quick "reference tutorial" or something similar that shows how to get started? For example, something like the NumPy and SciPy tutorials.
I have read a lot of "documentation", for example

  • C++ for dummies
  • the K&R C Programming Language
  • a lot of blogs and online documentation, such as http://eli.thegreenplace.net/2010/01/11/pointers-to-arrays-in-c/
  • http://newdata.box.sk/bx/c/
  • tons of Q&A here on StackOverflow
  • ...

but it's still not clear to me how to even start porting something like:

#!/usr/bin/env python

import time
import numpy as np
import tables as tb

"""Retrieve 3D positions from 1000 files and store them in one single HDF5 file.
"""

t = time.time()

# Empty array
sample = np.array([])
sample.shape = (0,3)

# Loop over the files
for i in range(0, 1000):
  filename = "mill2sort-"+str(i)+"-extracted.h5"
  print "Doing ", filename
  # Open data file
  h5f = tb.openFile(filename, 'r')
  # Stack new data under previous data
  sample = np.vstack((sample, h5f.root.data.read()))
  h5f.close()

# Create the new file
h5 = tb.openFile("mill2sort-extracted-all", 'w')
# Save the array
h5.createArray(h5.root, 'data', sample, title='mill_2_sub_sample_all')
h5.flush()
h5.close()

print "Done in ", time.time()-t, " seconds."

in C or C++. In this example, I could not even work out how to pass a 3D array to a function that finds its dimensions, something like

int getArrayDimensions(int* array, int *dimensions){
  *dimensions = sizeof(*array)/sizeof(array[0]);
  return 0;
}

With array being

int array[3][3][3] = ...

Thank you for any suggestion!:)

brunetto
  • Choose one: C or C++. "C/C++" doesn't exist. C++ is easier to learn than C in my opinion. –  Mar 22 '12 at 12:47
  • @daknok_t I doubt that. C++ is very productive to use, once you know it very well, but it's one of the hardest to learn. – enobayram Mar 22 '12 at 12:59
  • @daknok_t: I've yet not decided between C or C++, so I wrote "C/C++"! but which of them fits better my needs is another question! – brunetto Mar 22 '12 at 13:26
  • i've just ported a project from python to C, maybe about 10k LOC - took me 4 months and I'm *still* not clear on how to start doing it, it was pretty horrific, maybe reframe the question in such a way that it makes completely unfeasible to want to do it? – bph Mar 22 '12 at 15:12
  • @Hiett, I'm not sure about what you mean.. – brunetto Mar 22 '12 at 15:49
  • I mean be absolutely sure you have to do the port before doing it. porting from a high level language to a lower level language is not very pleasant IMHO. Investigate other options like SWIG, ctypes etc. – bph Mar 22 '12 at 16:53
  • Ah, ok! Actually, what I'm doing is trying to port some small programs to learn C, because I need it for other projects! – brunetto Mar 22 '12 at 17:02

3 Answers

OK, for that particular example:

  • you can get the time services from the standard library here
  • you can use Eigen for linear algebra. It's an amazing library, I'm in love with it.
  • check here to learn how to manipulate files
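As a rough illustration of the first and last bullets, here is a minimal sketch that uses only the C++11 standard library: std::chrono for timing and the fstream classes for file handling. The helper names write_values, read_values, and time_seconds are hypothetical, and plain text files stand in for the HDF5 files of the question, which would need a dedicated library.

```cpp
#include <cassert>   // only needed if you check the helpers with assert()
#include <chrono>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write one number per line to a text file.
// Returns false if the file could not be opened or written.
bool write_values(const std::string& path, const std::vector<double>& values) {
    std::ofstream out(path);  // closed automatically when 'out' goes out of scope
    if (!out) return false;
    for (double v : values) out << v << '\n';
    out.flush();
    return out.good();
}

// Hypothetical helper: read back every number found in a text file.
// A missing or unreadable file simply yields an empty vector.
std::vector<double> read_values(const std::string& path) {
    std::ifstream in(path);
    std::vector<double> values;
    double v;
    while (in >> v) values.push_back(v);
    return values;
}

// Hypothetical helper: run any callable and return the elapsed wall time
// in seconds, similar to the time.time() difference in the Python script.
template <typename F>
double time_seconds(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A call such as time_seconds([&] { write_values("out.txt", data); }) would time a single write; steady_clock is used because it never jumps backwards the way a wall clock can.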

While using C++, you might miss some features from Python, but most of them are actually provided by the Boost libraries. For instance, returning multiple values from a function is very easy with the Boost.Tuple library, as in here. You can use boost::shared_ptr if you don't want to bother yourself with memory management. Or, if you want to keep using Python to play with your C++ classes, you can use Boost.Python. Boost.Parameter helps you define functions with named arguments. There is also Boost.Lambda for lambda functions, but if your environment supports it, you can also use C++11 to have language support for lambda functions. Boost is a gold mine, never stop digging. Just assume that it's part of the standard library. I develop C++ on many different platforms, and neither Eigen nor Boost has let me down yet.
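To make three of those features concrete, here is a small sketch. It assumes a C++11 compiler and therefore uses the standard-library counterparts (std::tuple, std::shared_ptr, and built-in lambdas) instead of the Boost versions named above. The function names min_max, make_squares, and count_even are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <tuple>
#include <vector>

// Return two values at once, much like Python's "return lo, hi".
// Precondition: v must not be empty.
std::tuple<int, int> min_max(const std::vector<int>& v) {
    assert(!v.empty());
    auto result = std::minmax_element(v.begin(), v.end());
    return std::make_tuple(*result.first, *result.second);
}

// The vector lives on the heap and is freed automatically when the last
// shared_ptr pointing at it goes away, so there is no manual delete.
std::shared_ptr<std::vector<int>> make_squares(int n) {
    auto squares = std::make_shared<std::vector<int>>();
    for (int i = 0; i < n; ++i) squares->push_back(i * i);
    return squares;
}

// A lambda passed to a standard algorithm, comparable to
// sum(1 for x in v if x % 2 == 0) in Python.
int count_even(const std::vector<int>& v) {
    return static_cast<int>(
        std::count_if(v.begin(), v.end(), [](int x) { return x % 2 == 0; }));
}
```

On the calling side, std::tie(lo, hi) = min_max(v); unpacks the tuple into two existing variables, which is about as close as C++11 gets to Python's tuple unpacking.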

Here's a good FAQ for C++ best practices. This is a very important principle that you have to keep in mind at all times while working in C++. I extend it a bit in my mind: if you're going to do something dangerous, such as allocating memory with a raw new, indexing a raw C-style array, passing around raw pointers, or using static_cast (or, even worse, reinterpret_cast), it should usually happen in a class dedicated to that purpose, with the code that makes sure it doesn't cause trouble living very close by, so that you can see at a glance that everything is under control.
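One possible reading of that principle, as a sketch only: the class below (IntBuffer, a made-up name) is the single place where a raw new[], a raw pointer, and raw indexing occur, and the code that keeps them safe sits right next to them. In everyday code std::vector already does this job; the class exists purely to show the idea.

```cpp
#include <cassert>   // only needed if you check the class with assert()
#include <cstddef>
#include <stdexcept>

// Hypothetical example class: all the "dangerous" operations live here.
class IntBuffer {
public:
    // The only raw allocation; the trailing () zero-initializes the ints.
    explicit IntBuffer(std::size_t size) : size_(size), data_(new int[size]()) {}

    // The matching release is a few lines away, so pairing is easy to verify.
    ~IntBuffer() { delete[] data_; }

    // Copying would lead to a double delete, so it is disabled outright.
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    std::size_t size() const { return size_; }

    // The only raw indexing, guarded by a bounds check.
    int& at(std::size_t i) {
        if (i >= size_) throw std::out_of_range("IntBuffer index out of range");
        return data_[i];
    }

private:
    std::size_t size_;
    int* data_;  // the raw pointer never leaves the class
};
```

Callers only ever see size() and at(), so a glance at this one class is enough to confirm that the allocation is released and every access is checked.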

Finally, my favourite!!! Do you want to keep using generators in C++? Here's some dark magic.

naught101
enobayram
  • but that dark magic is not valid C++. – Sebastian Mach Mar 22 '12 at 13:12
  • it is, I use it on a regular basis, on a regular c++ compiler (gcc). Consider this, I've even used it on an arm based project! (Android NDK) – enobayram Mar 22 '12 at 13:14
  • I second this. Use C++ (or, even better, C++11) instead of C, since it is closer to Python. Use Boost to do all the stuff you are used to from Python, like lambda functions. Never use raw pointers - use std::string, std::vector and shared_ptr. – Gurgeh Mar 22 '12 at 13:33
  • Right, lambda functions are good to mention. – enobayram Mar 22 '12 at 13:40
  • `Each name that contains a double underscore _ _ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.` [global.names], and [lex.charset] does not include '$' as part of the basic character source set. Apart from these: This thing has some scoping issues: I better don't use switch-statements in the generator body, otherwise I soon get behaviour that is hard to debug, even harder because the code hides behind macros. That whole thing is jumping right into loops, exploiting not widely known features. All coders write buggy code, – Sebastian Mach Mar 22 '12 at 13:41
  • but good coders try to avoid code that is hard to debug. The code itself makes me wonder if the author is a good programmer at all: Public members to store private state? Explicitly public derivation where it is already public by default? – Sebastian Mach Mar 22 '12 at 13:44
  • You're free to replace the '$' character with whatever pleases you. Just change it in the generator.h. As for the scoping problem, the article mentions that you can not use the switch statement inside the generator body, and I think one can live with that. Compared to the elegance generators sometimes bring to your solution, these are non-issues. – enobayram Mar 22 '12 at 13:45
  • I know it looks very scary, I was also skeptical at first, but it's really very robust, and it also results in very efficient code. Don't consider the generator as a regular class and evaluate it on that basis in your mind. What's really hard to debug is the code you write in order to circumvent the lack of generators in C++. I end up writing 5-10 lines of generator code using this method, which replaces tens of lines otherwise. Which one do you think is more likely to include bugs? – enobayram Mar 22 '12 at 13:46
  • @enobayram: thanks for the parashift FAQ: I was on the site some time ago but I have not read it enough! – brunetto Mar 22 '12 at 13:49
  • Elegance? If I have heavy objects that are created each time I call gen(), it doesn't sound like elegance, if at all, this goes under "neat code". Why not just normal functors as they are common in C++, with a const-correct `boost::optional const g = gen()`? – Sebastian Mach Mar 22 '12 at 13:50
  • Nothing is created during `gen()`, it modifies int `n` at the example in the article. As for functors, if you try to make a functor behave like a generator, you have to manually manage the internal state to remember where you left off the last time. and THAT is bug prone. – enobayram Mar 22 '12 at 13:55
  • Tens of lines vs. a few? You must be kidding: `template <int MIN, int MAX> @ struct Gen { @ optional<int> operator() () { @ if (x != MIN) return x--; @ return optional<int>(); @ } @ private: @ int x = MAX; @ };` (replace '@' with line breaks), then `for (Gen<0,10> gen; const auto &x = gen();)`. It has the same state-complexity, and is not more bug prone than this macro-hickhack. Plus you are not limited to a subset of C++, and if there is a bug, even intermediate programmers can debug. – Sebastian Mach Mar 22 '12 at 13:59
  • This is a toy example, here, your x happens to remember where you left off, so you don't have to manage that. Sometimes you need to do not so regular operations everytime your generator is called, and with a functor, you'd have to manage the code to resume from where you've left off. – enobayram Mar 22 '12 at 14:03
  • I think that generator thing is a toy code that doesn't add much. It is interesting, as is Duff's device. And, using inheritance and `std::function` and/or lambda functions, it is possible to write a clean generator class without any manual management and without evil evil macros that pollute my identifier-space and harm my current or future code. And if you are talking about non-toy-code, what is your typical approach debugging a generator function? Is it easy? Or is it by guesses? I bet you are _not_ using your debugger, are you? – Sebastian Mach Mar 22 '12 at 14:08
  • You would have to see it to believe how well it plays with the debugger. The debugger behaves exactly as it would if all those pseudo keywords were language features. As for the generator code being toy, I've used it for many sophisticated algorithms. I don't think we need to argue how useful generators are in general. You just have to use them to know how much you need them. Just like anything else in programming. – enobayram Mar 22 '12 at 14:11
  • There's one bug-prone point, and I think the only one. If you declare local variables in the generator body, you will have surprising results. But I think you can keep this in mind while writing 10-20-30 lines of code. This is not new to C++. – enobayram Mar 22 '12 at 14:16
  • Other bug spots of that incarnation: I need a symbol that has greater scope than the loop I need it for, because the result is not returned but written to by reference; this increases cyclomatic complexity. This implies another bug point: It forces me to write const-incorrect code; increases complexity again. Also, how can I use values that are not default-constructible? What if I have another bug in those 30 lines that is hard to spot? E.g., when I `throw`, I get `#6 0x000000000040074a in descent::operator()(int&) ()`. Good, except I must now interpret the macros, which is non-trivial. – Sebastian Mach Mar 22 '12 at 14:34
  • - I don't understand your point regarding const-correctness, why do you need what you yield to be const? Even then, right at the beginning of the loop body, you could define a const reference to the yielded value and use that in the loop body. Or if you want to return syntactically const stuff, yield them with `const T *` pointers. – enobayram Mar 22 '12 at 14:46
  • - Re using not default constructible values: If you mean having such variables in the code, you can define a constructor for your generator (since in reality it's a class) before `$emit`, and initialize whatever in an initialization list. If you're talking about yielding not default constructible types, than in the loop header, initialize it ("n" in the example) however you want. – enobayram Mar 22 '12 at 14:48
  • - Re what you get when you throw. Come on, it says `descent` in the message... – enobayram Mar 22 '12 at 14:50
  • **0)** The "returned" value should be const, not "emit's" own iterator, like `const int x = foo()`. You can only say `foo(x)`, which forbids making x const. See also [this](http://www.parashift.com/c++-faq-lite/const-correctness.html). Not being able to means that the mutable state at the caller site unnecessarily increases. **1)** No, I mean using non-default-constructible classes as the "returned" value, e.g. `const File x = find_all("*.cpp")`. If `File` is not default-constructible, I can't use it with the generator. **2)** Except it does not say `yield`, but `operator()`. Evil evil macros – Sebastian Mach Mar 22 '12 at 14:59
  • And btw, I know that in reality it is a class. I have inspected and understood the code to be able to judge it ;) – Sebastian Mach Mar 22 '12 at 15:11
  • **0,1)** I see, there can be some inconveniences. But, there are ways around them, for instance using raw pointers or `unique_ptr`s etc (You don't have to keep using the pointer in the loop body, you can immediately define a f.x `const int &` in the beginning of the loop body). More importantly, cases where there are inconveniences do not stop you from using these generators in cases where there are none. **2)** Come on... :) – enobayram Mar 22 '12 at 15:13
  • **0),1)** It adds complexity, with or without pointers. You are not into const-correctness and code-complexity, are you? Had you ever to maintain legacy code that used too much state? The less mutable state, the more capacity there is in your brain. With C++, I am better off using the common idioms, which are, as shown by line-count not bigger than neat generators (btw: read "Exceptional C++", where Herb Sutter recommends to write straight, not neat), and which are understood by all C++ programmers. **2)** No, I don't come on. If I have to find a bug in your code, under time pressure, ... – Sebastian Mach Mar 22 '12 at 15:23
  • ... I do not want to struggle my brain with additional fluff from those macros, just to understand what is going on. And so will you, because it is only a question of time that you forget what is going on (maybe 1 year, maybe 10). I would already wonder enough that I can't declare a single variable inside the generator-loop. This thing adds so nothing, and I would forbid using it in my company because there is no sane reason not to use the common idioms; at least you have not presented a single one. – Sebastian Mach Mar 22 '12 at 15:25
  • And just in case you are inspirited enough to give a real advantage, here's yet another counter-argument: I can't use any local control structure without uncommon code or even hacks. Not only switch statements, but local for-loops, while-loops, function-calls-results, etc. etc. I could think the whole night on it, and every 10 minutes another caveat would become obvious. I somewhat doubt you ever really had 30 line long generator functions that did really local work. – Sebastian Mach Mar 22 '12 at 15:39
  • **1)** I've been accused of being a const-correctness freak at times, but as I said, in cases where this creates friction with const-correctness, you don't have to use it. **2)** I believe you don't mean complexity in the big-O sense. If you mean complexity to comprehend, it doesn't add anything if you're already used to generators and think in those terms. This is the same with the 10 years argument, if you regularly use generators, what's going on will always be as clear as any other syntactic feature. **3)** I would forbid using it in my company, for anybody but me :). – enobayram Mar 22 '12 at 15:40
  • @Gurgeh: C++ has its own lambda functions now ;) – Sebastian Mach Mar 22 '12 at 15:41
  • Everything other than switch works – enobayram Mar 22 '12 at 15:41
  • The reason why you're so much disturbed by the added state and all is that you're trying to think of the generators as just another functor with nasty states inside. You should think of a generator as a generator. It relieves your mind greatly – enobayram Mar 22 '12 at 15:43
  • I once needed to iterate through the objects that are closer to a given point than a certain distance. The objects resided in a regular 2D grid of cells, and they had position in the cell and I wanted to do this as efficiently as possible while keeping all the iteration code away from the code that actually processes those objects. Generators allowed me to create very clear code. – enobayram Mar 22 '12 at 15:47
  • **1)** It will always produce frictions, because I have to enlarge the mutable state of any function. This is a huge factor when debugging someone else code, because you have to _search all uses_ each time I see no `const` before a variable. **2)** As said, I mean cyclomatic complexity, no need for guessing. Another 10 years argument: The code runs fine. You leave the company. 10 years later, the company decides to add more features. New devs are hired. Bugs appear. Now regarding that not even you realised all the implications of that generator implementation, and it is not in any C++ book ... – Sebastian Mach Mar 22 '12 at 15:50
  • "Anything other than switch works": Wrong. And the fact you don't see it greatly emphasizes my argumentation vs. that beast. Example: `int fac = 1; for (int x=1; x<5; ++x, fac*=x); $yield(fac);`. Fail. `{ const float pi = 3.14159...f;}`. Fail. You can always only use global or member variables. With all the implications. – Sebastian Mach Mar 22 '12 at 15:55
  • About your cell-example: http://stackoverflow.com/a/9485821/76722 . No loop iteration. Just a per-field-applied lambda and a generic, re-usable algorithm. It is even better, because you can encapsulate more complex grid indexing algorithms, like [Morton-code](http://en.wikipedia.org/wiki/Z-order_curve) from the rest of the world. I think you chose a more than sub-optimal approach there / And, I have worked in C# for some years, which has generators, too. My mind therefore is already relieved. – Sebastian Mach Mar 22 '12 at 15:58
  • The generator thing is moot. Boost.Context was provisionally accepted in May 2011 and officially in January 2012, so now we can use Boost for coroutines and generators and such. See http://ok73.ok.funpic.de/boost/libs/context/doc/html/context/overview.html and http://groups.google.com/group/boost-developers-archive/browse_thread/thread/d76cfddd1b4444a6 – Gurgeh Mar 22 '12 at 16:02
  • @Gurgeh: That is a clean implementation, my points are more against that Macro-Monster. The boost one doesn't suffer most of the points I mention :) – Sebastian Mach Mar 22 '12 at 16:05
  • @phresnel "Fail." you realise all those cases fail because you're defining local variables, right? As long as you define `int fac,i` and `const float pi` before $emit, all's good. I've already said local variables are an issue, you're not pointing at something new. – enobayram Mar 22 '12 at 16:28
  • @phresnel the stackoverflow question you've linked is much simpler than my problem, I didn't simply need to access array elements. The grid cells had finite area, and I had to check whether any section of that area fell within the distance from the point. Further, I wanted to be able to keep the processing code agnostic about the shape of the grid. I could switch to a hex-grid, or even a quad-tree – enobayram Mar 22 '12 at 16:31
  • @phresnel "it's not in any C++ book" C# is not in any C++ book either. Nor is eigen matrix library etc... Regarding cyclomatic complexity, I don't want to go into details, but I don't think the generator implementation will introduce any more than you would while trying to implement the same thing while managing the functor state manually. Of course you could turn the implementation around and do something completely different to reduce cyclomatic complexity, but I'm sure you'll have other trade-offs in so doing. – enobayram Mar 22 '12 at 16:36
  • BTW. If you're so uneasy with using libraries with macros, boost is full of them as well. @Gurgeh thanks for pointing at Boost.context, I'll take a look at it. – enobayram Mar 22 '12 at 16:38
  • @enobayram: Re "Fail": That _is_ a problem. _Re_starting loops each iteration requires a hack. And again, you can't use any non-default-constructible value that depends on something inside the emit/stop-block. / About your cell-problem: I am afraid the comment section is too short. I believe you that generators are, as an abstract technique, useful. But if I had the choice between other C++-idioms and Fedoniouk's approach, I would stay with the C++-idioms; I am pretty sure, one can find equally good (or even better) methods. Generic algorithms can become quite powerful. – Sebastian Mach Mar 22 '12 at 16:47
  • There's no hack, you just define int n as a class member, not as a local variable... Re non-default-constructible, this is the second time this comes up, so I'll say the same thing. You CAN non-default construct them in the initialization list of your generator's constructor... – enobayram Mar 22 '12 at 16:50
  • @enobayram: Re "Uneasy-macros": I do not use any boost library that requires me to use macros. Re "Book": But C#-generators are described in C#-books, which is the point. Re "Cyclomatic Complexity": You are right. However, you introduce state at a larger scale. With functors or lambdas, you can make it all local to the code portion where it is used, which helps every maintainer. – Sebastian Mach Mar 22 '12 at 16:53
  • @enobayram: Re "No Hack": There is, again, I talk about __RE__-starting loops __each time__ the __emit-block is entered__. This requires a hack. The same with regards to default-construction: I target an object _local to the emit-block_, _with preconditions computed in the emit-block_. This is not possible, and initializing it in the generator-ctor does not help. – Sebastian Mach Mar 22 '12 at 17:01
  • To give an example: Your generator requires a `scoped_lock` in the emit-block. You cannot put it as a member because another, temporally interleaved portion uses the same mutex. Can't do this at all with the macro-fluff. And please remember: I am not arguing generators, but this particular implementation thereof. – Sebastian Mach Mar 22 '12 at 17:06
  • @phresnel you're right about "with preconditions computed in the emit-block", though I feel that's a very special case with generators and in such a special case (if you really insist that I have to be able to use something under all circumstances to be able to like it) you could resort to placement new (this weakened my argument didn't it :) ). Regarding loops, I'm sorry but I fail to see how `for(i=0;i<10;i++)` is so much worse than `for(int i=0;i<10;i++)`. I think we'll have to agree to disagree, as we seem to have different perspectives on what's more important in a piece of code. – enobayram Mar 22 '12 at 17:10
  • I've just seen the `scoped_lock` remark You can define the `scoped_lock` as a local variable, since you don't need to remember its state the next time. If you need to keep it locked between different calls to the generator, than that'll obviously not work, but again, very very special cases... (good example though) – enobayram Mar 22 '12 at 17:27
  • I've just noticed something very important. There's no risk of introducing silent bugs by defining local variables. The compiler will simply give a compiler error under circumstances that would otherwise create a bug. This is due to the forbidden jumps across initialisations in C++. It will still allow the local `scoped_lock` I've mentioned above. – enobayram Mar 22 '12 at 17:35
  • @enobayram: Your last remark: That's exactly what I was talking about half the time. You simply can't declare a variable with emit-block scope; as said I've studied the code you linked in every aspect :D. A scoped_lock can make sense anyways, that has less to do with the cross-jumping-thingy, but more with having exclusive access to a resource at a given point of time. After yielding, I would expect that the lock is freed. But I agree on us disagreeing ;) – Sebastian Mach Mar 22 '12 at 21:13
  • I'm glad we agreed on smt :) About that remark, I knew from the beginning that it was a logical error, but I thought forgetting about it could cause sinister bugs. I just wanted to point that there's no such risk, you'll simply get a compiler error. What's also good is; you won't get a compiler error if having a local variable is not a logical error. Now that you've spent so much time on this, I hope you give it a chance one day under controlled conditions ;) – enobayram Mar 23 '12 at 06:18
  • @enobayram: I am afraid I won't because of reasons said :) – Sebastian Mach Mar 23 '12 at 09:38
  • http://www.parashift.com/c++-faq-lite/ <- link is dead. – naught101 Aug 25 '18 at 06:46

This question is getting quite old, but here are a couple of references that have been useful to me:

A Transition Guide: Python to C++ (pdf)

A Brief Introduction to C++ for Python programmers (incomplete but quite good)

OrderFromChaos
cedbeu

Alright, let's just start with C for now.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

void readH5Data(FILE *file, int ***sample);   // this is for you to implement
void writeH5Data(FILE *file, int ***sample);  // this is for you to implement

int main(int argc, const char *argv[])
{
#define width 3
#define height 3
#define depth 3

    time_t t = time(NULL);

    // sample points to 'width' pointers, each pointing to 'height' pointers,
    // each pointing to 'depth' ints (calloc error checks omitted for brevity)
    int ***sample = calloc(width, sizeof(*sample));

    for (int i = 0; i < width; i++)
    {
        sample[i] = calloc(height, sizeof(**sample));
        for (int j = 0; j < height; j++)
        {
            sample[i][j] = calloc(depth, sizeof(***sample));
        }
    }

    for (int i = 0; i < 1000; i++)
    {
        // a plain character buffer, not an array of char pointers
        char filename[64];
        snprintf(filename, sizeof(filename), "mill2sort-%i-extracted.h5", i);

        // open the file in binary mode
        FILE *filePtr = fopen(filename, "rb");

        if (filePtr == NULL || ferror(filePtr))
        {
            fprintf(stderr, "%s\n", strerror(errno));
            exit(EXIT_FAILURE);
        }
        readH5Data(filePtr, sample);

        fclose(filePtr);
    }

    char filename[] = "mill2sort-extracted-all";

    FILE *writeFile = fopen(filename, "wb");

    if (writeFile == NULL || ferror(writeFile))
    {
        fprintf(stderr, "%s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }

    writeH5Data(writeFile, sample);

    fflush(writeFile);
    fclose(writeFile);

    printf("Done in %lli seconds\n", (long long int) (time(NULL) - t));

    // release the memory in the reverse order of allocation
    for (int i = 0; i < width; i++)
    {
        for (int j = 0; j < height; j++)
        {
             free(sample[i][j]);
        }

        free(sample[i]);
    }

    free(sample);

    return 0;
}

As long as you remember that your array is 3x3x3, you should have no problems with overstepping the bounds in your 'writeH5Data' function.
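If C++ is an option and the array is a true fixed-size array such as int array[3][3][3] (not the pointer-to-pointer layout used above), the compiler can report the dimensions instead of you having to remember them. The sketch below passes the array by reference to a function template, which stops it from decaying to a pointer. The Dimensions struct is a hypothetical helper, and the function name is borrowed from the question.

```cpp
#include <cassert>   // only needed if you check the result with assert()
#include <cstddef>

// Hypothetical helper type holding the three extents.
struct Dimensions {
    std::size_t x, y, z;
};

// Taking the array by reference keeps its full type, so the compiler can
// deduce X, Y and Z. A plain "int *array" parameter would lose them, which
// is why the sizeof trick cannot work inside such a function.
template <typename T, std::size_t X, std::size_t Y, std::size_t Z>
Dimensions getArrayDimensions(T (&)[X][Y][Z]) {
    return Dimensions{X, Y, Z};
}
```

This only works where the array's type is still visible at compile time. Memory obtained from calloc or new carries no size information, so there the sizes have to be stored alongside the pointer.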

Richard J. Ross III
  • I would just note that Richard is calling `calloc()` three separate times: `sample` is an array of pointers (outer `calloc`), each of which points to an array of pointers (middle `calloc`), each of which points to an array of `int`s (inner `calloc`). When I first started with C I had paper everywhere where I sketched out blocks of memory to work out what pointed to what. Also, if you really need the size of an array, in most cases you'll have to keep up with it yourself when you create the array. C doesn't know where arrays "end," in general, so `sizeof()` doesn't work. – Sam Britt Mar 22 '12 at 13:01
  • Thank you for the answer, I'll study it, but I was searching for a "reference", something like https://www.cfa.harvard.edu/~jbattat/computer/python/science/idl-numpy.html. OK, not really a "table of conversion", but something like "in C/C++ it is good practice to manage arrays in this way, to pass them in this way, and if you need to do you should do this"! – brunetto Mar 22 '12 at 13:22
  • To be clearer, I'm searching for a "numerical reference" (but not "Numerical Recipes"). After reading a lot of theory, I still don't know how to start being productive; I have no references on how to do simple, "standard", everyday things like manipulating multidimensional arrays and passing them to functions. After so many years of C/C++ programming in the world, I think there should be some "standard" shared knowledge that suggests "To do you should do this", so that I don't have to reinvent the wheel, array manipulation and so on. I am searching for something like this!:) – brunetto Mar 22 '12 at 13:44
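For the everyday task mentioned in the last comment, one widely used pattern in C++ (offered here as a sketch, not as the one standard way) is to keep all elements in a single contiguous std::vector and store the dimensions next to it. The object can then be passed to functions by reference, and it always knows its own size. Array3D and sum are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3D array of ints backed by one contiguous block of memory.
class Array3D {
public:
    Array3D(std::size_t nx, std::size_t ny, std::size_t nz)
        : nx_(nx), ny_(ny), nz_(nz), data_(nx * ny * nz, 0) {}

    std::size_t nx() const { return nx_; }
    std::size_t ny() const { return ny_; }
    std::size_t nz() const { return nz_; }

    // Element access as a(i, j, k), in place of array[i][j][k].
    int& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data_[index(i, j, k)];
    }
    int operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data_[index(i, j, k)];
    }

private:
    // Row-major layout: the last index varies fastest, as in a C array.
    std::size_t index(std::size_t i, std::size_t j, std::size_t k) const {
        assert(i < nx_ && j < ny_ && k < nz_);
        return (i * ny_ + j) * nz_ + k;
    }

    std::size_t nx_, ny_, nz_;
    std::vector<int> data_;
};

// Passing by const reference copies nothing, and the function can ask the
// array for its dimensions instead of receiving them separately.
long sum(const Array3D& a) {
    long total = 0;
    for (std::size_t i = 0; i < a.nx(); ++i)
        for (std::size_t j = 0; j < a.ny(); ++j)
            for (std::size_t k = 0; k < a.nz(); ++k)
                total += a(i, j, k);
    return total;
}
```

Compared with the triple calloc approach, this needs a single allocation and no manual free, and the memory is released when the Array3D object goes out of scope.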