
I want to use a small C wrapper to access the so-called LSF API. LSF is the "Load Sharing Facility", a platform from IBM for dispatching computing jobs to various machines.

I figured out how to do basic job submission by populating a data structure and passing it to LSF. But when I tried to set more of the structure's defined attributes, I ran into a problem that comes down to basic C issues.

I want to specify a list of hostnames that the job should be dispatched to. According to the API, this is done with these two fields:

char **askedHosts  -> The array of names of invoker specified candidate hosts. The number of hosts is given by numAskedHosts.
int numAskedHosts  -> The length of the askedHosts array.

This char ** is giving me a headache:
I assumed that I need to create an array with my hostnames as strings, specify how many there are, and pass both to my data structure:

char *myHostArray[] = {"hostname_1","hostname_2","hostname_3"};
int numberOfMyHosts = 3;
myDatastructure.askedHosts = myHostArray;
myDatastructure.numAskedHosts  = 3;

But whatever I try, it doesn't work. The variant shown above is the only one that at least compiles and does not produce a "Segmentation fault" at runtime. In the end, though, the information does not seem to be passed correctly, since it has no effect on job dispatching.

I guess I am messing something up with the pointers. Do you have any idea how I can pass this array correctly? I have tried lots of variations over several hours without success.

Do you know what I could be doing wrong here?

By the way - the API reference can be found here (I am talking about the "submit"-data structure):
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/api_reference/index.html

and0r
  • It seems correct... are you sure there are no problems somewhere else? Are you `memset`ing the `myDataStructure` on initialization to zero (or doing ` = {0};`)? – xanatos Mar 17 '17 at 14:22
  • The code looks OK; the problem is, presumably, something else. Have you checked the error indicators from the API call? What do they tell you about the problem? – Jonathan Leffler Mar 17 '17 at 14:24
  • I agree with everybody else. Having looked at the docs you linked to, I'd say you are doing the right thing with respect to these two struct members. The problem must be elsewhere. – JeremyP Mar 17 '17 at 14:26
  • Are these calls asynchronous, by any chance ? Maybe you need to make sure that the supplied pointer parameters do not go out of scope before the call completes ? – Paul R Mar 17 '17 at 14:28
  • @JonathanLeffler: This is only an optional parameter, so I don't receive any error indicator here. I only see that the job dispatching is done randomly and not only to the machines mentioned. There is also a command-line version of submitting jobs. Something similar to "askedHosts" can be passed with bsub using the optional parameter -m. If I specify a machine there, everything works. Jobs end up only there... – and0r Mar 17 '17 at 14:30
  • @PaulR No, this is done in a very basic, synchronous way. At least I didn't do anything to create asynchronous behaviour – and0r Mar 17 '17 at 14:31
  • You should add that explanation to the question — that's where it is needed. Have you specified the machine names appropriately? Are they the correct FQDNs? Or are you supposed to specify simple names? Are you sure they're spelled correctly? Are you sure those names are accessible if you're working in a cloud environment? – Jonathan Leffler Mar 17 '17 at 14:33
  • For the sake of debugging print out the values in question right before calling the API function. Or just use a debugger. – alk Mar 17 '17 at 14:33
  • @xanatos: I didn't try that, but all other parameters (about 10) I am passing are read and interpreted correctly.... (but those are only integers or simple strings (e. g. char*)) – and0r Mar 17 '17 at 14:33
  • From this "*I only see, that the job dispatching is done randomly and not only to the machines mentioned*" I'd conclude `numAskedHosts` is received as `0` by the API function. – alk Mar 17 '17 at 14:36
  • @JonathanLeffler I am working on the same submission host here: Submitting via command line works. Submitting the exact same machine name via the API does not work. Behaviour should be the same according to the IBM documentation (of which very little is available, and the developer page is down atm) – and0r Mar 17 '17 at 14:36
  • @alk why would you conclude that? – and0r Mar 17 '17 at 14:38
  • Because the docs (you link) say so: "*If numAskedHosts is 0, all qualified hosts will be considered.*" – alk Mar 17 '17 at 14:39
  • Exasperating. Are you at the level of desperation that tries to track the system calls made for the command-line version and the API version to see whether the same information is sent? It may be hard if the connection is encrypted, which it might be. It is going to be very hard for anyone outside your working environment to guess what's going wrong. There's a remote possibility that someone's encountered the problem before, but otherwise, we're into guesswork. Can you get any help from IBM Technical Support? Have you tried IBM community support for the API? Good luck! – Jonathan Leffler Mar 17 '17 at 14:40
  • Have you checked whether there's a flag bit you have to set somewhere in the API to indicate that the valid hosts list should be used? – Jonathan Leffler Mar 17 '17 at 14:42
  • @JonathanLeffler I will check the flags on Monday (there are a few, yes). Then I will try to contact IBM support. Thanks for all inputs. I understand that it is not possible to provide a real solution for you guys here. Thanks for confirming, that I am probably not too wrong here. – and0r Mar 17 '17 at 14:44
  • @alk that is true, but it could also mean that the hostname strings are not read in properly or not processed at all. The latter is more likely since passing other integers worked fine. – and0r Mar 17 '17 at 14:45

2 Answers


For any optional parameter, the request must indicate whether the option is used. The options field of the submit structure does indeed have such a flag:

#define SUB_HOST   0x04
Flag to indicate numAskedHosts parameter has data.

Equivalent to bsub -m command line option existence.

You must do

    submit.options |= SUB_HOST;
user58697
  • @and0r, did you try this? The `lsb_submit()` library code will disregard those fields if the SUB_HOST option isn't set. You can see the library code in [Platform Lava](https://github.com/openlava/Lava/blob/master/lsbatch/lib/lsb.sub.c#L339-L343). – Michael Closson Mar 18 '17 at 20:00
  • @MichaelClosson now I tried it: ...and it works like a charm! Great! Thanks a lot for that!!! If I look up this option parameter in the API-documentation it is stating that it belongs to the numAskedHosts parameter. But if I only look up the submit datastructure (what I did) I didn't find the link to that option parameter...or was I too blind? I guess I have not enough xp to work correctly with APIs.. – and0r Mar 20 '17 at 12:46
  • There is some discussion about lsb_submit() on page 54 of the [LSF programmers guide](http://www.ccs.miami.edu/hpc/lsf/9.1.1/print/lsf_programmer.pdf). I think that the purpose of having flags for optional features is to distinguish cases where the default value isn't zero or null. E.g., `SUB2_JOB_PRIORITY` defaults to `-1`. – Michael Closson Mar 20 '17 at 18:36
  • @MichaelClosson The table on page 57 in your linked programmers guide gives a good synopsis of the command-line parameter (bsub), the field in the submit data structure (API) and the corresponding option flag, if any. Thanks – and0r Mar 21 '17 at 07:04

I hope this example will help you:

#include <stdio.h>

/* Prints each of the n strings in args, the way an API would read a char ** list. */
void func(char **args, int n) {
    int i;
    for (i = 0; i < n; i++)
        printf("%s\n", args[i]);
}

int main(void) {
    char *askedHosts[] = {"hostname_1", "hostname_2", "hostname_3"};
    int numOfHosts = 3;

    func(askedHosts, numOfHosts);
    return 0;
}
aicastell
  • Your function "func" shows how the API would interpret the input. In the end it confirms that my way of passing the input should be handled correctly in general, which is what others also stated. But it does not really help me here :) . Thanks anyway for confirming – and0r Mar 17 '17 at 14:42