
This question is a continuation of my previous question.

I am working on a project which uses the LLVM YAML I/O library. This is the documentation/tutorial that I am following:

I have written a small program that reads a YAML file into objects in memory, prints those objects, and finally writes them out to a different output file.

First it parses the command-line arguments. Then it creates the input reader and the YAML Input object, which reads the *.yaml file and parses it. The parsing step is where I get an error. If parsing the *.yaml file succeeds, the data is stored in DocType myDoc;, which is a std::vector of Person objects. The program then prints every Person stored in that std::vector, using an overloaded operator<<() for each element. Finally it creates the output writer, creates the YAML Output, and writes myDoc into the file output_file.yaml.

The goal of this program is to demonstrate reading and writing a *.yaml file with LLVM YAML I/O. It successfully writes the output file, but it cannot read the input file.

Now suppose that instead of filling myDoc with elements from yin, I add the elements manually. I enable the code that calls push_back for each element, and I disable the code that reads the input from yin into memory.

    DocType myDoc;
    ///*
    myDoc.push_back(Person("Tom", 8));
    myDoc.push_back(Person("Dan", 7));
    myDoc.push_back(Person("Ken"));
    //*/

    /* Reading input into the memory */
    /*
    yin >> myDoc;
    if (error_code errc = yin.error()) {
        errs() << "error parsing YAML input from file " << InputFile << '\n';
        errs() << errc.message() << '\n';
        return EXIT_FAILURE;
    } else {
        outs() << "parsing YAML input from file " << InputFile << '\n';
    }
    */

In that case, the program works fine. myDoc is initialized with those elements, and each element is printed to stdout. Then the program creates the output writer, creates the YAML Output, and writes myDoc into output_file.yaml.

Here is what the output file looks like when it is written:

    ---
    - name:            Tom
      hat-size:        8
    - name:            Dan
      hat-size:        7
    - name:            Ken
    ...

I copy the output file into the input file, for testing the input functionality of the program.

    cp output_file.yaml input_file.yaml

Then I deactivate the code which manually fills the myDoc, and I activate the code which fills the myDoc from the yin.

    DocType myDoc;
    /*
    myDoc.push_back(Person("Tom", 8));
    myDoc.push_back(Person("Dan", 7));
    myDoc.push_back(Person("Ken"));
    */

    /* Reading input into the memory */
    ///*
    yin >> myDoc;
    if (error_code errc = yin.error()) {
        errs() << "error parsing YAML input from file " << InputFile << '\n';
        errs() << errc.message() << '\n';
        return EXIT_FAILURE;
    } else {
        outs() << "parsing YAML input from file " << InputFile << '\n';
    }
    //*/

After that the code no longer works. If I pass that same input_file.yaml to the application, I get an error: LLVM YAML I/O fails to parse the *.yaml file. This is strange, because the file is in exactly the format that LLVM YAML I/O itself wrote.

    ./yaml_project --input-file=input_file2.yaml --output-file=output_file.yaml
    opening input file input_file2.yaml
    reading input file input_file2.yaml
    input_file2.yaml:1:1: error: not a sequence
    -
    ^
    error parsing YAML input from file input_file2.yaml
    Invalid argument

I cannot figure out why it refuses to accept well-formed YAML from the input file. If anyone knows how to fix this, please help me.

Here is the full listing of my code:

#include "llvm/Support/raw_ostream.h"
#include "llvm/Support/YAMLTraits.h"
#include "llvm/Support/YAMLParser.h"
#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/CommandLine.h"

#include <cstdlib>       /* for EXIT_FAILURE */
#include <string>        /* for std::string */
#include <vector>        /* for std::vector */
#include <system_error>  /* for std::error_code */

using std::string;
using std::vector;
using std::error_code;

using llvm::outs;
using llvm::errs;
using llvm::raw_ostream;
using llvm::raw_fd_ostream;
using llvm::ErrorOr;
using llvm::MemoryBuffer;

using llvm::yaml::ScalarEnumerationTraits;
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
using llvm::yaml::Input;
using llvm::yaml::Output;

using llvm::cl::opt;
using llvm::cl::desc;
using llvm::cl::ValueRequired;
using llvm::cl::OptionCategory;
using llvm::cl::ParseCommandLineOptions;

/* Command line options description: */

// Apply a custom category to all command-line options so that they are the
// only ones displayed.
// The category tells the CommonOptionsParser how to parse the argc and argv.
OptionCategory yamlCategory("yaml_project options");

opt<string> InputFile("input-file", desc("The input YAML file"), ValueRequired);
opt<string> OutputFile("output-file", desc("The output YAML file"), ValueRequired);

struct Person {
    string name;
    int hatSize;

    Person(string name = "", int hatSize = 0)
     : name(name), hatSize(hatSize) {}
};

raw_ostream& operator<<(raw_ostream& os, const Person& person) {
    os << "{ " << person.name;
    if (person.hatSize)
        os << " , " << person.hatSize;
    os << " }";
    return os;
}

template <>
struct MappingTraits<Person> {
    static void mapping(IO& io, Person& info) {
        io.mapRequired("name", info.name);
        io.mapOptional("hat-size", info.hatSize, 0);
    }
};

typedef vector<Person> DocType;

LLVM_YAML_IS_SEQUENCE_VECTOR(Person)

int main(int argc, const char **argv) {
    /* Command line parsing: */
    ParseCommandLineOptions(argc, argv);
    if (InputFile.empty()) {
        errs() << "No input file specified\n";
        return EXIT_FAILURE;
    }
    if (OutputFile.empty()) {
        errs() << "No output file specified\n";
        return EXIT_FAILURE;
    }

    /* Create the input reader */
    auto reader = MemoryBuffer::getFile(InputFile, true);
    if (error_code errc = reader.getError()) {
        errs() << "error opening input file " << InputFile << '\n';
        errs() << errc.message() << '\n';
        // MemoryBuffer does not need to be closed
        return EXIT_FAILURE;
    } else {
        outs() << "opening input file " << InputFile << '\n';
    }

    /* Create the YAML Input */
    // dereference once to strip away the llvm::ErrorOr
    // dereference twice to strip away the std::unique_ptr
    Input yin(**reader);
    if (error_code errc = yin.error()) {
        errs() << "error reading input file " << InputFile << '\n';
        errs() << errc.message() << '\n';
        // MemoryBuffer does not need to be closed
        return EXIT_FAILURE;
    } else {
        outs() << "reading input file " << InputFile << '\n';
    }

    DocType myDoc;
    /*
    myDoc.push_back(Person("Tom", 8));
    myDoc.push_back(Person("Dan", 7));
    myDoc.push_back(Person("Ken"));
    */

    /* Reading input into the memory */
    ///*
    yin >> myDoc;
    if (error_code errc = yin.error()) {
        errs() << "error parsing YAML input from file " << InputFile << '\n';
        errs() << errc.message() << '\n';
        return EXIT_FAILURE;
    } else {
        outs() << "parsing YAML input from file " << InputFile << '\n';
    }
    //*/

    for (const Person& element : myDoc)
        outs() << element << '\n';

    /* Create the output writer */
    error_code errc;
    raw_fd_ostream writer(OutputFile, errc);
    if (errc) {
        errs() << "error opening output file " << OutputFile << '\n';
        errs() << errc.message() << '\n';
        writer.close();
        return EXIT_FAILURE;
    } else {
        outs() << "opening output file " << OutputFile << '\n';
    }
    /* Create the YAML Output */
    Output yout(writer);

    /* Writing output into file */
    yout << myDoc;
    outs() << "writing YAML output into file " << OutputFile << '\n';

    writer.close();

    return EXIT_SUCCESS;
}

Galaxy
  • Amir Kirsh Can you help me with this problem? – Galaxy Apr 29 '21 at 20:48
  • It seems strange that the error occurs at the first line of input which contains `---` and should be parsed as *directives end marker* (which starts the document content), not as some YAML entity that may provoke an error message like *not a sequence*. What happens if you delete the first line? – flyx Apr 29 '21 at 23:49
  • Another possibility could be that the YAML parser doesn't support CR LF endings, it would [not be the first one](https://github.com/go-yaml/yaml/issues/450). You can also try and convert the line endings to LF and see if that helps. – flyx Apr 29 '21 at 23:53
  • @flyx If I delete the first line, the same error occurs on the next line, which is now the first line, and first character. If I convert the line endings to LF using `dos2unix` the same error occurs. – Galaxy Apr 30 '21 at 02:30
  • The documentation [shows](https://www.llvm.org/docs/YamlIO.html#id3) that an error would print the whole line. Your error doesn’t so it seems to be an error in handling the input API so that you don’t get the complete line. I don’t really know the API but the examples seem to want you to use `reader.get()->getBuffer()` rather than `**reader` so maybe try that. Also you can put the file as string literal there to check whether it's actually an input error. – flyx Apr 30 '21 at 07:14
  • @flyx `Input yin(reader.get()->getBuffer());` also produces the same error. – Galaxy Apr 30 '21 at 07:27
  • Then try giving the input as string directly to check whether it's an input problem. – flyx Apr 30 '21 at 08:02
  • Hi again Galaxy. When something doesn't work the best thing is to try a simpler example, to help you locate the problem. Did you try to serialize a _single_ Person to yaml and deserialize it? Does it work? If not, you can narrow down the question to the simpler case. If it does work you can add this info to the question, pointing better to the actual problem. – Amir Kirsh Apr 30 '21 at 11:08
  • @AmirKirsh Thank you for your suggestion dear sir. So writing a *single* `Person` into the output file works fine. But reading a *single* `Person` from the input file into the memory also fails. It is the same problem, except it is `input_file1.yaml:1:1: error: not a mapping`, and at the first character of the file. – Galaxy Apr 30 '21 at 20:31

1 Answer


The problem is in this line:

    auto reader = MemoryBuffer::getFile(InputFile, true);

With LLVM 12, change it to

    auto reader = MemoryBuffer::getFile(InputFile);

and it should work.