
I have a set of audio files (DSDIFF, specification) to which somebody appended an ID3v2 tag. This does not conform to the file's standard, and thus standard ID3v2 parsers (like TagLib) don't recognize the audio file and refuse to parse it. (Why doing non-standard stuff like this seemed like a good idea is beyond me.)

I can manually parse the file and extract the raw ID3 tag (as a char* + size); however, I'm not sure how to proceed from here and get the values of the individual frames inside the raw tag.

I would like to use TagLib to parse the char*, but I have never used the library before. I'm also okay with using other libraries. I would not like to write my own parser from scratch.

Here is what I have tried so far:

Attempt 1

auto file_name = std::filesystem::path("location/of/audio.dff");
auto file_stream = std::fstream(file_name);

// ... parsing the DSDIFF section of the file
// until I encounter the "ID3 " chunk.

auto id3_start = file_stream.tellg();
TagLib::FileRef taglib_file(file_name.string().c_str());

// when executing, taglib_file.isNull() evaluates to true
if (taglib_file.isNull()) 
    std::cerr << "TagLib can't read the file." << std::endl;

auto tag = TagLib::ID3v2::Tag(taglib_file.file(), id3_start);
// ... handle the tag

This approach doesn't work because TagLib doesn't know how to parse the DSDIFF format. As a result, taglib_file.isNull() returns true (the FileRef holds no file) and no tags are read.

Attempt 2

auto file_name = std::filesystem::path("location/of/audio.dff");
auto file_stream = std::fstream(file_name);

// ... parsing the DSDIFF section of the file
// until I encounter "ID3 ".
// read the size of the tag and store it in `size`

char* buff = new char[size];  // raw ID3v2 tag bytes
file_stream.read(buff, size);

// How to parse the id3tag here?

paddy suggested using

auto tag = TagLib::ID3v2::Tag().parse(TagLib::ByteVector(buff, size));

Unfortunately, parse is a protected method of Tag. I tried inheriting from Tag and adding a thin wrapper, from_buffer, that calls parse internally, but that didn't work either.

Suggestions are highly appreciated :)


I am aware of a similar question: Taglib read ID3v2 tags from arbitrary file c++

However, the answer there was "just use the specific parser for your file type". In my case, this parser does not exist, because the file type doesn't actually support ID3 tags; somebody just appended them anyway.

FirefoxMetzger
  • Been years since I dealt with TagLib, but I do vaguely recall that there are rules about 16-bit padding boundaries, especially in the last byte of the file. Often, lazy implementors forget to pad odd-length files, and a simple solution is just to add an extra null byte to the file. Your question might be a better fit if you explained the actual _problem_. If you don't know why these files are breaking the parser, it's harder to find a solution. – paddy Nov 27 '20 at 09:14
  • My guess is that you actually should just read all the data into a vector, and use the `Tag::parse` method instead of trying to make it read the file. – paddy Nov 27 '20 at 09:18
  • @paddy I will try to create a code example; not sure how to add an example file since each file is around 200-300MB. Likely, the problem is me not understanding TagLib properly. The way it manifests itself is that when I call the factory `TagLib::FileRef(file_path)` it attempts to open the file, doesn't find a supported format (DSDIFF is not natively supported by taglib), and returns `NULL`. I will check `Tag::parse`. – FirefoxMetzger Nov 27 '20 at 09:45

2 Answers


If you can already extract the ID3 data, wrap it in an IOStream and hand that to TagLib's File constructor. Alternatively, store it in a separate (temporary) file and let TagLib read that. I've never used TagLib myself, but I'd be surprised if it needed MPEG data to recognize ID3 tags (after all, those always sit at the start or end of a file, so no audio data has to be parsed to find them).
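
The temporary-file route can be sketched with the standard library alone. This is a minimal sketch, not a tested recipe: `write_tag_to_file` and the path argument are hypothetical names, and it assumes the raw tag is already available as a pointer plus a size, as described in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: writes `size` bytes starting at `data` to `path`
// in binary mode, replacing any existing file at that path.
// Returns true only if the stream is still in a good state after flushing.
bool write_tag_to_file(const std::string& path, const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;  // the file could not be opened for writing
    }
    out.write(data, static_cast<std::streamsize>(size));
    out.flush();
    return out.good();
}
```

The resulting file contains nothing but the tag, so it can be passed to whichever TagLib file class turns out to accept it, and deleted afterwards.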

The sample file you want to provide should be tiny: none of us needs the audio data, only the ID3 data (and maybe 50 bytes before and after it). Even then, most of the ID3v2 content is irrelevant. Its first 30 bytes should be enough for a demonstration, and those could be printed as easily as this: \x49\x44\x33\x03...
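
Producing such a dump takes only a few lines. The following is a rough sketch; `hex_escape` and its `max_bytes` parameter are names made up for this example, and the input is assumed to be the raw tag as a pointer plus a size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats up to `max_bytes` leading bytes of a buffer
// as \xNN escape sequences (lowercase hex), suitable for pasting into a post.
std::string hex_escape(const char* data, std::size_t size, std::size_t max_bytes = 30) {
    assert(data != nullptr || size == 0);
    const std::size_t n = size < max_bytes ? size : max_bytes;
    std::string out;
    out.reserve(n * 4);
    for (std::size_t i = 0; i < n; ++i) {
        char buf[5];  // backslash, 'x', two hex digits, terminating NUL
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(buf, sizeof buf, "\\x%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(data[i])));
        out += buf;
    }
    return out;
}
```

Calling `hex_escape(buff, size)` on the extracted tag and printing the result gives a string that fits comfortably in a question.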

AmigoJack

AmigoJack got me on the right track to figuring out a solution. While TagLib::File is an abstract class and can't be instantiated directly, one of the existing file-format specific parsers can be used. It is possible to create a file that only contains the ID3 tag and use the MPEG parser to read it.

Here is the relevant code snippet:

// ... parse the DSDIFF file and convert the data into FLAC format
// until ID3 tag

const struct {
    int length; // number of bytes in ID3 tag
    char* data; // filled while parsing the DSDIFF file
} *id3_data;

const TagLib::ByteVector data(id3_data->data, id3_data->length);
TagLib::ByteVectorStream stream(data);
TagLib::MPEG::File file(&stream, TagLib::ID3v2::FrameFactory::instance());

/* copy all supported tags to the FLAC file*/
flac_tags.tag()->setProperties(file.tag()->properties());
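
Before handing the buffer to TagLib, it may be worth checking that it really starts with an ID3v2 header, since a wrong offset in the DSDIFF parsing would otherwise only show up as missing tags. The sketch below is not part of TagLib and is no substitute for its parser: `Id3v2Header` and `read_id3v2_header` are hypothetical names, and the code only relies on the 10-byte ID3v2 header layout ("ID3", two version bytes, one flags byte, four syncsafe size bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type for this sketch; not part of TagLib.
struct Id3v2Header {
    unsigned major_version;   // e.g. 3 for ID3v2.3
    unsigned revision;
    unsigned flags;
    std::uint32_t body_size;  // tag size excluding the 10-byte header
};

// Returns true and fills `out` if `data` begins with a plausible ID3v2 header.
bool read_id3v2_header(const char* data, std::size_t size, Id3v2Header& out) {
    if (data == nullptr || size < 10) {
        return false;  // too short to hold a header
    }
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
    if (p[0] != 'I' || p[1] != 'D' || p[2] != '3') {
        return false;  // wrong magic bytes
    }
    if (p[3] == 0xFF || p[4] == 0xFF) {
        return false;  // version bytes are never 0xFF
    }
    std::uint32_t body = 0;
    for (int i = 6; i < 10; ++i) {
        if (p[i] & 0x80) {
            return false;  // syncsafe bytes always have the top bit clear
        }
        body = (body << 7) | p[i];  // 7 payload bits per byte
    }
    assert(body <= 0x0FFFFFFFu);  // four 7-bit groups fit in 28 bits
    out = {p[3], p[4], p[5], body};
    return true;
}
```

If the check passes, comparing `10 + body_size` with the number of bytes extracted from the "ID3 " chunk gives a quick hint as to whether the whole tag was read.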
FirefoxMetzger
  • "Appending" ID3v2 to MPEG audio can't be correct - the metadata should **pre**pend the audio data. Have you tried just providing the ID3v2 data without one bit of audio? – AmigoJack Jan 13 '21 at 11:45
  • @AmigoJack Yes, it is indeed possible. I tried it briefly before but apparently had some other bug in my code that broke things. I tried it again now, and it appears to be copying the tags just fine without the minimal MP3 added. I've updated the answer. – FirefoxMetzger Jan 14 '21 at 11:08