
I'm writing Python code to read and write GEDCOM files, which are text files used to share data among genealogy programs. The GEDCOM specs say that a line can't exceed 255 characters. I will be importing the data into SQLite, which has no restriction on line length. Very technical info on character sets and such is beyond my interest; I just need a little insight into whether I can or cannot ignore the limit. I'm guessing that when importing to SQLite I would just concatenate the broken lines from the GEDCOM file, and I don't see how the limit would affect importing to SQLite.

I foresee the need for a little more insight (not a PhD on the topic) so I will know why/if I need to break lines up when writing a GEDCOM file to export data to some unknown genealogy program. The GEDCOM specs just say what the limit is; they don't say why. When I played with PostgreSQL some years ago I think they had types like VARCHAR(255), so does that mean GEDCOM is just playing it safe in case some program can't store longer strings? I can't imagine the big SQL programs not allowing longer strings if SQLite allows them. I see 255 and 256 all over the place, but it's all technical information that I don't think I need, so a little guidance on which limited topic I should even be studying would be helpful.

It would also be helpful to know whether my conjecturing above is right or wrong. Thanks.

EDIT: I'm interested in replacing GEDCOM with something better but I don't know why a GEDCOM replacement should limit line length or not, because I don't know why GEDCOM does it.

Luther

1 Answer


I'm not sure it really matters why. If the GEDCOM spec has a limit, you need to honor that limit when writing to a GEDCOM file.

Otherwise, you'll find genealogy programs that cannot read the files you create, and it'll be pretty clear where the fault lies in that case :-)

Especially since the latest specification has this requirement for readers:

Must not accept lines longer than 255 code units, must reject the file as invalid, with an error message such as "Line is too long. This is not a GEDCOM 5.5.5 file.".

That means compliant readers are required to reject files where the lines are too long. What you do in your SQL database is totally up to you, since it's likely that only your own code will process it. But if you want to create GEDCOM files readable by all applications, you should follow the spec.
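If you want to check your own output before handing it to another program, something along these lines may help. It's only a rough sketch: the function name `overlong_lines` is made up for illustration, and it assumes the file is written as UTF-8 (so one code unit is one byte) and that the lines are passed in without their line terminators. Whether the terminator counts toward the limit is something you'd need to confirm in the spec version you target.

```python
MAX_CODE_UNITS = 255  # the limit quoted from the specification above


def overlong_lines(lines, encoding="utf-8"):
    """Return (line_number, length) for every line over the limit.

    Assumption: with UTF-8, one code unit is one byte, so the length is
    measured on the encoded bytes rather than on Python characters.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        length = len(line.encode(encoding))
        if length > MAX_CODE_UNITS:
            problems.append((number, length))
    return problems
```

Note that a line can be over the limit even when `len(line)` is under 255, because non-ASCII characters take more than one byte in UTF-8.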

As an aside, you can actually cater for logical lines greater than that physical line limit, simply by using CONC (concatenation) records. A related CONT (continuation) record is available to insert newlines into logical lines, since physical lines are not permitted to contain those characters.
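To give a feel for how CONC and CONT work in both directions, here is a minimal sketch. The function names and the `max_value` parameter are my own inventions, not anything from the spec. It counts characters rather than code units, so as written it is only safe for text that is mostly ASCII, and it makes no attempt to avoid breaking a line next to a space, which matters if a reader trims whitespace from values.

```python
def write_long_value(level, tag, text, max_value=200):
    """Render text as one GEDCOM line followed by CONC/CONT lines.

    Assumption: max_value limits characters in the value part only,
    leaving headroom under 255 for the level, tag, and delimiters.
    """
    out = []
    for i, logical in enumerate(text.split("\n")):
        # Cut each newline-free piece into chunks; keep one empty chunk
        # so that blank lines in the text still produce a CONT line.
        chunks = [logical[j:j + max_value]
                  for j in range(0, len(logical), max_value)] or [""]
        for k, chunk in enumerate(chunks):
            if i == 0 and k == 0:
                head = f"{level} {tag}"          # the main line
            elif k == 0:
                head = f"{level + 1} CONT"       # a newline in the text
            else:
                head = f"{level + 1} CONC"       # same line, continued
            out.append(f"{head} {chunk}" if chunk else head)
    return out


def read_long_value(lines):
    """Rebuild the logical value from a main line and its CONC/CONT lines."""
    text = ""
    for n, line in enumerate(lines):
        parts = line.split(" ", 2)               # level, tag, optional value
        value = parts[2] if len(parts) > 2 else ""
        if n == 0:
            text = value
        elif parts[1] == "CONT":
            text += "\n" + value
        elif parts[1] == "CONC":
            text += value
    return text
```

On import you would call something like `read_long_value` and store the result in SQLite as a single string; on export you would go the other way, so the limit only ever affects the file, not your database.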

paxdiablo