Gedcom & Genealogy Programs

What is Gedcom ?

Gedcom is a protocol or set of rules for the exchange of genealogical data. It was originated by the LDS Church who are into record gathering in a big way. It defines a record structure and fields used to store and move data and it does so using what can be thought of as the lowest common denominator - a simple text file. LDS developed the protocol for submitting data for the LDS Temple and Ancestral File but it has become a de-facto standard for exporting and importing data between genealogical computer programs. In an introduction to the Gedcom Standard documentation they say :

GEDCOM was developed by the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church) to provide a flexible, uniform format for exchanging computerized genealogical data. GEDCOM is an acronym for GEnealogical Data COMmunication. Its purpose is to foster the sharing of genealogical information and the development of a wide range of inter-operable software products to assist genealogists, historians, and other researchers.

Most genealogical data describes people in terms of relationships and events, e.g. John Smith was born on 18 June 1850, his parents were William Smith and Anne Hindle. In fact a Genealogy program doesn't "think" like that - it will record three people and a family. It stores the information about the people and then connects them to the family, so John would be connected to the family as a child, William as a husband and Anne as a wife. There will be a host of facts associated with the individuals and each fact needs its own pre-defined classification. It is these parcels of information that Gedcom attempts to replicate so that the data can be turned into a text file and exported in a form that can be imported to another system.
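
The "three people and a family" idea can be sketched in a few lines of Python. This is only an illustration of the shape of the data, not how any particular program stores it; the class names, the field names and the identifiers I1, I2, I3 and F1 are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Individual:
    # Hypothetical record for one person; real programs hold many more facts.
    xref: str
    name: str
    sex: str = "U"


@dataclass
class Family:
    # Hypothetical record linking people together by their identifiers.
    xref: str
    husband: Optional[str] = None
    wife: Optional[str] = None
    children: List[str] = field(default_factory=list)


people = {
    "I1": Individual("I1", "John /Smith/", "M"),
    "I2": Individual("I2", "William /Smith/", "M"),
    "I3": Individual("I3", "Anne /Hindle/", "F"),
}
family = Family("F1", husband="I2", wife="I3", children=["I1"])


def parents_of(xref, families):
    """Return (husband, wife) identifiers for the first family listing xref as a child."""
    for fam in families:
        if xref in fam.children:
            return fam.husband, fam.wife
    return None, None
```

Asking for parents_of("I1", [family]) gives back the identifiers of William and Anne, which is the "his parents were" statement recovered from the links rather than stored directly.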

To do this it defines a number of "tags" for the data. As an example the following fragment from a Gedcom file describes John Smith :

0 @I1@ INDI
1 NAME John /SMITH/
    2 SURN SMITH
    2 GIVN John
1 SEX M
1 BIRT
    2 DATE 18 Jun 1850
    2 PLAC Rishton
1 CHR
    2 DATE 21 Jun 1850
    2 PLAC Great Harwood
1 DEAT
    2 DATE 12 Oct 1910
    2 PLAC Blackburn

I have indented some parts to make it clearer which date applies to which event. The tags are the short upper-case words that follow the level number at the start of each line. I think this is reasonably self-explanatory: it gives John's name, gender and his dates and places of Birth, Christening and Death.

This data can then be transported to another system and imported.
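
For anyone curious how a program might read lines like those in the fragment above, here is a minimal sketch in Python. It assumes each line is a level number, an optional @identifier@, a tag and an optional value, which is all the fragment shows; the function name parse_line is my own and a real importer has to cope with a good deal more.

```python
import re

# level, optional @xref@, tag, optional value (leading indentation is tolerated)
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?([A-Za-z0-9_]+)(?:\s(.*))?$")


def parse_line(line):
    """Split one Gedcom line into (level, xref, tag, value)."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("not a Gedcom line: %r" % line)
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


fragment = [
    "0 @I1@ INDI",
    "1 NAME John /SMITH/",
    "1 BIRT",
    "    2 DATE 18 Jun 1850",
    "    2 PLAC Rishton",
]
records = [parse_line(line) for line in fragment]
```

The level number is what ties a DATE or PLAC to the event above it: a level 2 line belongs to the nearest level 1 line before it.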

So what's the problem ?

The problem is that Genealogy programs aren't written around Gedcom, that bit comes later ! Different programs will major on different aspects and might not have made any provision for some events. They might also have used the Gedcom specification in a different (but legal) way, as there is often more than one way to represent the same information. Take John Smith: the record above was created using PAF5 (Personal Ancestral File Ver 5 - a program from LDS). I then imported it into Genopro, the program I normally use, exported it again, and this is what I got :

0 @IND00001@ INDI
1 NAME John /Smith/
1 SEX M
1 BIRT
    2 DATE 18 JUN 1850
1 DEAT
    2 DATE 12 OCT 1910

I have lost all information about Christening and the places of Birth & Death. As a user of Genopro this shouldn't surprise me as there is nowhere in that program for me to enter Christening data or place data for Birth and Death. His name is still there but isn't split into SURName and GIVeN name although in Genopro I do enter Surname and Given name separately; it "knows" that the bit between the // is the Surname.
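
One way to see what has gone missing without reading both files by eye is to list the tag "paths" in each export and compare them. The sketch below is an assumption about how you might do that in Python, not a feature of any of the programs mentioned; tag_paths, BEFORE and AFTER are names invented for the example, and it only reports which kinds of tag vanished, not which individuals were affected.

```python
def tag_paths(gedcom_text):
    """Return the set of tag paths (e.g. 'INDI.BIRT.PLAC') used in the text."""
    paths = set()
    stack = []
    for line in gedcom_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level = int(parts[0])
        # Record lines look like "0 @I1@ INDI": the tag follows the identifier.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2].split()[0]
        else:
            tag = parts[1]
        del stack[level:]  # drop anything at this depth or deeper
        stack.append(tag)
        paths.add(".".join(stack))
    return paths


BEFORE = """0 @I1@ INDI
1 NAME John /SMITH/
2 SURN SMITH
2 GIVN John
1 SEX M
1 BIRT
2 DATE 18 Jun 1850
2 PLAC Rishton
1 CHR
2 DATE 21 Jun 1850
2 PLAC Great Harwood
1 DEAT
2 DATE 12 Oct 1910
2 PLAC Blackburn
"""

AFTER = """0 @IND00001@ INDI
1 NAME John /Smith/
1 SEX M
1 BIRT
2 DATE 18 JUN 1850
1 DEAT
2 DATE 12 OCT 1910
"""

lost = sorted(tag_paths(BEFORE) - tag_paths(AFTER))
```

For the two John Smith records, lost contains the Christening tags, the two places and the SURN and GIVN lines.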

This explains the problem at its simplest level - if you are exchanging data between two different programs you might lose information. In some cases an importing program might create an error log or output warnings, but how many people are going to work their way through all of them, some pretty obscure, to see what they are missing ?

As an example I recently received a Gedcom and when I imported it to FTM2006 (Family Tree Maker) I got 983 lines like this :

WARNING:  line 3067: RIN 227: Name must have 0 or 2 slashes: 'Radulph/Raphe /SMITH/'.
ERROR 2:  line 7876: RIN 576: Unexpected tag 'TEXT' in Citation Structure.
      3  TEXT together with John and Mary Ann
WARNING:  line 10230: RIN 762: Name must have 0 or 2 slashes: 'Henry/Harry /SMITH/'.

The warnings are, in fact, just that - I haven't lost any data, it just isn't in an approved form. The / character within a name field is used to mark off the Surname. The originator of this data was unsure about the given names so entered the alternatives separated by a / . Result - confusion for the computer: it doesn't know if in the first case the Surname is Raphe or SMITH (they are both enclosed by /s ). The error, on the other hand, means I have lost some information. By going to line 7876 of the Gedcom file I might be able to work out what was meant and then enter it by hand in my program, after locating the individual involved.
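
If you would rather find these names before sending a file to someone else, a check along the following lines would do it. It is a rough Python sketch of the "0 or 2 slashes" rule quoted in the warnings, not the test FTM actually runs; find_bad_names is a made-up name.

```python
def find_bad_names(gedcom_text):
    """Return (line number, name) for NAME values without exactly 0 or 2 slashes."""
    problems = []
    for number, line in enumerate(gedcom_text.splitlines(), start=1):
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] == "NAME":
            if parts[2].count("/") not in (0, 2):
                problems.append((number, parts[2]))
    return problems


sample = (
    "0 @I1@ INDI\n"
    "1 NAME Radulph/Raphe /SMITH/\n"
    "0 @I2@ INDI\n"
    "1 NAME John /SMITH/\n"
)
bad = find_bad_names(sample)
```

Here bad holds a single entry for line 2, the Radulph/Raphe name, which has three slashes.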

An extract from the standard on Personal Names reads :

The surname of an individual, if known, is enclosed between two slash (/) characters. The order of the name parts should be the order that the person would, by custom of their culture, have used when giving it to a recorder. If part of name is illegible, that part is indicated by an ellipsis (...). Capitalize the name of a person or place in the conventional manner - capitalize the first letter of each part and lowercase the other letters, unless conventional usage is otherwise. For example: McMurray.

Examples :
William Lee (given name only or surname not known)
/Parry/ (surname only)
William Lee /Parry/
William Lee /Mac Parry/ (both parts, Mac and Parry, are surname parts)
William /Lee/ Parry (surname imbedded in the name string)
William Lee /Pa.../
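
The examples above translate into a very small routine. This Python sketch is my own reading of the rule (whatever sits between the two slashes is the surname) and split_name is an invented name; it simply refuses anything that doesn't have 0 or 2 slashes.

```python
def split_name(name):
    """Split a Gedcom name into (text before surname, surname, text after surname)."""
    slashes = name.count("/")
    if slashes == 0:
        # Given name only, or surname not known.
        return name.strip(), "", ""
    if slashes != 2:
        raise ValueError("name must have 0 or 2 slashes: %r" % name)
    before, surname, after = name.split("/")
    return before.strip(), surname.strip(), after.strip()
```

So "William /Lee/ Parry" comes back as William, Lee, Parry with Lee as the surname, and "William Lee" comes back with no surname at all.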

Dates

Dates represent a special challenge. Gedcom is only interested in recording them; in fact it makes extensive provision for stuff like the Hebrew, French Revolutionary, Roman, Julian and Gregorian Calendars (but not the Islamic or Chinese) and allows for special forms for approximate dates. So you could have a person whose birth was recorded in the Hebrew form but whose death was recorded by means of the French Revolutionary calendar. I don't, however, think there is a program that would work out how old he was when he died (actually I doubt if there is a program that would enable you to enter that sort of information, but that doesn't concern people who live in ivory towers and write standards). The point is that some programs will let you enter approximate dates such as ABT 1820, or BETween 1795 AND 1804, and others won't. The ones that won't are the ones that are going to calculate something and can't work with woolly dates.

Some time ago I received a Gedcom which included a DATe of Burnley, Lancs. How that happened I have no idea but it looks like the originating program had a very flexible approach to what constitutes a DATe.

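So what can I do about it ?

One workaround is to put the information that has no field of its own into notes, so that the Genopro export of John Smith looks like this :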

0 @IND00001@ INDI
1 NAME John /Smith/
1 SEX M
1 BIRT
    2 DATE 18 JUN 1850
1 DEAT
    2 DATE 12 OCT 1910
    2 NOTE Died at Blackburn
1 NOTE Born in Rishton
    2 CONT Christened - Gt Harwood, 21 June 1850

There is a place for NOTEs associated with DEATh, but not with BIRTh so the Place of Birth gets stuck in a general NOTE associated with the INDIvidual and this note is CONTinued on a second line to give the Christening data. I'm pretty sure that all the info will show up somewhere in most other systems, but probably not where the person using that program expects to see it.
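
The NOTE and CONT arrangement is easy to produce mechanically. The following Python sketch shows the idea only, under the assumption that each extra line of the note becomes a CONT one level below the NOTE, as in the fragment above; note_lines is a made-up name and this is not what Genopro does internally.

```python
def note_lines(level, text):
    """Turn multi-line text into a NOTE line followed by CONT lines."""
    first, *rest = text.split("\n")
    lines = ["%d NOTE %s" % (level, first)]
    # Each further line of text continues the note one level down.
    lines += ["%d CONT %s" % (level + 1, extra) for extra in rest]
    return lines


general_note = note_lines(1, "Born in Rishton\nChristened - Gt Harwood, 21 June 1850")
```

The result is the pair of lines "1 NOTE Born in Rishton" and "2 CONT Christened - Gt Harwood, 21 June 1850".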

Remember that you will suffer (but may not see) the problems associated with imported data, but others will be on the receiving end of data you export and may think your data is corrupt when it's just program incompatibilities. Try to be aware of incompatibilities if you frequently exchange data with someone and, if all else fails, be prepared to re-enter some data by hand.

After all I have said about Genopro you will probably wonder why I didn't dump it long ago. Well, that's the point - Gedcoms are only a small part of what I do with it. I like the visual interface and its ability to generate my websites; for me there are more swings than roundabouts. Even upgrading is fraught with problems - I had a look and decided that moving all the notes that really should go into the proper fields provided in the new version is too big a task to handle, and that's without using Gedcom to shift the data ! I have used Genopro as my example because it is the one I use and I'm familiar with it. Other programs have similar, but different, problems. The important thing is to be aware of the issues and that there isn't a magic wand that will solve them.


"Fuzzy" dates

These are standard Gedcom abbreviations used for "Fuzzy" or imprecise dates.
Approximate Dates :  
ABT <DATE>  ABT =About, meaning the date is not exact.
CAL <DATE>  CAL =Calculated mathematically, for example, from an event date and age.
EST <DATE>  EST =Estimated based on an algorithm using some other event date.


Date Ranges :  
BEF <DATE>  BEF =Event happened before the given date.
AFT <DATE>  AFT =Event happened after the given date.
BET <DATE> AND <DATE>  BET =Event happened some time between date 1 and date 2. For example, BET 1904 AND 1915 indicates that the event state (perhaps a single day) existed somewhere between 1904 and 1915 inclusive.
FROM <DATE> TO <DATE> FROM=Event happened continuously between date 1 and date 2. For example, FROM 1904 TO 1915 indicates that the event state started in 1904 and ended in 1915.
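
A program that wants to calculate with dates has to decide which of these forms it has been given before it does anything else. The Python sketch below sorts a DATE value into one of the groups listed above. It is deliberately simplified: it only understands Gregorian-style dates such as 18 Jun 1850, Jun 1850 or 1850, it covers only the forms listed here rather than everything the standard allows, and the names classify_date and PATTERNS are my own.

```python
import re

MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
# Optional day, optional month, then a 3 or 4 digit year.
DATE = r"(?:(?:\d{1,2}\s+)?(?:%s)\s+)?\d{3,4}" % MONTHS

PATTERNS = [
    ("approximate", re.compile(r"^(?:ABT|CAL|EST)\s+%s$" % DATE)),
    ("range", re.compile(r"^(?:BEF|AFT)\s+%s$" % DATE)),
    ("range", re.compile(r"^BET\s+%s\s+AND\s+%s$" % (DATE, DATE))),
    ("period", re.compile(r"^FROM\s+%s\s+TO\s+%s$" % (DATE, DATE))),
    ("exact", re.compile(r"^%s$" % DATE)),
]


def classify_date(value):
    """Return 'exact', 'approximate', 'range', 'period' or 'unrecognised'."""
    text = " ".join(value.upper().split())  # tidy case and spacing
    for kind, pattern in PATTERNS:
        if pattern.match(text):
            return kind
    return "unrecognised"
```

With this, "ABT 1820" is approximate, "BET 1795 AND 1804" is a range, and a DATe of "Burnley, Lancs" is reported as unrecognised instead of being passed along quietly.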


Bob Calvert
4 Nov 2011