The Balancing Act of Data Conventions

How does one accurately depict the events of a life that unfolded hundreds of years ago using only computer data? How are we to record and manage thousands of names, places, and dates in a modern computing environment and still preserve their original character? Some accepted genealogical data conventions can assist us, but many issues depend on the specific nuances of a region and its records. In creating a number of indexes to original baptismal, marriage, and death records, I have come to appreciate the difficulty of balancing a need for consistent data with a desire to retain the flavor of historical documents. Preserving original information while at the same time providing standardized access using modern database applications can be an archival nightmare!

Take for example the name "Joannes Prandstötter" and its more modern equivalent "Johann Brandstetter." When you enter these two names into a database, whether it is an application like Microsoft Access or an off-the-shelf genealogical software package, they sort nowhere near one another. This separation increases the risk of duplication and hinders the identification of familial links. It is not so problematic when dealing with a few hundred names, particularly if you are familiar with the potential problem, but in databases with thousands of names it presents a significant challenge. Does one impose a certain level of homogeneity on data in order to use modern technology more ably in documenting historical family units, or does one sacrifice consistent access to preserve unique information? I believe that the answer lies in the concept of an "index."

The primary purpose of an index is to provide a bridge between information desired by one individual and an original work created by another. An index increases access to targeted information by implementing a number of well-known techniques. It is not a transcription of the original work, but rather a mechanism to enhance access to it. Because it is not meant to represent or replace the original, an index may appropriately employ conventions or standards that modify data to enable it to more fully accomplish its purpose. Admittedly, this process can also introduce error and is only as reliable as the knowledge and skill of the indexer. It is a balancing act.

You will find articulated below the data conventions that I have tried to follow in creating and documenting information available through the "Genealogy" button in the link bar to the left. They include standards regarding given names and surnames, locations, dates, and source documentation. If you have any recommendations concerning these conventions, please feel free to contact me. I have tried over the past decade to extract and create consistently accurate data for this project, but I realize that it is far too easy to introduce inconsistent and erroneous data over time. I am working tirelessly to increase the quantity and refine the quality of this information.

Given Name and Surname Conventions

Establishing a uniform naming strategy for people can be very challenging. As noted above, the ability to sort and link people by given names and surnames is vital. For the purpose of improving access to information through automated indexing, diverse spellings of names represent the kiss of death! Given names and surnames must be spelled uniformly across centuries, multiple parishes, and diverse records in order to take advantage of the robust capabilities of database applications. This is controversial, but again, we are speaking of an index and not a transcription or reproduction of the original record. In order to approximate this goal, I have attempted to implement the following two conventions: 1) When it comes to given names, I have chosen to follow Ernest Thode's German-English Genealogical Dictionary (ISBN 0-8063-1342-0). His listing provides a preferred spelling of every conceivable Germanic given name; it also lists the Latin and English equivalents along with several other European and Scandinavian language renderings. It is the best resource I have found. 2) Surnames are a bit tougher and my method is less systematic. Surnames tend to morph a great deal over time and from one priest to another. At times they only loosely resemble each other in spelling, even when you know that they represent the same immediate family. I finally determined that I would have to homogenize all the different spellings of each surname into its most common modern version. I accomplish this by comparing all known versions of a surname using the Austrian Herold online telephone search engine. This method provides the most common modern version of Waldvierteler surnames in almost every instance. The trick, of course, is to be familiar with interchangeable letters and with how these names were commonly transformed. Are there nuances that might be lost using this method? Of course, but the advantages far outweigh those nuances, all of which are retained in the original records. This is a conclusion that I have only come to in the past couple of years, so I still have multiple versions of several surnames scattered throughout my database. I am working to clean that up over time. I am also trying to be very careful not to combine similar yet distinct surnames.
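
In database terms, the two conventions above amount to a lookup from each recorded spelling to one preferred index spelling, with the original spelling kept alongside it. What follows is only a minimal Python sketch of that idea, not the method actually used for this project. The tables GIVEN_NAMES and SURNAMES, the IndexEntry type, and the function index_name are hypothetical names invented for illustration, and the tables hold nothing but the single example from this page. A real table would have to be compiled by hand from Thode's dictionary and from the surname comparisons described above.

```python
from dataclasses import dataclass

# Hypothetical lookup tables: recorded spelling -> preferred index spelling.
# Only the example from this page is included; real tables would be
# compiled by hand and are not part of any library.
GIVEN_NAMES = {"Joannes": "Johann"}
SURNAMES = {"Prandstötter": "Brandstetter"}


@dataclass(frozen=True)
class IndexEntry:
    given: str      # standardized given name
    surname: str    # standardized surname
    original: str   # spelling exactly as found in the record


def index_name(given: str, surname: str) -> IndexEntry:
    """Map a recorded name to its index form, keeping the original."""
    return IndexEntry(
        given=GIVEN_NAMES.get(given, given),        # unknown names pass through
        surname=SURNAMES.get(surname, surname),
        original=f"{given} {surname}",
    )


records = [("Joannes", "Prandstötter"), ("Johann", "Brandstetter")]

# Sort on the standardized form so that variant spellings land together.
entries = sorted(
    (index_name(g, s) for g, s in records),
    key=lambda e: (e.surname, e.given),
)
for e in entries:
    print(f"{e.surname}, {e.given}  [as recorded: {e.original}]")
```

Both records now file under the same surname and given name, while the spelling found in the original record is still shown next to each entry. Names missing from the tables pass through unchanged, so distinct surnames are never merged by accident.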

Location Naming Conventions

The objective of citing places in genealogy is to enable us to identify them with certainty in the modern world. As a result, I have chosen to spell and organize locality information as it exists in today's Austria. For example, you will not find Bärensohl in the Southern Waldviertel on any modern map, but you will find it spelled Pernsoll. In order to link a family to a specific location that is readily identifiable on the ground today, it is essential for the indexer or family historian to bridge the gap between Bärensohl and Pernsoll. Modern spellings are most useful when paired with the current governmental structure. In the USA we would use four levels of identification: 1) town, village or city; 2) county; 3) state; and 4) country. With some adaptation, this same structure works well in Austria: 1) Gemeinde, 2) Bezirk, 3) Bundesland, and 4) Land. As an example, you would use: Martinsberg, Zwettl, Lower Austria, Austria. The only variation that I have found essential is for a smaller location (Ort) which falls within the jurisdiction of a Gemeinde, in which case I drop the Bezirk or county and list it accordingly: 1) Ort, 2) Gemeinde, 3) Bundesland, and 4) Land. Wiehalm, Martinsberg, Lower Austria, Austria would represent this configuration. In both instances, the location is easily identifiable using modern maps. This system makes much more sense than linking an Ort or Gemeinde to the affiliated parish, whose boundaries changed over time. Parish affiliation is also not as helpful when trying to locate a particular place on a map today. At any rate, the parish is always identified in the notes when citing church record sources. When a residence is known, but not the location of origin, marriage or death, I use the "of" convention: "of Roggenreith, Kirchschlag, Lower Austria, Austria" or "of Pöggstall Parish, Melk, Lower Austria, Austria."
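
The two four-level layouts and the "of" convention described above can be expressed as one small formatting rule. The Python sketch below is an illustration only. The names MODERN_NAMES, modern_name, and format_place are hypothetical, the table of historical spellings holds just the one pair mentioned on this page, and the default Bundesland is an assumption that suits the examples here.

```python
from typing import Optional

# Hypothetical table: historical spelling -> spelling on a modern map.
MODERN_NAMES = {"Bärensohl": "Pernsoll"}


def modern_name(place: str) -> str:
    """Return the modern spelling of a place, or the input if none is known."""
    return MODERN_NAMES.get(place, place)


def format_place(
    gemeinde: str,
    bezirk: str,
    ort: Optional[str] = None,
    residence_only: bool = False,
    bundesland: str = "Lower Austria",   # assumed default for these examples
    land: str = "Austria",
) -> str:
    """Build a four-level place string.

    Without an Ort:  Gemeinde, Bezirk, Bundesland, Land
    With an Ort:     Ort, Gemeinde, Bundesland, Land  (Bezirk is dropped)
    """
    if ort:
        levels = [modern_name(ort), gemeinde, bundesland, land]
    else:
        levels = [gemeinde, bezirk, bundesland, land]
    place = ", ".join(levels)
    # The "of" prefix marks a known residence rather than an event location.
    return f"of {place}" if residence_only else place


print(format_place("Martinsberg", "Zwettl"))
print(format_place("Martinsberg", "Zwettl", ort="Wiehalm"))
print(format_place("Martinsberg", "Zwettl", ort="Wiehalm", residence_only=True))
```

The Bezirk is still passed in when an Ort is given, so the underlying data stays complete even though the printed string omits it.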

Date Conventions

Date conventions require less explanation. I use the genealogical standard of day, month (abbreviated to three letters), and a four-digit year (e.g. 16 Aug 1628). This format removes ambiguity about which number is the day and which is the month, and so reduces error. In those instances where specific dates have not been documented, I will often impose an "about" (abt) date so that individuals or couples with the same names may be segregated into broad periods of time. Thus, a Peter Ehrl born abt 1743 is not likely to be the same Peter Ehrl who married in 1866. These are admittedly rough estimates, but a helpful tool nonetheless when sorting through thousands of names. Sometimes an about date is more precise, such as when placing a child within an existing gap between siblings where no baptismal record has been identified, or when it is based on a calculation (e.g. from age at death or age when a parent died). "Before" (bef) and "after" (aft) dates are used when warranted by evidence in various records. For example, a marriage entry may state that a certain parent died prior to the marriage. This knowledge may help to isolate a smaller range of years in which to search for that parent's death record or subsequent marriages.
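
Dates written this way are easy to check and sort by machine. The Python sketch below shows one possible way to read the format, including the abt, bef, and aft qualifiers. GenDate and parse_date are hypothetical names chosen for illustration, and the sketch assumes that a date is either a full day, month, and year or a year alone. It does not verify that the day exists in the given month.

```python
import re
from typing import NamedTuple

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

DATE_RE = re.compile(
    r"^(?:(abt|bef|aft)\s+)?"                # optional qualifier
    r"(?:(\d{1,2})\s+([A-Z][a-z]{2})\s+)?"   # optional day and month
    r"(\d{4})$"                              # four-digit year
)


class GenDate(NamedTuple):
    year: int
    month: int       # 0 when unknown
    day: int         # 0 when unknown
    qualifier: str   # "", "abt", "bef" or "aft"


def parse_date(text: str) -> GenDate:
    """Parse '16 Aug 1628', 'abt 1743', 'bef 30 Jun 1837' and similar."""
    match = DATE_RE.match(text.strip())
    if not match:
        raise ValueError(f"not in 'day Mon year' form: {text!r}")
    qualifier, day, month, year = match.groups()
    if month is not None and month not in MONTHS:
        raise ValueError(f"unknown month abbreviation: {month!r}")
    return GenDate(
        year=int(year),
        month=MONTHS.index(month) + 1 if month else 0,
        day=int(day) if day else 0,
        qualifier=qualifier or "",
    )


# Tuples compare field by field, so sorting on the parsed value is chronological.
dates = ["abt 1743", "16 Aug 1628", "bef 1837"]
print(sorted(dates, key=parse_date))
```

Because the year comes first in the parsed value, estimated and exact dates sort together in one chronological sequence, which is what makes an "about" date useful for separating people who share a name.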

Source Documentation Conventions

I use a very basic form of citation to document each entry: 1) type and nature of record, 2) volume and page designations, 3) jurisdiction and location, and 4) repository. This provides sufficient information to locate the record and entry without being too burdensome. Here are two examples of this format:

Church Record: Baptism Book D, 1836-1856, p. 32, Martinsberg Parish, Lower Austria, Austria; St. Pölten Diocese Archives.

Civil Record: Inventursprotokoll, 30 Jun 1837, BG-Ottenschlag, 5. Patrimonialherrschaft Guttenbrunn/Martinsberg, Book 28 (1824-1837), p. 406-408; NÖLA, Bad Pirawarth, Austria.

Additional notes are added to supplement vital information where available and deemed helpful. Also, for church entries after 1770, I have included house numbers with these citations where they were provided.
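
For anyone keeping citations in a database, the four-part form described above maps naturally onto a small record type. The Python sketch below is one hedged illustration. The Citation class and its field names are hypothetical, and the layout follows the church record example only. The civil record example orders its elements differently and would need its own template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    record_type: str    # 1) type of record, e.g. "Church Record"
    nature: str         # 1) nature of record, e.g. "Baptism Book D, 1836-1856"
    locator: str        # 2) volume and page designation, e.g. "p. 32"
    jurisdiction: str   # 3) jurisdiction and location
    repository: str     # 4) repository holding the record

    def __str__(self) -> str:
        return (
            f"{self.record_type}: {self.nature}, {self.locator}, "
            f"{self.jurisdiction}; {self.repository}."
        )


baptism = Citation(
    record_type="Church Record",
    nature="Baptism Book D, 1836-1856",
    locator="p. 32",
    jurisdiction="Martinsberg Parish, Lower Austria, Austria",
    repository="St. Pölten Diocese Archives",
)
print(baptism)
```

Storing the parts separately keeps every citation in the same shape and makes it possible to list, for instance, all entries drawn from one parish or one repository.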

Austrian colleagues at St. Pölten Catholic Diocese Archives
