Common Data Errors

January 6, 2019

We live in an increasingly data-centric world and are consuming data at a rate never seen before. Yet our understanding of the data we generate does not always take advantage of its true potential. To realise that potential, data needs to be standardised, readable and easily accessible.

Within the Ground Investigation (GI) industry, we are fortunate to have a data format which satisfies these criteria: the AGS (Association of Geotechnical and Geoenvironmental Specialists) data format. Implemented in 1992, the AGS data format has been used to consolidate information generated by laboratories, engineers, drillers, technicians, designers and geologists into one single data file system.

Today the AGS data format is the standard not just in the UK but in America, Australia, Singapore and many other countries around the world. This is testament to the dedicated volunteers from across the industry who have worked hard to promote the format, and also to the ease of use of the format and its accessibility. The primary purpose of the AGS data format is to transfer information from one geotechnical system to another. The most frequently used ground investigation systems are data management programs which translate GI data to produce logs and 3D models and to analyse geotechnical and geochemical data. Each data management system has its own pros and cons but, whichever one you may be familiar with, it should always automatically check the AGS data before importing and exporting data.

There are still, however, some occasions when the format or the data goes wrong, and where the problem lies may not be immediately obvious. This article should help you understand some of the most commonly encountered errors and give you a better understanding of the AGS data format and what you should be watching out for. If you or your company are a member of the AGS, you should check out the AGS data format website (links provided) where you can download examples of AGS data, suggest codes and changes, and get a list of what the codes stand for. The types of errors encountered can be divided into two categories: data errors and data format errors.

Data Errors

The AGS data format is designed as a transfer medium from one system to another. AGS checkers are available to validate the structure of an AGS file (detailed on the AGS data format website). However, although these may ensure that an AGS file is structurally correct, the data it contains could still be complete gobbledegook. It is therefore the responsibility of data managers, engineers, drillers, laboratories and technicians to do everything they can to prevent bad data being included in their datasets.

The types of error you can expect are too numerous to fit into one article, but we can use some examples. These errors may be as simple as spelling a colour incorrectly, or as serious as a misplaced decimal point in an in-situ test result that jeopardises an entire project, leading to unnecessary costs and programme delays. Samples greater than the depth of the hole, core recoveries of 1000%, boreholes plotted somewhere in space around the north pole and holes drilled in the year 2119 are all examples of situations that (unless you’re reading this 100 years from publication) are unlikely to happen, yet it is these kinds of errors which we are most likely to encounter.
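Checks for the examples above can be sketched in a few lines of Python. This is only an illustration, not how any particular data management system works: the function name sanity_check is hypothetical, the records are assumed to have already been read into dictionaries with numeric depths and proper dates, and the field names (LOCA_FDEP for final hole depth, SAMP_TOP for sample depth, CORE_TCR for total core recovery, LOCA_STAR for start date) are assumed to follow AGS 4 naming.

```python
from datetime import date


def sanity_check(location, samples, cores, today=None):
    """Return a list of problems found for one location (hypothetical helper)."""
    today = today or date.today()
    problems = []
    loca_id = location["LOCA_ID"]
    final_depth = location["LOCA_FDEP"]

    # A sample cannot start below the base of the hole.
    for sample in samples:
        if sample["SAMP_TOP"] > final_depth:
            problems.append(
                f"{loca_id}: sample at {sample['SAMP_TOP']} m is deeper "
                f"than the hole ({final_depth} m)"
            )

    # Core recovery is a percentage, so it should lie between 0 and 100.
    for core in cores:
        if not 0 <= core["CORE_TCR"] <= 100:
            problems.append(
                f"{loca_id}: core recovery of {core['CORE_TCR']}% is out of range"
            )

    # A hole cannot have been started in the future.
    if location["LOCA_STAR"] > today:
        problems.append(
            f"{loca_id}: start date {location['LOCA_STAR']} is in the future"
        )

    return problems


# Made-up records reproducing the errors described in the text.
location = {"LOCA_ID": "BH01", "LOCA_FDEP": 10.0, "LOCA_STAR": date(2119, 1, 6)}
samples = [{"SAMP_TOP": 2.5}, {"SAMP_TOP": 12.0}]
cores = [{"CORE_TCR": 95}, {"CORE_TCR": 1000}]

for problem in sanity_check(location, samples, cores, today=date(2019, 1, 6)):
    print(problem)
```

Range checks like these do not prove the data is right, but they catch values that cannot possibly be right before they reach a log or a model.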

As humans, we’re all susceptible to making these kinds of mistakes, whether through an accidental keystroke, misinterpretation of handwriting or simply pressure from time constraints. Technology can help us minimise these errors as well as save time and paperwork by reducing the double handling (rewriting) of data. Software already exists to aid with primary data collection, but expect to see a seismic shift in the coming years as the industry moves from paper techniques to digital systems. The BDA has previously stated the benefits of switching some tasks to digital systems and development continues into other roles.

How many of us are guilty of “as above” or “see previous” on logsheets when it comes to monotonous information such as serial numbers, dates, staff, units, methods etc? This seemingly redundant information is known as metadata and is invaluable not just when errors occur (by tracing data back to its origin) but also in allowing statistics to be generated quickly and accurately. It also substantially increases confidence in the data generated as it removes any ambiguity about specifics. Yet, many metadata fields in AGS files are left empty. Digital systems will also help collect more data than ever before by automatically filling out these data fields.

The software you use to manage your AGS data will determine how data is validated. If you are unsure of what validation protocols are available, then contact the software developer for tips and advice on how stricter controls can look for these types of data errors.

Now that we know what types of primary data error to look for, we can look at some of the errors that can occur with the AGS data format.

Understanding the Data Format

To understand format errors, we need to look at how the data is presented and how it is structured.

The AGS version 4.0 data format is written in plain text, readable using any text editor and compatible across all operating systems. Data is separated into groups which are represented by four-letter codes, signifying the table the data is to be stored in, e.g. SAMP for Samples, LOCA for Locations and so on. Next, there are the identifiers HEADING, UNIT and TYPE. These indicate, respectively, which field of the group the data should be stored in, whether the field has a unit associated with it (metres, degrees etc) and what format it should be stored in (text, decimal places, from a list of values etc). The main chunk of the data follows on after the identifiers.

All the data is separated using a comma (,) and encased in double quotes ("DATA").
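To make the structure concrete, here is a minimal Python sketch that reads a small AGS-style fragment into a dictionary. It assumes the AGS 4 layout in which every line begins with a descriptor (GROUP, HEADING, UNIT, TYPE or DATA) followed by comma-separated, double-quoted values. The function name parse_ags4 and the sample values are made up for illustration, and the sketch is no substitute for a proper AGS checker.

```python
import csv
import io

# A made-up fragment in the AGS 4 style: one group, its identifiers, one data row.
SAMPLE = '''"GROUP","LOCA"
"HEADING","LOCA_ID","LOCA_FDEP"
"UNIT","","m"
"TYPE","ID","2DP"
"DATA","BH01","12.50"
'''


def parse_ags4(text):
    """Read AGS-style text into {group: {"HEADING", "UNIT", "TYPE", "DATA"}}."""
    groups = {}
    current = None
    # The csv module handles the comma separation and double quotes.
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # blank lines separate groups
        descriptor, values = row[0], row[1:]
        if descriptor == "GROUP" and values:
            current = {"HEADING": [], "UNIT": [], "TYPE": [], "DATA": []}
            groups[values[0]] = current
        elif current is None:
            continue  # ignore anything before the first GROUP line
        elif descriptor == "DATA":
            current["DATA"].append(values)
        elif descriptor in ("HEADING", "UNIT", "TYPE"):
            current[descriptor] = values
    return groups


parsed = parse_ags4(SAMPLE)
print(parsed["LOCA"]["HEADING"])  # the field names of the LOCA group
print(parsed["LOCA"]["DATA"])     # one list of values per DATA row
```

Note that every value, including depths, arrives as text; it is the TYPE row (for example 2DP for two decimal places) that tells the receiving system how to interpret it.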

Data Format Errors

Now that we have a simple understanding of the structure, we can begin to investigate some of the data format errors.

Codes missing from the abbreviation (ABBR), type (TYPE), unit (UNIT) and dictionary (DICT) groups. The ABBR, TYPE and UNIT groups are a summary of all codes, data types and units used within the AGS file, and the DICT group defines custom groups and fields outside of the AGS standard. If a code or group is used in the data but not correctly assigned in the ABBR, UNIT, TYPE or DICT groups, this will result in an error. This error is particularly common with laboratories and Cone Penetration Testing (CPT), where innovative test methods lead to custom codes and tables not yet approved by the AGS. Newer methods may also record results with greater precision, so that the data presented in the file does not match the format stipulated in the group identifiers or the properties in the data management software (if a conversion is not applied automatically on import/export).
The number of data fields presented does not exactly match the number of fields defined in the heading, unit and type. This can be caused by the addition or deletion of a comma, double quote or other special character, which malforms the data file. The data management software will be unable to discern which field each value belongs to and will throw an error.
Groups require key fields to be populated, and these key fields must exactly match the data in the associated group; if the associated key field is missing, an error is encountered. For example, you cannot have a geotechnical result without the sample information populated, and you cannot have a sample without the location information group populated.
Errors can occur where versions of the AGS format differ, because identifier fields have been altered to allow for more detailed information. This is particularly noticeable when converting geochemical data from AGS 3.1 to AGS 4.0.
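Two of the errors listed above, mismatched field counts and missing key fields, lend themselves to a short illustration in Python. The sketch below is hypothetical rather than a description of how any AGS checker is implemented: it assumes AGS 4 style lines that each begin with a descriptor (GROUP, HEADING, UNIT, TYPE or DATA), and the function name find_format_errors, the parent_keys mapping and the sample values are invented for the example.

```python
import csv
import io

# Made-up fragment: the second SAMP row refers to a location that does not
# exist, and the third has one field too many.
SAMPLE = '''"GROUP","LOCA"
"HEADING","LOCA_ID","LOCA_FDEP"
"UNIT","","m"
"TYPE","ID","2DP"
"DATA","BH01","12.50"

"GROUP","SAMP"
"HEADING","LOCA_ID","SAMP_TOP"
"UNIT","","m"
"TYPE","ID","2DP"
"DATA","BH01","1.20"
"DATA","BH02","3.00"
"DATA","BH01","4.50","extra"
'''


def find_format_errors(text, parent_keys):
    """parent_keys maps a child group to (parent group, shared key field)."""
    errors = []
    headings = {}  # group name -> list of field names
    rows = {}      # group name -> list of {field: value} records
    group = None
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        if not row:
            continue
        descriptor, values = row[0], row[1:]
        if descriptor == "GROUP" and values:
            group = values[0]
            rows[group] = []
        elif descriptor == "HEADING":
            headings[group] = values
        elif descriptor in ("UNIT", "TYPE", "DATA"):
            expected = headings.get(group, [])
            # Every UNIT, TYPE and DATA line needs one value per heading.
            if len(values) != len(expected):
                errors.append(
                    f"line {reader.line_num}: {group} {descriptor} has "
                    f"{len(values)} fields, expected {len(expected)}"
                )
            elif descriptor == "DATA":
                rows[group].append(dict(zip(expected, values)))

    # Every child record must point at an existing parent record.
    for child, (parent, key) in parent_keys.items():
        known = {record.get(key) for record in rows.get(parent, [])}
        for record in rows.get(child, []):
            if record.get(key) not in known:
                errors.append(
                    f"{child}: {key} '{record.get(key)}' not found in {parent}"
                )
    return errors


for error in find_format_errors(SAMPLE, {"SAMP": ("LOCA", "LOCA_ID")}):
    print(error)
```

On the sample fragment this reports the DATA row with the extra field and the sample whose location BH02 has no entry in the LOCA group. Real AGS key relationships usually involve several fields per group, so a single shared key is a deliberate simplification here.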

Authors:

Ben Swallow – Member of BDA Technical Standards Sub-Committee

Paul Hadlum – Data Manager at WYG
