All measures of diet, physical activity and anthropometry contain some degree of measurement error. Understanding the sources of this error is vital to help refine and develop the method, and strengthen the inferences that are made from the data. It is also important to understand that:
The sources of measurement error fall into two categories:
These two types often co-exist. Scrutinising the sources and magnitude of both types of error is an important part of the research process, including deciding whether or not a method is appropriate for answering a particular research question.
Effect of random error on estimated values
For example, a calibrated set of digital weighing scales may not show exactly the same reading on each measurement due to small fluctuations in posture and instrumentation. Measured values will be dispersed in a normal distribution around the true weight (Figure C.4.1). Values from methods with high random error will be more dispersed, and those from methods with low random error less so. For both kinds of method, the true value can be estimated from repeated measurements as the mean of the measured values. Estimates based on the mean from both methods may be similar; however, we can be more certain about estimates given by the method with smaller random error.
Figure C.4.1 The normal distribution of measured values due to random errors. Note that the mean of the measured values approximates the true population mean.
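The behaviour described above can be sketched with a small simulation. This is a minimal illustration, not a prescribed analysis: the true weight, the two error standard deviations and the function name simulate_readings are all hypothetical, and random error is assumed to be Gaussian.

```python
import random
import statistics


def simulate_readings(true_value, error_sd, n_readings, seed=0):
    """Return readings scattered around true_value with Gaussian random error."""
    rng = random.Random(seed)
    return [rng.gauss(true_value, error_sd) for _ in range(n_readings)]


TRUE_WEIGHT_KG = 70.0  # hypothetical true weight

# Two hypothetical methods that differ only in the size of their random error.
low_error = simulate_readings(TRUE_WEIGHT_KG, error_sd=0.1, n_readings=1000, seed=1)
high_error = simulate_readings(TRUE_WEIGHT_KG, error_sd=1.0, n_readings=1000, seed=2)

for label, readings in (("low random error", low_error),
                        ("high random error", high_error)):
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)
    # Standard error of the mean: how uncertain the estimate of the true value is.
    sem = sd / len(readings) ** 0.5
    print(f"{label}: mean={mean:.2f} kg, SD={sd:.2f} kg, SE of mean={sem:.3f} kg")
```

Both means should land close to the true weight, but the standard error of the mean is larger for the method with more random error, which is why its estimate is less certain.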
Sources of random error
Random error can occur for a variety of reasons. These include:
Examples of random error in diet, physical activity and anthropometric measurement include, but are not limited to, errors resulting from:
If the error can be directly linked to specific factors, the error is not random, but systematic. For example, errors caused by inadequate recall may differ by specific personal traits, such as age – this would be a systematic error (see section below).
Controlling random error
Random error generally affects a method’s reliability. Random errors may be reduced or compensated for by:
Technical error of measurement (TEM)
Measurement error is caused by the interaction between the instrument, the technique, the observer taking the measurement and the characteristics of the participant.
The technical error of measurement (TEM) is an accuracy index. In anthropometry, for example, it is the most common way to express the margin of error. It is the standard deviation between repeated measures, so it expresses error variability in the same unit as the variable measured (e.g. cm for waist circumference). This is known as the absolute TEM. TEM can also be expressed as a percentage (known as relative TEM).
TEM = √(∑D² / 2N)
D is the difference between the two measurements of a participant
N is the number of volunteers measured
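As a worked illustration of the formula, the sketch below computes the absolute TEM for two measurements per participant. The waist values are invented for the example, and the function names are hypothetical. The relative TEM is taken here as the absolute TEM divided by the overall mean of the measurements, multiplied by 100; that definition is an assumption, since the text above only says it is a percentage.

```python
import math


def absolute_tem(first, second):
    """Absolute TEM = sqrt(sum(D^2) / (2N)) for two measurements per participant."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    sum_squared_diff = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(sum_squared_diff / (2 * len(first)))


def relative_tem(first, second):
    """Absolute TEM as a percentage of the overall mean (assumed definition)."""
    overall_mean = (sum(first) + sum(second)) / (2 * len(first))
    return 100 * absolute_tem(first, second) / overall_mean


# Hypothetical waist circumferences (cm): same observer, same five participants.
first_measure = [80.0, 92.5, 101.0, 75.5, 88.0]
second_measure = [80.5, 92.0, 102.0, 75.0, 88.5]

print(f"absolute TEM = {absolute_tem(first_measure, second_measure):.2f} cm")
print(f"relative TEM = {relative_tem(first_measure, second_measure):.2f} %")
```

With these invented values the squared differences sum to 2.0, so the absolute TEM is √(2.0 / 10), roughly 0.45 cm.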
TEM is used for the calculation of the intra-observer error – which is the variation of the repeated measurements of the same participant (or a group of participants) measured by the same observer; and for the calculation of between/inter-observer error – which is the variation of measurements performed by different observers on the same participant.
These calculations are typically carried out for quality control purposes during data collection, and evaluating them is crucial if observers are to improve their performance of the procedures through training. TEM should be reported in manuscripts on studies with repeated measures so that readers can judge whether the observed differences are greater than the TEM. Otherwise, the observed differences may be attributable not to the exposure/intervention but to measurement error.
The lower the TEM obtained, the better the precision of the observer in performing the measurements. Typically, the relative TEM values that we observe in professional teams are, for example, between 0 and 0.2% for height and between 1 and 2% for waist.
Its interpretation is that the differences between repeated measurements will be within ± the value of TEM two thirds of the time.
TEM values for observers are considered adequate if they are within ±2 times the expert TEM (when 95% of the differences between replicate measurements are within ±2 TEM).
Figure 1.?. Example of Intra-observer error calculation for the waist measurement based on 3 observers and 10 participants
Figure 2.?. Example of Inter-observer error calculation for the waist measurement based on 3 observers and 10 participants
Effect of systematic error on predicted values
Systematic error causes deviation away from the true value in a particular direction (i.e. higher or lower, as shown in Figure C.4.2). Unlike random error (see section above), systematic error distorts the mean and median of the estimated values; it is therefore a measure of the absolute validity of a method.
Figure C.4.2 The distribution of measured values distorted due to systematic error. Note that the mean of the estimated values deviates from the true population mean.
The validity page describes systematic error in the context of taking height measurements of participants wearing shoes. This causes deviation from the true body height in an upwards direction. While random error can be compensated for by increasing the number of measurements for an individual or the sample size, this systematic error would persist to the same extent, no matter how many replicate measurements were taken.
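The persistence of systematic error under replication can be sketched as follows. All numbers are hypothetical (a 2.5 cm shoe offset, a 0.5 cm random error standard deviation), and the function name is introduced only for this illustration.

```python
import random
import statistics

TRUE_HEIGHT_CM = 170.0  # hypothetical true height
SHOE_OFFSET_CM = 2.5    # hypothetical systematic error from wearing shoes
RANDOM_SD_CM = 0.5      # hypothetical random error of a single reading


def mean_measured_height(n_replicates, seed=0):
    """Mean of n replicate readings that carry both random and systematic error."""
    rng = random.Random(seed)
    readings = [TRUE_HEIGHT_CM + SHOE_OFFSET_CM + rng.gauss(0, RANDOM_SD_CM)
                for _ in range(n_replicates)]
    return statistics.mean(readings)


for n in (5, 50, 5000):
    deviation = mean_measured_height(n, seed=n) - TRUE_HEIGHT_CM
    print(f"{n:>5} replicates: mean deviates from true height by {deviation:+.2f} cm")
```

As the number of replicates grows, the deviation settles close to the shoe offset rather than shrinking towards zero: averaging removes the random component only.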
Systematic error does not always result in bias
If physical activity measurements were performed only in summer, individuals’ levels of physical activity over a year are likely to be overestimated, because people tend to be more physically active in summer than in winter. This is a systematic error due to insufficient sampling across seasons.
However, if the degree of the overestimation is consistent across individuals, the ranking of individuals by physical activity level is unbiased and even associations with other characteristics would be unbiased. On the other hand, if the overestimation was proportional, say 10% higher in the summer, this would lead to slightly different associations since the most active individuals would have higher absolute systematic error than less active individuals.
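The contrast between a constant and a proportional overestimation can be checked numerically. In this minimal sketch the activity and outcome values are invented, the helper names are hypothetical, and the association is summarised by an ordinary least squares slope, which is only one of several possible measures.

```python
import statistics


def ols_slope(x, y):
    """Slope of the least squares line of y on x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def ranks(values):
    """Rank of each value (1 = lowest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Hypothetical annual mean activity (min/day) and a hypothetical outcome (BMI).
true_activity = [20.0, 35.0, 50.0, 65.0, 80.0]
outcome = [30.0, 28.5, 27.5, 26.0, 24.0]

constant_error = [x + 10.0 for x in true_activity]      # same overestimate for all
proportional_error = [x * 1.10 for x in true_activity]  # 10% overestimate

for label, measured in (("true", true_activity),
                        ("constant error", constant_error),
                        ("proportional error", proportional_error)):
    print(f"{label:>18}: ranks={ranks(measured)}, "
          f"slope={ols_slope(measured, outcome):.4f}")
```

In this example the ranking is identical in all three cases. The slope is unchanged by the constant error, whereas the 10% proportional error shrinks it by a factor of 1.10.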
If researchers are primarily interested in relative levels, rather than estimating absolute levels of a key variable, systematic error is not always problematic. Ideally this would be tested in a subset of a study to confirm that systematic error does not distort the ranking of individuals.
In research on population health, assessments of diet, physical activity and anthropometry are implemented in different population subgroups depending on the research question. These subgroups may be defined by sex, age, socioeconomic status, lifestyle behaviours, risk factors for a disease, or disease status. The degrees of systematic and random error may vary by those characteristics.
If error varies by one or more of those characteristics, the error is differential; and if not, non-differential. For example:
Wording can vary in the literature; terms such as homogeneity (non-differential) or heterogeneity (differential) in errors by subgroups are often used.
Different types of errors are visualised in Figure C.4.3, which illustrates:
Figure C.4.3 Display of random, systematic, differential, and non-differential errors with respect to men and women.
Bias is the term used to describe the cumulative effect of deviation of observations from the truth that investigators wish to measure. So, in the example above the poor calibration of the weighing scales was just one form of bias that distorted the weight data consistently upwards.
There are at least 50 forms of bias that occur in population health sciences and several different systems for classifying and subcategorising them. One of the simplest systems is to separate them into:
Further information regarding the categories and subcategories of bias can be found in the following article 
Information bias is a result of errors introduced during data collection and is therefore the form of bias most applicable to the issue of measurement. These errors originate from:
These forms of bias can result from different forms of error (e.g. a mixture of differential and non-differential systematic and random errors).
Observer and respondent bias can be minimised through several approaches: training observers, requesting participants not to change their lifestyle during a study or behave differently when being monitored, and others as described above (see maximising reliability).
In a randomised controlled trial, participants and observers are ideally blinded to the intervention so that they do not behave differently. The same principles apply for measurement as administration of assessment is also a form of intervention.
Selection bias occurs when the study sample is systematically unrepresentative of the target population about which conclusions are to be drawn, resulting in data with insufficient external validity. The resulting data may therefore be flawed in their ability to answer any research questions about the target population, even if the study itself has a high level of internal validity. Selection bias can occur because of:
Selection bias can often be reduced through careful study design, ensuring that no element of recruitment or data collection systematically favours one type of individual over another, as well as actively monitoring the characteristics of the sample population during the course of the study.
If selection bias is likely to exist by design, it is helpful to collect other information which makes it possible to estimate the likely size and direction of the bias in the results. For instance, it is possible to collect the demographic characteristics of non-responders, who can then be compared with the responders to determine whether they differ systematically by social status, education etc.
Confounding does not result from measurement error and is not a measurement issue per se, but is an important epidemiological concept relating to study design, analysis and interpretation of data. Although confounding does not generally affect the choice of assessment method directly, it is important to ensure that potential confounding variables are measured using reliable and valid methods in order to reduce the effect of residual confounding. There are three criteria for confounding:
In typical epidemiological research on a lifestyle-disease association, confounding is explained by the relationships depicted in Figure C.4.4.
Figure C.4.4 Example of relationship between exposure, outcome and confounder.
In the above example, one of the assumptions is that smoking status is not on the causal pathway of the coffee-cancer relationship or, in other words, coffee consumption does not change smoking status. Confounding is controlled in the design of a study by randomisation, restriction or matching of participants, and during the analysis phase by stratification or statistical modelling.
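Control by stratification can be illustrated with the coffee, smoking and cancer example. The counts below are entirely hypothetical and were chosen so that coffee has no effect within either smoking stratum; the function and variable names are likewise introduced only for this sketch.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical counts as (cancer cases, participants) for each group.
strata = {
    "smokers": {"coffee": (80, 800), "no_coffee": (20, 200)},
    "non-smokers": {"coffee": (4, 200), "no_coffee": (16, 800)},
}

# Crude analysis: ignore smoking and pool the strata.
crude_cases_coffee = sum(s["coffee"][0] for s in strata.values())
crude_n_coffee = sum(s["coffee"][1] for s in strata.values())
crude_cases_none = sum(s["no_coffee"][0] for s in strata.values())
crude_n_none = sum(s["no_coffee"][1] for s in strata.values())
crude_rr = risk_ratio(crude_cases_coffee, crude_n_coffee,
                      crude_cases_none, crude_n_none)
print(f"crude risk ratio: {crude_rr:.2f}")

# Stratified analysis: estimate the coffee-cancer association within each stratum.
stratum_rr = {name: risk_ratio(*s["coffee"], *s["no_coffee"])
              for name, s in strata.items()}
for name, rr in stratum_rr.items():
    print(f"risk ratio among {name}: {rr:.2f}")
```

In these invented data, coffee drinkers appear to have more than twice the cancer risk in the crude analysis, yet the risk ratio is 1 within both smokers and non-smokers. The crude association arises only because smokers drink more coffee and have a higher cancer risk.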