What is diet variation and why is it important?
A key objective of many dietary assessment studies is to estimate habitual dietary intake, i.e. the average long-term intake of particular foods or nutrients for the group or individuals under study. Habitual intake is exceedingly difficult to measure because of diet variation. Diet variation consists of between-individual variation (differences in intake from one person to another) and within-individual variation (differences in one person's intake from day to day).
Both the between- and within-individual diet variation should be considered during the selection of a dietary assessment method as well as in data analysis. Figure D.1.9 shows hypothetical data for daily energy intake for five individuals across four days of dietary assessment. In the top figure (A), between-individual variation is large, as indicated by the spread of energy intakes across the five individuals (i.e. lines are relatively separated on the y-axis). On the other hand, within-individual variation is small, as indicated by stable energy intake across days (i.e. lines are relatively flat).
In the bottom figure (B), between-individual variation is small, as indicated by the lack of the spread of energy intakes across the five individuals (i.e. lines close together on the y-axis). In contrast, within-individual variation is large, as shown by the change in energy intake from day-to-day for each individual (i.e. lines are relatively uneven).
For the purpose of describing the prevalence of over- or under-nutrition in a population, understanding diet variation is crucial. Bias in the estimate can occur if within-individual diet variation is not properly considered. For example, if within-individual variation is large and is not separated from between-individual variation, between-individual variation is over-estimated and the estimated prevalence of over- or under-nutrition can be biased. Depending on where the population average lies relative to the cut-off, the prevalence can be either under-estimated or over-estimated (Figure D.1.10). The magnitude and direction of bias depend on the nutrient of interest, the method used to assess intake, and the target population.
Figure D.1.9 Data demonstrating between- and within-individual variation in daily energy intake for five individuals across four days of measurement. Upper: Higher between-individual variation and lower within-individual variation. Lower: Higher within-individual and lower between-individual variation.
Figure D.1.10 Hypothetical histograms of nutrient intakes in a population. Blue lines indicate the true variation. Red lines indicate that the variation is over-estimated. The vertical line (*) indicates a cut-off point to define nutrient deficiency. A: When the population average is higher than a cut-off point for nutrient deficiency and when between-individual variation is over-estimated, the prevalence of individuals with nutrient deficiency is over-estimated. B: When the population average is lower than a cut-off point for nutrient deficiency and when between-individual variation is over-estimated, the prevalence of individuals with nutrient deficiency is under-estimated.
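The two scenarios in Figure D.1.10 can be checked numerically. The sketch below is illustrative only: it assumes intakes are normally distributed, and the cut-off, means and standard deviations are hypothetical values chosen for the example, not data from any study.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prevalence_below(cutoff, mean, sd):
    """Proportion of the population with intake below the cut-off."""
    return normal_cdf(cutoff, mean, sd)

# Hypothetical values: cut-off of 50 units/day, true between-individual SD of 10,
# and an over-estimated SD of 20 (within-individual variation not removed).
cutoff, true_sd, inflated_sd = 50.0, 10.0, 20.0

# Scenario A: population mean (60) is above the cut-off.
a_true = prevalence_below(cutoff, 60.0, true_sd)
a_inflated = prevalence_below(cutoff, 60.0, inflated_sd)

# Scenario B: population mean (40) is below the cut-off.
b_true = prevalence_below(cutoff, 40.0, true_sd)
b_inflated = prevalence_below(cutoff, 40.0, inflated_sd)

print(f"A: true {a_true:.3f}, with inflated variation {a_inflated:.3f}")
print(f"B: true {b_true:.3f}, with inflated variation {b_inflated:.3f}")
```

In scenario A the inflated spread pushes more of the distribution below the cut-off, so prevalence is over-estimated; in scenario B it pushes more of the distribution above the cut-off, so prevalence is under-estimated.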
Example using the coefficient of variation
Dietary variation can be expressed as the coefficient of variation (CV), a standardised measure of spread that describes the amount of variability relative to the mean. The within-individual and between-individual variation can be expressed using CV as follows:
CV for within-individual variation = [(√within-individual variance) / mean]*100
CV for between-individual variation = [(√between-individual variance) / mean]*100
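As a minimal sketch of how the two variances might be obtained before applying these formulas, the example below uses a one-way random-effects analysis of variance on balanced repeated measures. The estimation approach, the function names and the data are assumptions for illustration; the source does not specify how the variance components were estimated.

```python
import math

def variance_components(intakes):
    """Estimate the mean, within- and between-individual variance.

    `intakes` is a list of per-person lists, each holding the same number
    of repeated daily intakes (balanced design, at least two repeats).
    """
    n = len(intakes)        # number of individuals
    k = len(intakes[0])     # repeats per individual
    person_means = [sum(days) / k for days in intakes]
    grand_mean = sum(person_means) / n

    # Within-individual mean square: day-to-day spread around each person's mean.
    ss_within = sum(
        (x - m) ** 2 for days, m in zip(intakes, person_means) for x in days
    )
    ms_within = ss_within / (n * (k - 1))

    # Between-individual mean square: spread of person means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)

    var_within = ms_within
    # Remove the within-individual contribution; truncate at zero if negative.
    var_between = max((ms_between - ms_within) / k, 0.0)
    return grand_mean, var_within, var_between

def cv_percent(variance, mean):
    """Coefficient of variation (%) = sqrt(variance) / mean * 100."""
    return math.sqrt(variance) / mean * 100.0

# Hypothetical data: three individuals, two days each.
data = [[8.0, 12.0], [18.0, 22.0], [28.0, 32.0]]
mean, var_w, var_b = variance_components(data)
cv_w = cv_percent(var_w, mean)
cv_b = cv_percent(var_b, mean)
print(f"mean={mean:.1f}, CVw={cv_w:.1f}%, CVb={cv_b:.1f}%")
```

Note that the within-individual variance cannot be computed when each person has only one day of data, because the divisor `n * (k - 1)` becomes zero.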
For most nutrients, within-individual variation is greater than between-individual variation. This is illustrated in Table D.1.10 for a sub-group of Japanese women. For example, fat intake was on average 59.7 g per day; it varied on average by 35% from one day to the next within an individual, but only by 19.3% from one individual to another. Part B of Figure D.1.9 above also displays a scenario in which within-individual variation is greater than between-individual variation.
The relationship between within- and between-individual variation has important implications for assessment and can be summarised as a ratio (for example see variance ratio, ‘VR’, in Table D.1.10). If the within-individual variation is small relative to between-individual variation (see Figure D.1.9 part A), then individuals can be more readily distinguished [2,3]. If, as is normally the case, the within-individual variation is large relative to between-individual variation, then distinguishing between individuals and subsequent ranking is more difficult.
This partly explains why the mean intake of a group is usually more readily assessed than the intake in an individual.
Table D.1.10 Within- and between-individual variation in mean daily energy and nutrient intake of Japanese women (n=58).
|Dietary fibre (g)||12.4||3.2||33.8||24.8||1.86|
CVw = within-individual coefficient of variation
CVb = between-individual coefficient of variation
VR = within-individual to between-individual variance ratio. A larger VR indicates within-individual variation is large compared to between-individual variation.
Adapted from: .
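Because CVw and CVb are both expressed relative to the same mean, the variance ratio can be recovered as the square of their ratio. The short sketch below applies this to the dietary fibre row of Table D.1.10; the function name is hypothetical, and the assumption that VR was derived this way is supported only by the fact that it reproduces the tabulated value.

```python
def variance_ratio(cv_within, cv_between):
    """Within- to between-individual variance ratio from two CVs sharing one mean."""
    return (cv_within / cv_between) ** 2

# Dietary fibre row: CVw = 33.8, CVb = 24.8
vr_fibre = variance_ratio(33.8, 24.8)
print(round(vr_fibre, 2))
```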
Factors influencing diet variation
Diet variation can be influenced by the following factors:
Accounting for diet variation
Both the within-individual and between-individual diet variation are natural phenomena. Therefore, regardless of the sample size of a study, we cannot reduce the diet variation. However, if we increase the sample sizes of individuals and repeats of dietary assessment, we can improve estimates (i.e. increase precision) of the within-individual variation, the between-individual variation, or both.
To estimate the within-individual variation and improve the estimates, we need to increase the sample size of repeats from the same participants. To do the same for the between-individual variation, we need to increase the sample size of study participants (i.e. include a greater number of participants). Of note, if we have a huge sample size of study participants, but dietary intakes are assessed only once, we cannot estimate the within-individual variation of the participants.
Selection of dietary assessment method
Time of the year
The greater the seasonal variation, the broader the reference period should be when using a food frequency questionnaire (FFQ); alternatively, repeated FFQs should be administered to cover the usual diet across seasons. When using diet records or 24-hour recalls, the greater the seasonal variation, the more widely the assessment days should be distributed across the year.
If a study plans to assess dietary change over years, investigators may want to collect repeated dietary data at the same time of the year to consistently account for seasonal variation. If the repeated measures were collected in different seasons, changes in dietary intakes would be confounded by the within-individual seasonal variation.
Number of days for multiple dietary records and 24-hour recalls
The factors which should be considered to decide the number of days include the following:
Day of the week
The biggest difference in dietary intake has often been assumed to be between weekdays and weekends, with diets on Saturday and Sunday assumed to be similar. However, analysis from the UK National Diet and Nutrition Surveys (NDNS) has shown that these two days may vary greatly, suggesting that both days may need to be assessed [6,7].
In studies using interviewer-administered dietary assessment, attention must be paid to the days of the week. For instance, interviewers who do not work at weekends, or respondents who do not want to be interviewed at weekends, may introduce bias into the dietary data collected.
When habitual intake is estimated from multiple days of dietary records or 24-hour recalls, a simple average can cause bias. Suppose dietary assessment is implemented on one weekend day and two weekdays. The average of the three days over-represents the weekend because 33% of the data come from the weekend, rather than 28.6% (2 days out of 7 days). Unless a 7-day assessment is implemented, a weighted average should be calculated. In this example, the calculation should be (2 × weekend + 5/2 × (weekday 1 + weekday 2)) divided by 7.
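The weighted average described above can be sketched as follows. The function name and the intake values are hypothetical; the weights of 2 and 5 follow directly from the number of weekend days and weekdays in a week.

```python
def weekly_weighted_mean(weekend_intakes, weekday_intakes):
    """Weight the weekend mean by 2/7 and the weekday mean by 5/7."""
    weekend_mean = sum(weekend_intakes) / len(weekend_intakes)
    weekday_mean = sum(weekday_intakes) / len(weekday_intakes)
    return (2 * weekend_mean + 5 * weekday_mean) / 7

# Hypothetical energy intakes (kcal/day): one weekend day and two weekdays.
weekend = [2400.0]
weekdays = [2000.0, 2100.0]

simple = sum(weekend + weekdays) / 3
weighted = weekly_weighted_mean(weekend, weekdays)
print(f"simple mean: {simple:.1f}, weighted mean: {weighted:.1f}")
```

With one weekend day and two weekdays, the function reduces to the expression in the text: (2 × weekend + 5/2 × (weekday 1 + weekday 2)) / 7. Here the higher weekend intake pulls the simple mean above the weighted mean.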
In the analysis of 24-hour recall data, days of recording and days of diets should be carefully distinguished. For instance, dietary data collected on Monday represents recalled diet on a Sunday.
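A minimal sketch of this distinction, assuming each 24-hour recall covers the calendar day before the interview (some protocols instead cover the preceding 24 hours, in which case this mapping is only approximate). The function name is hypothetical.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def intake_day(interview_date):
    """Return the date and weekday name of the diet covered by a recall."""
    recalled = interview_date - timedelta(days=1)
    return recalled, DAY_NAMES[recalled.weekday()]

# An interview held on Monday 8 January 2024 describes Sunday's diet.
recalled_date, recalled_weekday = intake_day(date(2024, 1, 8))
print(recalled_date, recalled_weekday)
```

Weekday/weekend weights should then be assigned using the recalled day, not the interview day.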
Why is portion size important?
The estimation of portion size has been recognised for more than 50 years as a source of error in studies measuring dietary intake. The coefficient of variation of the differences between estimates and weights of food portions has consistently been shown to be around 50% for foods and 20% for nutrients. The errors associated with quantifying the portion of food consumed probably represent the largest measurement error in most dietary assessment methods [2,3].
Estimating portion size
Options for estimating portion size include:
Portion sizes and leftovers are measured by a participant or research staff. Weighing increases the burden of data collection and is still subject to error.
Household measures (e.g. spoon sizes) may be associated with considerable error and variation. Discrepancies in individual ability to estimate food portions using household measures have been shown to be independent of age, body weight, social status and sex, but to vary with food type and true portion size.
Food photographs (2 dimensional)
Food photographs [11,12] are the most commonly used tool to assist the estimation of portion sizes. Showing photographs on a computer display is likely to be helpful in an interview-assisted or a computer-based self-administered dietary assessment. However, studies have shown that both adults and children find it challenging to estimate portion sizes using photographs, and improved portion size assessment aids are required for all age groups. Some foods have been reported to cause greater difficulty than others; for instance, in one study most food categories were underestimated (ranging from −2.3% for cassava to −6.8% for rice), except for beverages (+1.6%) and leafy vegetables (+8.7%), which were somewhat overestimated.
Photographs that depict a portion (amount consumed), or serving size (amount served), are typically presented as a series of graduated photographs (see Figure D.1.11) for each food item, bound together in an atlas.
Figure D.1.11 Example of graduated food photographs used to assess portion size with children aged 18 months to 16 years.
Developing a food atlas is a time-consuming process, so where possible an existing tool should be used, provided it has been validated in a similar population. Some factors which should be carefully considered are:
A useful description is available of how food photograph portions were developed using data from previous National Diet and Nutrition Surveys (NDNS) in the UK in children and young people. In this study, food photographs were compared to a computer tool, the Interactive Portion Size Assessment System (IPSAS), and found to be comparable, with good accuracy but poor precision, which makes them suitable for group estimates but not for individual estimates.
Food replicas (3 dimensional) and food models (3 dimensional)
Canada was one of the first countries to use a collection of three-dimensional food models in its national nutrition survey. Other countries have also included such aids, e.g. the United States and New Zealand. The United States National Health and Nutrition Examination Survey (NHANES) uses 3-dimensional measurement aids in the dietary component of the survey. A few studies that compared the use of 2- and 3-dimensional food models suggested that 2-dimensional ones are as effective as 3-dimensional ones [20,21]. Other studies evaluating food portion aids have shown mixed results, and many have been carried out in highly controlled and non-representative conditions. Food models are likely to be more reliable than household measures, but often only one size is available, which introduces bias because individuals report specific portions that agree with the models available.
Average portion sizes for a given population
Average portion sizes for some foods can be estimated by using existing data [22,23]. These data should ideally be kept up to date and be specific to a study population. Standard portion sizes are used in ascribing portions in FFQs. Respondents completing semi-quantitative FFQs may often have difficulty in relating their consumption to pre-defined reference portion sizes.
Factors that influence the estimation of portion size
The following factors may influence estimates of portion size:
What is misreporting?
Misreporting is a form of respondent bias documented in studies using subjective methods of dietary assessment, and includes both under- and over-reporting.
The detection of misreporting is an investigation of the validity of the dietary data obtained. Often energy intake is used as a proxy for dietary intake in investigations because if energy intake is underestimated, intakes of the other nutrients may also be underestimated. Assessment of the validity of dietary assessment methods therefore often concentrates on estimates of energy expenditure. The development of doubly labelled water (DLW) as a gold standard measure of energy expenditure was thus fundamental to advancing this area of dietary assessment and its links with health outcomes such as obesity.
Misreporting in dietary assessment was extensively studied during the late 1980s and early 1990s. At that time, obesity research focused on identifying a metabolic cause of obesity, because self-reported dietary intakes of individuals with obesity were not higher than those of their lean counterparts.
In recent decades, diet quality, in addition to quantity, has received much attention. To study diet quality independent of diet quantity in population-health research, overall dietary habits and dietary intakes have been assessed after adjustment for total energy intake. Therefore, although the literature on misreporting largely focuses on the validity of total energy intake, it is crucial to consider the validity of estimating other dietary intakes after adjustment for total energy intake.
What is the profile of misreporters?
Characteristics of misreporters are population-specific and have not been consistent in the published literature [28,29]. Individuals who misreport on one occasion are likely to do so on subsequent occasions; therefore, repeated or more prolonged periods of assessment may not reduce the error due to misreporting. Characteristics that influence degrees of misreporting have been studied and include some features as summarised below [31-39]:
Implications of misreporting
Misreporting can potentially lead to the following:
Accounting for misreporting
Doing nothing may be considered an option, with caveats
If the research aim is to rank individuals and the ability to rank individuals is well supported objectively (e.g. in a validation study), systematic misreporting can be ignored in data analysis. For example, if 100% of study participants underestimate fruit and vegetable consumption by 10%, the rank of the participants by consumption is stable. Similarly, the association of dietary intakes with a health outcome can be unchanged even in the presence of over- or under-reporters (refer to systematic errors page). Therefore, the need for and approach to accounting for misreporting should be considered in light of the research aim.
It is practically difficult to ensure that systematic errors are not differential. Although doing nothing is an option, analysis after accounting for possible over- and under-reporting of total energy intake is generally recommended in dietary research.
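The ranking example above can be illustrated with a short sketch. The intake values are hypothetical, and the uniform 10% under-reporting is the idealised, non-differential case described in the text; real misreporting rarely behaves so neatly.

```python
def ranks(values):
    """Rank positions (1 = lowest intake); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

# Hypothetical true fruit and vegetable intakes (g/day) for four participants.
true_intake = [310.0, 150.0, 420.0, 275.0]

# Every participant under-reports by the same proportion (10%).
reported = [0.9 * x for x in true_intake]

print(ranks(true_intake))
print(ranks(reported))
```

Both calls give the same ranking, because multiplying every value by the same positive constant preserves order. If the degree of under-reporting differed between participants, the ranking could change.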
Estimation of energy intake with reference to estimated total energy expenditure
As noted above, the focus of work on misreporting tends to be the estimation of absolute levels of total energy intake. Habitual total energy intake should match habitual total energy expenditure under the assumption that participants' weights are stable (i.e. they are in a state of energy balance). Under this assumption, estimates of total energy expenditure are often used to identify misreporters in dietary research.
Ideally, studies would measure total energy expenditure (TEE) by doubly labelled water (DLW), but this is expensive (approx. £1000 per dose). Therefore, a typical population-health study estimates TEE from basic characteristics that are readily measurable. The estimation uses standard equations for basal metabolic rate (BMR), plus energy expenditure due to physical activity, expressed as physical activity level (PAL). Estimates of TEE are improved if physical activity is assessed objectively. BMR can be measured directly with calorimetry.
TEE comprises energy expenditure due to physical activity and also digestion of meals. The 'thermic effect of food' can be estimated from dietary consumption, but is typically ignored because TEE, used as a reference for subjective total energy intake, should be independent of dietary measures assessed simultaneously. Instead, PAL is assigned as a constant coefficient that captures energy expenditure in addition to BMR. The following summary of PAL values has been calculated from DLW-derived TEE. For each category, the overall activity is assumed to be as detailed in Table D.1.11.
Table D.1.11 Physical activity levels derived from doubly labelled water measurement.
|Activity||Physical activity level (PAL)|
|Chair-bound or bed-bound||
|Seated work with no option of moving around and little or no strenuous leisure activity||
|Seated work with discretion and requirement to move around but with little or no strenuous leisure activity||
|Significant amounts of sport or strenuous leisure activity (30-60 min, 4-5 times per week)||
|Strenuous work or very active leisure||
In the literature, the use of TEE estimates is inconsistent and a topic of methodological research [42-46]. The following are examples of methods to distinguish individuals who are under-reporters, plausible reporters and over-reporters or to control for the magnitude of over- and under-reporting.
I. The ratio of total energy intake to TEE
Different approaches to using the ratio can be undertaken. A study may exclude individuals based on certain cut-points defined within that study, for example, bottom and/or top percentiles and means ±1 standard deviation (note that the standard deviation represents within-individual variation or uncertainty of the ratio). Some more recent studies have suggested statistical adjustment for the ratio to hold degrees of misreporting consistent in a population and stratification of results based on misreporting status to avoid unnecessary loss of power and introduction of unpredictable selection bias [43,45,46].
In an examination of relationships between diet and health in a nationwide cohort in the United States using the ratio of total energy intake to TEE, ±1 SD cut-offs were reported to be preferable to ±2 SD cut-offs for excluding inaccurate reports. Mendez et al. reported that, for instance in the Spanish EPIC cohort, the use of such alternative methods had a stronger influence on associations than the Goldberg method.
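A minimal sketch of the ratio approach follows. It assumes TEE is estimated as BMR × PAL, that the expected ratio under energy balance is 1.0, and that the cut-offs are 1 ± one standard deviation of the ratio. The function name and all input values (including the SD of 0.2) are hypothetical; a study would derive the SD from its own data or from published variance estimates.

```python
def classify_reporter(energy_intake, bmr, pal, sd_ratio):
    """Classify a report by the ratio of reported energy intake to estimated TEE.

    energy_intake and bmr are in the same units (e.g. kcal/day).
    sd_ratio is the assumed standard deviation of the ratio.
    """
    tee = bmr * pal                      # estimated total energy expenditure
    ratio = energy_intake / tee
    if ratio < 1.0 - sd_ratio:
        return ratio, "under-reporter"
    if ratio > 1.0 + sd_ratio:
        return ratio, "over-reporter"
    return ratio, "plausible reporter"

# Hypothetical participant: BMR 1400 kcal/day, PAL 1.6, so TEE is 2240 kcal/day.
for reported_kcal in (1500.0, 2200.0, 3000.0):
    ratio, label = classify_reporter(reported_kcal, 1400.0, 1.6, 0.2)
    print(f"{reported_kcal:.0f} kcal/day -> ratio {ratio:.2f}: {label}")
```

Rather than excluding flagged participants, the returned ratio or label can be carried into the analysis as an adjustment or stratification variable, as the more recent studies cited above suggest.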
II. Goldberg cut-offs
Goldberg cut-offs have been used in dietary surveys or studies that aim to assess absolute levels of total energy intake and other dietary components. Goldberg and colleagues developed two cut-offs for the agreement between PAL and the ratio of reported energy intake to BMR. The application was first demonstrated by Black et al. In the first, ‘CUT-OFF 1’, PAL was set at 1.35, the minimum plausible value for most individuals who are weight stable. Subsequent work in this area led to a recommendation that this cut-off should no longer be used because it fails to account for biological variability and measurement error in estimating both energy intake and expenditure. It was also noted that it underestimated underreporting in individuals whose activity was above a sedentary level.
‘CUT-OFF 2’ differs from the previous cut-off as total energy expenditure or PAL varies according to the population or the individuals under study. As with CUT-OFF 1 it involves a statistical comparison between reported energy intake and BMR accounting for biological variability and measurement error. Originally, only a lower cut-off was developed, but if the activity is known or can be assumed, an upper limit can be determined, i.e. the 95% confidence limits for the ratio of reported energy intake to BMR and PAL.
Researchers should state the criteria used for calculating the cut-offs, and include such information in any publications, as there is potential for inappropriate use of the cut-offs, through a lack of thorough understanding of the principles [25,42].
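As an illustration of the kind of calculation involved, the sketch below follows the form of the cut-off equation as it is commonly quoted in the literature: PAL × exp(±SD units × (S/100)/√n), with S combining the variation in energy intake, BMR and PAL. The formula as written here, the function name and especially the default coefficients of variation are assumptions that should be checked against the original publications before use.

```python
import math

def goldberg_cutoffs(pal, n=1, days=1, cv_w_ei=23.0, cv_w_bmr=8.5,
                     cv_t_pal=15.0, sd_units=2.0):
    """Approximate lower and upper limits for reported energy intake : BMR.

    pal      : physical activity level assumed for the group or individual
    n        : number of individuals (1 when evaluating a single person)
    days     : number of days of dietary assessment
    cv_w_ei  : within-individual CV (%) of energy intake (assumed default)
    cv_w_bmr : CV (%) of estimated versus measured BMR (assumed default)
    cv_t_pal : total CV (%) of PAL (assumed default)
    sd_units : 2.0 corresponds to approximate 95% limits
    """
    s = math.sqrt(cv_w_ei ** 2 / days + cv_w_bmr ** 2 + cv_t_pal ** 2)
    factor = math.exp(sd_units * (s / 100.0) / math.sqrt(n))
    return pal / factor, pal * factor

# One individual, 7 days of assessment, assumed PAL of 1.55.
lower, upper = goldberg_cutoffs(1.55, n=1, days=7)
print(f"plausible EI:BMR range: {lower:.2f} to {upper:.2f}")
```

The limits narrow as `n` or `days` increases, which is why the criteria used (PAL, number of days, group or individual level) need to be stated alongside any cut-off.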
III. Adjustment for predictors of energy balance such as BMI
More recent studies have suggested potential bias when the ratio of estimated energy intake to TEE is used in analyses and the study outcome is correlated with TEE, as is the case for anthropometric outcomes (e.g. fat mass, fat-free mass). An alternative that has been proposed is adjustment for predictors of energy balance. This approach has been shown to result in associations with energy intake closer to those derived from an objective measure of energy intake.
For misreporting of dietary intakes, rather than total energy intake, bias due to misreporting could be reduced in a population through adjustment for total energy intake. This relates to the topic of errors in multiple variables (e.g. food consumption and energy intake).
IV. Use of predefined lower and upper limits of energy intake
To estimate TEE without an objective measurement (e.g. DLW), equations involve the use of information on weight and height. If weight is not objectively measured, it may be over- or under-estimated, and the TEE estimate can then be biased and not useful for identifying misreporters. To avoid this issue, a recent study proposed predefined population-specific limits of energy intake to identify misreporters, such as energy intake <500 and >3,500 kcal/day for women.
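A minimal sketch of this approach, using the limits quoted above for women as defaults. The function name and labels are hypothetical, and limits for other populations would need to be taken from an appropriate source.

```python
def flag_by_fixed_limits(energy_kcal, lower=500.0, upper=3500.0):
    """Flag a reported daily energy intake against predefined limits (kcal/day)."""
    if energy_kcal < lower:
        return "possible under-reporter"
    if energy_kcal > upper:
        return "possible over-reporter"
    return "within limits"

# Hypothetical reported intakes (kcal/day).
flags = [flag_by_fixed_limits(kcal) for kcal in (450.0, 1900.0, 3600.0)]
print(flags)
```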