Since the INTERGROWTH-21st Project released its main findings in the Lancet last month, there has been a groundswell of clinical opinion about the impact of the new global standards on routine practice. Last week, authors from the INTERGROWTH-21st Project responded to questions and comments from colleagues.
 
Colleagues from the U.S. National Institutes of Health wrote:
 
We read with interest the Article by Villar and colleagues, which suggests no differences in fetal growth exist as shown by crown-rump length or head circumference from eight geographically diverse study sites. A key determinant of their decision to pool across sites was whether the standardised site difference (SSD; defined as the difference between the site and overall mean standardised by the overall SD) at different gestational ages was less than 0·5. We believe that this criterion could be too liberal, resulting in potentially inappropriate pooling of sites. To show this potential, we calculated the probability of a newborn baby's measurements being below the lower limit of the standard for a particular site when the standard was constructed using data pooled across different sites for different values of SSD from −0·5 to 0·5 as recommended. Probabilities were computed as a function of SSD when constructing both the third and fifth centiles. When the SSD is zero, the site-specific and pooled centiles are the same. However, when the SSD is −0·50, the probability of being less than the 5th centile is 12·6%, with a probability of 1·6% for an SSD of 0·50. This discrepancy could have important clinical implications. If a pooled standard is used when the SSD is 0·50, 3·4% of fetuses (targeted centile − pooled centile = 5·0% − 1·6%) would be misclassified as not extreme. Likewise, when the SSD is −0·50, 7·6% (pooled centile − targeted centile = 12·6% − 5·0%) of fetuses would be misclassified as extreme. Thus, even with a small SSD, the estimated centiles could be seriously biased when pooling sites. Our calculation, along with figures 2 and 3 in Villar and colleagues' paper, suggests that we have to be very careful when interpreting the pooled standard in this situation. Further, Villar and colleagues' proposed sensitivity analysis that computes the standard leaving out only a single site lacks the ability to detect meaningful differences between these potentially different sites.
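
The probabilities quoted in this letter can be reproduced with a short calculation. The sketch below is not the correspondents' own code. It assumes that measurements are normally distributed and that a site shares the pooled SD, so that on the pooled z-scale a site's measurements follow a normal distribution with mean equal to the SSD and SD 1. The function name `prob_below_pooled_centile` is ours.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def prob_below_pooled_centile(ssd: float, centile: float) -> float:
    """Probability that a measurement from a site whose mean sits `ssd`
    pooled SDs away from the pooled mean falls below the pooled centile.

    Assumption: site measurements ~ Normal(ssd, 1) on the pooled z-scale.
    """
    pooled_cutoff_z = STD_NORMAL.inv_cdf(centile)  # e.g. about -1.645 for 0.05
    return STD_NORMAL.cdf(pooled_cutoff_z - ssd)


if __name__ == "__main__":
    for ssd in (-0.5, 0.0, 0.5):
        p = prob_below_pooled_centile(ssd, 0.05)
        print(f"SSD {ssd:+.2f}: {p:.1%} of the site falls below the pooled 5th centile")
```

Under these assumptions the 5th-centile results are about 12·6% for an SSD of −0·50 and 1·6% for an SSD of 0·50, in line with the figures in the letter. Passing `0.03` as `centile` gives the corresponding third-centile values.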
 
A colleague with the Perinatal Institute in Birmingham added:

Villar and colleagues' multinational longitudinal fetal growth study only reports on ultrasound measurements of crown rump length and head circumference. In our opinion, the clinically more important variable is fetal abdominal circumference, which is the main variable determining estimated fetal weight. In fact, the term “small for gestational age” (SGA) used in their report conventionally refers to fetal weight and birthweight, not to skeletal size. Abdominal circumference and estimated fetal weight are the main predictors of an SGA infant. A small size for gestational age is strongly linked to stillbirth and neonatal death, perinatal morbidity, cerebral palsy, and delayed metabolic and cardiovascular effects, and its antenatal detection is a key challenge in maternity care.
 
It is, therefore, curious that this report does not include an analysis of abdominal circumference growth, even though, according to the study protocol, it was measured. Could it be that this sample of “educated, affluent and healthy women” actually does show ethnic or geographic differences in this variable, contradicting the authors' premise of an international “likeness” of fetal growth and newborn size? Such a difference is certainly suggested by their own data, in which the mean optimum term birthweights of babies born in India (2·9 kg) and the UK (3·5 kg) are shown to differ by as much as 600 g, or 21%. An average 600 g difference in birthweight, with an SD of 400 g, would mean that an Indian mother's baby, if designated SGA by a UK birthweight standard, would have a 90% chance of not being SGA if the standard was adjusted for a healthy Indian population. Babies thus reclassified by an ethnicity-specific standard have the same perinatal mortality risk as those that are normal size by either standard. In the UK's multiethnic population, south Asian mothers do not have smaller babies because of nutritional or socioeconomic deprivation, and multivariable regression analyses have shown that substantial ethnic differences remain, even after maternal size is adjusted for and pathological factors, including low BMI, are excluded. Racial differences in birthweight have also been shown in large databases of “extremely low risk” populations in the USA. Such evidence emphasises the need to adjust for physiological variation in fetal size and growth, rather than assuming that one size fits all. As it stands, this presumably high-cost study is at risk of adding confusion to continuing efforts to understand normal and abnormal fetal biometry, and of missing an opportunity to define clinically relevant physiological differences in fetal growth in different populations.
I am employed by the Perinatal Institute, a not-for-profit organization that derives income from various work streams supporting maternity services, including training programmes, clinical audit and growth chart software development, licensing, and maintenance.
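
The reclassification argument in the letter above can be explored numerically. The following is a rough sketch, not the correspondent's method. It assumes normally distributed term birthweights with the means (3500 g and 2900 g) and the common SD (400 g) quoted in the letter. The letter does not say which centile defines SGA, so the cut-off is left as a parameter. The function name `share_reclassified` is ours.

```python
from statistics import NormalDist


def share_reclassified(ref_mean: float, local_mean: float,
                       sd: float, centile: float) -> float:
    """Among babies from the local population flagged as SGA by the
    reference standard, the share NOT flagged by the local standard.

    Assumptions: both populations are normal with the same SD, and
    local_mean <= ref_mean, so the local cut-off lies below the reference one.
    """
    z = NormalDist().inv_cdf(centile)       # negative for low centiles
    ref_cutoff = ref_mean + z * sd          # SGA threshold of the reference standard
    local = NormalDist(local_mean, sd)
    flagged_by_ref = local.cdf(ref_cutoff)  # local babies below the reference cut-off
    flagged_by_local = centile              # by definition of the local centile
    return (flagged_by_ref - flagged_by_local) / flagged_by_ref


if __name__ == "__main__":
    for centile in (0.10, 0.05, 0.03):
        share = share_reclassified(3500, 2900, 400, centile)
        print(f"SGA cut-off at centile {centile:.2f}: {share:.0%} reclassified as not SGA")
```

Under these assumptions the share depends noticeably on the centile chosen, ranging from roughly 83% at the 10th centile to roughly 91% at the 3rd, which is in the neighbourhood of the 90% quoted in the letter.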

INTERGROWTH-21st authors responded:

We appreciate the interest in the analytical strategy of the INTERGROWTH-21st Project from Paul Albert and colleagues because the results of this study are relevant to the health care of increasingly diverse, mixed national origin and ethnic populations worldwide.
 
We did not state that there are no differences in fetal growth across our eight free-living study populations because differences were obvious and inevitable. Instead, we stated that the growth data from the eight sites were similar enough to be pooled for construction of international standards designed for use as a screening method worldwide. We based this assertion on three separate, prespecified criteria and cut-off points within a comprehensive project using the prescriptive approach to growth monitoring.
 
We acknowledged in the study methods that these pooling criteria might not be sufficiently robust when used individually, but would be informative in combination. In fact, our three criteria are identical to those used in the WHO Multicentre Growth Reference Study (MGRS) to construct the WHO Child Growth Standards, which the Centers for Disease Control and Prevention introduced in the USA for infants aged 0–2 years in 2010. We do not agree that the cut-off point for the standardised site difference (SSD) criterion is too liberal. The SSD denominator is the all sites combined SD, obtained from the mean of three highly standardised measures of the same infant at each visit. This method leads to small SDs, a result that is unlikely to be achieved in less controlled settings or using less stringent protocols. For example, in our study, the pooled SDs of birth length at term lie between 1·7 cm and 1·8 cm; whereas the range is 2·4–2·8 cm in large, multihospital-based data. Small SDs, in turn, increase SSDs; therefore, our SSDs represent rather conservative estimates.
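
The denominator argument in the paragraph above can be illustrated with a small sketch of the SSD as defined in the first letter (site mean minus overall mean, divided by the overall SD). The two SDs are midpoints of the ranges quoted above (1·7 to 1·8 cm and 2·4 to 2·8 cm). The birth-length means, and hence the 0·8 cm site difference, are hypothetical values chosen only for illustration and are not study data.

```python
def standardised_site_difference(site_mean: float, overall_mean: float,
                                 overall_sd: float) -> float:
    """SSD: difference between site and overall mean, in units of the overall SD."""
    return (site_mean - overall_mean) / overall_sd


if __name__ == "__main__":
    overall_mean_cm = 49.0  # hypothetical pooled mean birth length
    site_mean_cm = 49.8     # hypothetical site mean, 0.8 cm above the pooled mean

    # Same absolute difference, two different denominators.
    for label, sd_cm in (("tightly standardised study", 1.75),
                         ("multihospital-based data", 2.6)):
        ssd = standardised_site_difference(site_mean_cm, overall_mean_cm, sd_cm)
        print(f"{label}: SD {sd_cm} cm gives SSD {ssd:.2f}")
```

With these illustrative numbers the same 0·8 cm difference yields an SSD of about 0·46 against the smaller SD and about 0·31 against the larger one, which is the sense in which a small SD makes the SSD criterion harder to satisfy.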
 
Furthermore, in the fetal growth longitudinal study-like population included in the newborn cross-sectional study we did 48 comparisons of SSDs for birth length in the six gestational age windows. 90% of the measurements were even less than the more stringent (0·32 SD) cut-off we proposed as a more conservative limit in the original study design. Overall, for the 128 comparisons of fetal crown-rump length and head circumference from early pregnancy to term, and birth length, only one was marginally higher (SSD—0·58) than the primary cut-off of ±0·5 SD that was used in the WHO MGRS and prespecified in the INTERGROWTH-21st protocol for use in our studies. Our sensitivity analysis showed the robustness of the international standards for use in different populations, as is the case with the WHO standards that have been adopted by more than 125 countries worldwide.
In the second letter, Jason Gardosi implied that we modified our main fetal outcome measures after realising that the original analyses contradicted our hypothesis. Such an interpretation is incorrect. In our protocol and biological and statistical analytical strategy, we stated that we would follow the WHO MGRS protocol for deciding whether the data from our eight geographical regions could be pooled to construct international standards. This pooling strategy required us to use only fat-independent measures (eg, fetal head circumference or birth length) for comparison of growth across populations. Based on these principles, we published a detailed description of criteria for a priori selection of fetal and newborn, fat-free skeletal measures. We specifically avoided exploring fat-related fetal (abdominal circumference) or newborn (birthweight) measures in the combinability analysis so as to reduce bias in interpretation of results.
 
After we were satisfied that the fat-free skeletal measures from the different sites were sufficiently similar, we combined them to construct a single international standard for each measure, including fetal abdominal circumference, from early pregnancy to birth, and, for preterm babies, to 1 year of postnatal age. Although there are differences between measures of Indian babies and those from the UK, this comparison is not relevant to the question of the use of international standards in different settings. We never suggested or recommended that UK data should be applied to Indian populations. Just as data from 0–5 year olds in individual countries have been compared with the WHO Child Growth Standards to assess levels of worldwide undernutrition, so the appropriate question should have been not how Indian fetuses and newborn babies compare with those from the UK or another country's population, but rather how they compare with the international standards constructed by the combination of data from all the sites in the INTERGROWTH-21st Project.
Lastly, the implication that race and ethnic origin—socially constructed, self-reported terms used synonymously by Gardosi—are biological entities associated with the hundreds of loci implicated in the genetics of human stature (unlike the small number of genes controlling skin pigmentation) is erroneous. 116 different terms and definitions have been offered to describe racial and ethnic groups, which is why the use of such imprecise terms is discouraged in biomedical publications. 2500 years ago, Confucius said “Men's natures are alike, it is their habits that carry them far apart,” so our findings are nothing new.