How to read a nutrition study
A field guide to reading diet research without being misled — study designs, confounding, FFQ noise, absolute versus relative risk, meta-analysis pitfalls, and conflict-of-interest checks.
Nutrition headlines are almost always wrong — not because nutrition scientists are careless, but because the distance between a published finding and a defensible life decision is longer than a press release can carry. This page is a practical reading guide. It will not make you an epidemiologist; it will make you harder to fool.
Start with the design
The study design answers a different question than the one the headline implies. Match them before anything else.
Randomised controlled trials (RCTs) allocate participants to interventions by chance, which breaks the link between exposure and confounders. They answer “does X cause Y” under controlled conditions, usually for short windows and surrogate outcomes (LDL, blood pressure, HbA1c). They rarely run long enough to settle “will this diet prevent my heart attack in thirty years.” CONSORT 2010 (Schulz et al., 2010) is the reporting standard; if a trial does not report allocation concealment, blinding, and loss to follow-up, treat it with care.
Prospective cohort studies follow people over years or decades and measure associations. They are the workhorse of nutritional epidemiology and the source of most “eating X is linked to Y” claims. They cannot establish causation alone; they can only describe associations that survive adjustment.
Case-control and cross-sectional studies are cheaper and weaker. They are susceptible to recall bias (case-control) and cannot establish temporal order (cross-sectional). Most UK Biobank diet analyses are effectively cross-sectional on exposure.
Mendelian randomisation uses genetic variants as instruments for lifetime exposure, approximating a natural randomisation. It is powerful for single nutrients with clean genetic proxies and weaker when exposures are multi-nutrient dietary patterns.
Ecological studies compare populations (countries, regions) rather than individuals. They are hypothesis-generating, not confirmatory. Treating a country-level correlation as individual-level causation is the ecological fallacy.
Confounding and the healthy-user effect
A confounder is a third variable that causes both the exposure and the outcome. In Western cohorts, vegetarians and vegans systematically differ from the comparator population on exercise, smoking, alcohol, body mass, education, and health-seeking behaviour. This bundle is the healthy-user effect (Satija et al., 2015). Any observed mortality advantage for plant-based eaters is partly real diet effect and partly the rest of the bundle.
Good cohort papers adjust for measured confounders; directed acyclic graphs (Shrier & Platt, 2008) help identify which covariates to include and which to leave out. What adjustment cannot fix is residual confounding — unmeasured or poorly measured variables that still carry signal. Willett’s textbook (Willett, 2013) is the standard treatment of how to reason about this in practice.
When a paper reports effect estimates before and after adjustment, watch how the effect moves. A hazard ratio that shrinks from 0.70 to 0.92 after adjusting for smoking and exercise is telling you where most of the association actually lives.
Food-frequency questionnaires are noisy instruments
Most long-running cohorts rely on food-frequency questionnaires (FFQs) — lists of foods on which respondents estimate their consumption over the past year. FFQs are cheap, scalable, and measurably wrong. Correlation coefficients between FFQs and reference measures such as weighed records or recovery biomarkers typically sit in the 0.3–0.6 range, depending on nutrient (Willett, 2013). Sodium, total energy, and alcohol are especially poorly captured.
The consequence is attenuation: random measurement error pulls effect estimates toward the null, so true effects look smaller. Worse, differential error — where exposure misclassification is associated with outcome status — can bias in either direction. When a paper’s exposure is “red meat intake from a single FFQ administered in 1992,” the decimal places on the hazard ratio are not doing the work they appear to be doing.
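The attenuation mechanism is easy to simulate. The sketch below (plain Python, no libraries; the effect size and error variance are invented for illustration, not drawn from any study) regresses an outcome on a true exposure and then on a noisy FFQ-style measurement of the same exposure:

```python
import random

random.seed(42)
n = 10_000
true_beta = 0.50  # illustrative true effect of exposure on outcome

exposure = [random.gauss(0, 1) for _ in range(n)]
outcome = [true_beta * x + random.gauss(0, 1) for x in exposure]
# FFQ-style measurement: true exposure plus independent random error
ffq = [x + random.gauss(0, 1.5) for x in exposure]

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

print(f"slope on true exposure: {slope(exposure, outcome):.2f}")
print(f"slope on noisy FFQ:     {slope(ffq, outcome):.2f}")
```

With classical (non-differential) error, the noisy slope shrinks by roughly the reliability ratio 1/(1 + σ²_error), so the FFQ-based estimate lands well below the true 0.50 despite the large sample.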
Absolute versus relative risk
A 30 percent relative reduction in a rare outcome can be a 0.3 percentage-point absolute reduction. Both numbers are true; only the second is useful for deciding whether an intervention matters in your life. NIH Office of Dietary Supplements methodology guides consistently recommend reporting both. If a paper or press release reports only relative risk, calculate the absolute difference from the event counts yourself; if the paper does not supply event counts, treat that as a yellow flag.
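The arithmetic from event counts is short enough to do by hand; a sketch with made-up counts (200 events among 10,000 treated versus 300 among 10,000 controls — illustrative, not from any trial) shows how a large relative reduction coexists with a small absolute one:

```python
def risks(events_a, n_a, events_b, n_b):
    """Relative risk and absolute risk reduction from raw event counts."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    rr = risk_a / risk_b
    arr = risk_b - risk_a  # absolute risk reduction, in risk units
    return rr, arr

rr, arr = risks(200, 10_000, 300, 10_000)
print(f"relative risk:           {rr:.2f}")   # 0.67 -> "33% lower risk"
print(f"absolute risk reduction: {arr:.4f}")  # 0.0100 -> one percentage point
```

The same data yield both headlines: "risk cut by a third" and "one fewer event per hundred people." Only the second tells you what the intervention buys.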
The companion concept is number needed to treat (or harm): how many people must change behaviour for one additional event to be prevented or caused. NNTs in the hundreds for primary prevention are common and useful — they are not failures of the intervention, they are the scale of the real effect.
Effect size versus statistical significance
A p-value below 0.05 tells you that data at least this extreme would be unlikely if the null hypothesis were true. It does not tell you the effect is large, important, or replicable. Greenland et al. (2016) is the standard corrective on common misinterpretations. Read the confidence interval first: a hazard ratio of 0.88 (95 percent CI 0.86 to 0.90) in a million-person cohort is statistically significant and clinically modest. In a small trial the same point estimate with a CI of 0.55 to 1.40 is hypothesis-generating at best.
Large samples make tiny effects significant. Significance is a statement about signal-to-noise; effect size is a statement about magnitude. Both matter; they are not the same.
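The sample-size effect is visible in a two-proportion z-test on invented numbers: the same 0.2-percentage-point risk difference is nowhere near significance at 1,000 per arm and overwhelmingly significant at 1,000,000 per arm.

```python
import math

def two_prop_p(p1, p2, n):
    """Two-sided p-value for a two-proportion z-test, equal arms of size n."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))  # two-sided normal tail probability

# Same tiny difference (5.0% vs 5.2% event risk), two sample sizes
for n in (1_000, 1_000_000):
    print(f"n = {n:>9,} per arm: p = {two_prop_p(0.050, 0.052, n):.4g}")
```

Nothing about the effect changed between the two lines; only the noise floor did. That is why the confidence interval, not the p-value, should anchor your read of magnitude.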
Meta-analysis is only as good as its inputs
Meta-analysis pools effect estimates across studies. PRISMA 2020 (Page et al., 2021) is the reporting standard. The pitfalls are familiar:
- Garbage-in, garbage-out. Pooling fifteen underpowered studies with incompatible exposures still yields a confident-looking number.
- Heterogeneity. I-squared statistics quantify between-study variability; values above 50 percent warrant caution about whether the studies are estimating the same underlying effect.
- Publication bias. Null results are published less often. Funnel plots and Egger’s test detect asymmetry but cannot fix it.
- Researcher degrees of freedom. Inclusion criteria, exposure harmonisation, and outcome definitions are choices. Sensitivity analyses across reasonable choices are the honest corrective.
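The I-squared statistic mentioned above falls out of a fixed-effect pooling in a few lines, via the Higgins–Thompson formula I² = max(0, (Q − df)/Q). A sketch with invented per-study estimates (log hazard ratios and standard errors, not taken from any real meta-analysis):

```python
# Invented per-study effect estimates (log hazard ratios) and standard errors
studies = [(-0.22, 0.10), (-0.05, 0.08), (0.10, 0.12), (-0.30, 0.09)]

# Inverse-variance weights and fixed-effect pooled estimate
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)

# Cochran's Q: weighted squared deviations from the pooled estimate
q = sum(w * (est - pooled) ** 2 for (est, _), w in zip(studies, weights))
df = len(studies) - 1
i_squared = max(0.0, (q - df) / q) * 100

print(f"pooled log-HR: {pooled:.3f}")
print(f"Q = {q:.2f} on {df} df, I-squared = {i_squared:.0f}%")
```

With these four invented studies the I-squared lands above the 50 percent caution threshold — a signal that pooling them into one number papers over real disagreement about what is being estimated.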
Ioannidis’s 2005 argument in PLoS Medicine — that most published findings in small, flexible, heavily-contested fields are false — applies with particular force to single-nutrient, single-outcome meta-analyses built on FFQ exposures. His 2018 JAMA commentary applied the same analysis specifically to nutritional epidemiology and called for structural reform rather than marginal fixes.
GRADE: rating the evidence as a whole
The GRADE framework (Guyatt et al., 2008; Schünemann et al., 2013) is the dominant system for rating the overall quality of evidence behind a recommendation. It starts RCTs at “high” and observational studies at “low,” then adjusts up or down for risk of bias, inconsistency, indirectness, imprecision, publication bias, large effect sizes, dose-response gradients, and plausible residual confounding in the opposite direction. When a guideline cites “moderate-quality evidence” or “strong recommendation, low-quality evidence,” those are GRADE terms with specific meaning. Learning to read a GRADE evidence profile is one of the highest-leverage skills for anyone consuming dietary guidance.
Conflicts of interest
Funding source predicts conclusions in nutrition research. Lesser et al. (2007) found that industry-funded studies of beverages were roughly four to eight times more likely to reach conclusions favourable to the sponsor than independently funded studies of the same questions. This does not mean industry-funded science is worthless; it means funding is a prior that shifts the burden of methodological scrutiny. Check:
- The declared funding line and the conflict-of-interest statement.
- Authors’ industry affiliations (advisory boards, speaker fees, patents).
- Whether the data and analysis code are available for independent re-analysis.
- Whether the pre-registered protocol matches the published analysis — outcome switching is a reliable red flag.
A short checklist
- What design — and what question does that design actually answer?
- What is the exposure, the comparator, and how were both measured?
- What confounders were adjusted for, and what plausibly was not?
- Effect size with a confidence interval — not just a p-value.
- Absolute risk and NNT, not only relative risk.
- For reviews, PRISMA flow and GRADE rating.
- Funding, conflicts, pre-registration, data availability.
- Has it replicated?
Reading this way will not make every nutrition claim tractable — some genuinely are not — but it will keep you from mistaking a press release for a lived conclusion. That is the whole job.
Sources
- Ioannidis JPA, Why most published research findings are false, PLoS Medicine 2(8):e124 (2005)
- Ioannidis JPA, The challenge of reforming nutritional epidemiologic research, JAMA 320(10):969–970 (2018)
- Satija A, Yu E, Willett WC, Hu FB, Understanding nutritional epidemiology and its role in policy, Advances in Nutrition 6(1):5–18 (2015)
- Willett W, Nutritional Epidemiology, 3rd edition, Oxford University Press (2013)
- Guyatt GH et al., GRADE: an emerging consensus on rating quality of evidence and strength of recommendations, BMJ 336(7650):924–926 (2008)
- Schünemann HJ, Brożek J, Guyatt G, Oxman A (eds), GRADE Handbook (2013)
- NIH Office of Dietary Supplements, Methodology workshops and Analytical Methods and Reference Materials (AMRM) resources
- Greenland S et al., Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations, Eur J Epidemiol 31(4):337–350 (2016)
- Shrier I, Platt RW, Reducing bias through directed acyclic graphs, BMC Medical Research Methodology 8:70 (2008)
- Schulz KF, Altman DG, Moher D, CONSORT 2010 Statement, BMJ 340:c332 (2010)
- Page MJ et al., The PRISMA 2020 statement, BMJ 372:n71 (2021)
- Lesser LI et al., Relationship between funding source and conclusion among nutrition-related scientific articles, PLoS Medicine 4(1):e5 (2007)