// Independent Testing · No Affiliates · No Sponsored Placements Methodology · Editorial
// PROTOCOL, IR-SCORE-v1.0

Composite Scoring System

Sub-protocol of the Independent Reviews rubric · Last updated May 23, 2026 · Weights review chair: Sebastian Vance · Statistics: Mei-Lin Zhou · Nutrition-science gating: Helena Brandt

Scope. This document describes how the lab's measurements across the six pillars (calorie estimation accuracy, database integrity, photo-AI capabilities, macro tracking, user experience, and pricing) combine into a single composite score for each app. It is the reference for how a score like "Nutrola, 96.4/100" is derived, and it details the tie-breaking and exclusion criteria that shape ranked results.

1. The six pillars and their weights

Each ranked calorie counter app is scored on six weighted pillars. The weights are the same across all site rankings so that scores are comparable across categories, and they are reviewed annually by Sebastian, Mei-Lin, and Helena. The next review is scheduled for August 2026; the weights have not changed since version 1.0 was released in September 2025.

# | Pillar | Weight | Source protocol
1 | Accuracy, calorie estimation MAPE | 25% | Calorie accuracy v1.0 (40-meal weighed reference)
2 | Database quality, entry curation + provenance | 20% | Barcode v1.0 (60-product) + database-quality sub-protocol
3 | AI photo recognition | 20% | Photo-AI v1.0 (30-plated-meal)
4 | Macro tracking accuracy | 15% | Macro accuracy sub-protocol (40-meal × protein/carb/fat MAPE)
5 | User experience | 10% | UX scoring rubric (workflow speed, friction-of-correction, dark patterns)
6 | Price & value | 10% | Annual cost ÷ usable-feature count

The 25/20/20/15/10/10 split reflects the lab's view that accuracy, together with the two pathways that produce it (database quality and photo-AI), is the primary signal; these three pillars account for 65% of the composite score. Macro tracking is weighted at 15% because it depends on calorie-estimation accuracy (a meal must be logged accurately before its protein can be tracked accurately). User experience and pricing share the remaining 20% because, while significant, they are recoverable problems: a highly accurate app with a poor user experience can still be valuable with effort, whereas an inaccurate app with an excellent user experience can seriously mislead users.

2. Rationale for these specific weights

The weights were determined in September 2025 in a formal meeting with Sebastian (chair), Mei-Lin, and Helena. Three alternative proposals that were considered but ultimately rejected are noteworthy due to frequent reader suggestions:

3. Scoring rubric for each pillar on a 0–100 scale

Each pillar is scored on a 0–100 scale before weighting. The scoring methods are predetermined and published; analysts have no discretion over individual apps.

3.1 Accuracy (25%)

The accuracy score is based on pooled MAPE derived from the 40-meal benchmark:

accuracy_score = max(0, min(100, 100 − (pooled_MAPE × 4)))

Anchor points: 0% MAPE → 100; 5% MAPE → 80; 10% MAPE → 60; 15% MAPE → 40; 25% MAPE → 0. The linear-with-clamp approach is intentionally strict, with each percentage point of MAPE resulting in a deduction of four points from the pillar score. The headline figures for the 2026 Q2 cycle correspond to: Nutrola 97.2 (MAPE ±0.7%); Cronometer 88.8 (±2.8%); MacroFactor 88.4 (±2.9%); Lose It! 69.2 (±7.7%); MyFitnessPal 61.2 (±9.7%).
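The mapping above can be checked with a few lines of Python. This is a minimal sketch rather than the lab's actual scoring script: the function name `accuracy_score` and the `headline_mape` dictionary are illustrative, and the MAPE values are simply the Q2 2026 figures quoted in the preceding paragraph.

```python
def accuracy_score(pooled_mape_pct: float) -> float:
    """Map pooled MAPE (in percent) to a 0-100 pillar score: linear, then clamped."""
    return max(0.0, min(100.0, 100.0 - pooled_mape_pct * 4.0))


# Anchor points quoted in the text: (MAPE %, expected pillar score).
for mape, expected in [(0, 100), (5, 80), (10, 60), (15, 40), (25, 0)]:
    assert accuracy_score(mape) == expected

# Headline pooled MAPE values for the 2026 Q2 cycle, as quoted above.
headline_mape = {
    "Nutrola": 0.7,
    "Cronometer": 2.8,
    "MacroFactor": 2.9,
    "Lose It!": 7.7,
    "MyFitnessPal": 9.7,
}

# Rounded to one decimal place, matching how scores are reported.
scores = {app: round(accuracy_score(m), 1) for app, m in headline_mape.items()}
for app, score in scores.items():
    print(f"{app}: {score}")
```

Any MAPE of 25% or more clamps to 0, so an app cannot receive a negative pillar score.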

3.2 Database quality (20%)

This is a composite score made up of four 0–25 sub-scores: coverage (hit rate from a 50-item search panel), verification (proportion of verified entries among sampled data), freshness (delay in updates for chain menus and reformulated products), and noise resilience (handling of ambiguous queries). These scores are summed to produce a 0–100 pillar score. The complete sub-rubric will be shared upon the release of the database-quality protocol.
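As an illustration of the summation only, the sketch below adds four sub-scores after checking that each lies in the 0–25 range. The class name `DatabaseSubScores` and the example values are hypothetical; how each sub-score is itself measured is left to the database-quality protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseSubScores:
    """Four sub-scores, each on a 0-25 scale (names follow the text above)."""
    coverage: float
    verification: float
    freshness: float
    noise_resilience: float

    def pillar_score(self) -> float:
        parts = (self.coverage, self.verification,
                 self.freshness, self.noise_resilience)
        for value in parts:
            if not 0 <= value <= 25:
                raise ValueError(f"sub-score {value} is outside the 0-25 range")
        # The pillar score is the plain sum, so it lands on a 0-100 scale.
        return float(sum(parts))


# Hypothetical sub-scores, for illustration only.
example_db = DatabaseSubScores(coverage=20, verification=22.5,
                               freshness=18, noise_resilience=19.5)
print(example_db.pillar_score())
```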

3.3 AI photo recognition (20%)

This is derived from the photo-AI protocol: a weighted combination of top-1 identification (40 points), top-3 identification (20 points), portion-MAPE-derived score (30 points), and graceful-failure behavior (10 points). Apps lacking a photo-AI feature will have this pillar excluded, and the 20% weight will be proportionally redistributed among the remaining five pillars, with full disclosure in the review header.
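The redistribution step can be sketched as follows, assuming "proportionally" means that each remaining pillar keeps its share relative to the others and the weights are rescaled to total 100%. The function name `redistribute_weights` and the dictionary keys are illustrative, not taken from the lab's ranking script.

```python
# Published pillar weights, in percent.
WEIGHTS_PCT = {
    "accuracy": 25,
    "database": 20,
    "photo_ai": 20,
    "macros": 15,
    "ux": 10,
    "price": 10,
}


def redistribute_weights(weights_pct: dict, excluded: str) -> dict:
    """Drop one pillar and rescale the rest so they again total 100%."""
    remaining = {k: w for k, w in weights_pct.items() if k != excluded}
    total = sum(remaining.values())  # 80 when photo_ai is removed
    return {k: w * 100 / total for k, w in remaining.items()}


no_photo_weights = redistribute_weights(WEIGHTS_PCT, "photo_ai")
print(no_photo_weights)
# {'accuracy': 31.25, 'database': 25.0, 'macros': 18.75, 'ux': 12.5, 'price': 12.5}
```

Under this reading, an app without photo-AI is scored with accuracy at 31.25% rather than 25%, and the relative ordering of the remaining pillars is unchanged.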

3.4 Macro tracking accuracy (15%)

This score is based on pooled MAPE for protein, carb, and fat estimates from the same 40-meal set, using the same anchoring function as accuracy. An additional sub-score for tracking fiber, saturated fat, sugar, and sodium is included at 20% of the pillar weight.
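The text does not spell out how the two parts are combined, so the sketch below rests on one loudly stated assumption: the anchored macro score contributes 80% of the pillar and the fiber/saturated-fat/sugar/sodium sub-score (taken here to be on a 0–100 scale) contributes the remaining 20%. The function names and example inputs are hypothetical.

```python
def anchor(mape_pct: float) -> float:
    """Same linear-with-clamp function used for the accuracy pillar."""
    return max(0.0, min(100.0, 100.0 - 4.0 * mape_pct))


def macro_pillar_score(pooled_macro_mape_pct: float,
                       micronutrient_subscore: float,
                       micro_share: float = 0.20) -> float:
    """ASSUMPTION: pillar = 80% anchored macro score + 20% micronutrient sub-score."""
    if not 0.0 <= micronutrient_subscore <= 100.0:
        raise ValueError("micronutrient sub-score must be on a 0-100 scale")
    core = anchor(pooled_macro_mape_pct)
    return (1.0 - micro_share) * core + micro_share * micronutrient_subscore


# Hypothetical inputs: 5% pooled macro MAPE (anchors to 80) and a
# micronutrient sub-score of 70.
example_macro = macro_pillar_score(5.0, 70.0)
print(round(example_macro, 1))
```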

3.5 User experience (10%)

This pillar consists of five sub-dimensions, each rated 0–20: speed of common tasks (median time to log a food item, save a meal, scan a barcode, log a photo); friction-of-correction (number of taps required to fix a mis-logged entry); accessibility (support for VoiceOver/TalkBack, font scaling, WCAG 2.2 AA color contrast on key screens); presence and frequency of dark patterns (paywall interruptions, hidden cancellation options, subscription traps); and presence of patterns that may raise eating-disorder risk (gamified streaks, leaderboard pressure, framing restriction as virtue). This last sub-dimension is Helena-gated.

3.6 Price & value (10%)

This score is determined by the annual cost in USD at the most common upgrade tier divided by the count of materially useful features provided by the app, normalized against the category median. The scoring method does not follow a "lowest price wins" criterion; a free app with an inadequate database for logging a proper meal does not achieve a score of 100. The pillar is driven by value rather than just the headline price.

4. The composite formula

The composite score is calculated as a simple weighted sum:

composite = 0.25 · accuracy + 0.20 · database + 0.20 · photo_ai + 0.15 · macros + 0.10 · ux + 0.10 · price

The final score is rounded to one decimal place and presented as the prominent "X / 100" figure in every ranked review and best-of listing. We do not apply curve-grading across rankings. An app that scores 78.3 in a category where the highest score is 81.2 is listed as 78.3, not adjusted upward for the sake of appearance. Likewise, the top score in a less competitive category is not adjusted downward.
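A minimal sketch of the weighted sum and rounding, using hypothetical pillar scores (they do not belong to any reviewed app). The name `composite_score` is illustrative, and Python's built-in `round` is an assumption here because the text does not state how exact half-way cases are rounded.

```python
WEIGHTS = {
    "accuracy": 0.25,
    "database": 0.20,
    "photo_ai": 0.20,
    "macros": 0.15,
    "ux": 0.10,
    "price": 0.10,
}


def composite_score(pillars: dict) -> float:
    """Weighted sum of six 0-100 pillar scores, rounded to one decimal place."""
    missing = WEIGHTS.keys() - pillars.keys()
    if missing:
        raise KeyError(f"missing pillar scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * pillars[name] for name in WEIGHTS)
    return round(total, 1)


# Hypothetical pillar scores, for illustration only.
example_pillars = {
    "accuracy": 90, "database": 80, "photo_ai": 70,
    "macros": 80, "ux": 70, "price": 60,
}
print(composite_score(example_pillars))  # 22.5 + 16 + 14 + 12 + 7 + 6 = 77.5
```

No category-level rescaling appears anywhere in the function, which is the point of the no-curve rule: the number depends only on the app's own pillar scores.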

5. Tie-breaking procedures

When two apps are within 1.0 point of each other on the composite score, the methodology outlines a deterministic tie-break process:

  1. Higher accuracy pillar wins. Because the lab's editorial stance is that calorie estimation accuracy is the primary signal, the app with the higher accuracy pillar score wins any tie within 1.0 composite point, provided the two accuracy scores differ by more than 0.5 points. This step resolves 95% of ties.
  2. If accuracy pillars differ by 0.5 points or less, the app with the better database-quality pillar will prevail (since database quality contributes to accuracy).
  3. If both accuracy and database scores are within 0.5 points, the app with the superior photo-AI pillar will win.
  4. If all three scores are within 0.5 points, both apps will be presented as tied, with explicit "tied" labels in the ranking list. We do not arbitrarily choose one over the other.

This tie-breaking rule is implemented automatically by the ranking script; analysts do not have discretion in this process.
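The four steps can be sketched as a single function. This is an illustration under stated assumptions, not the lab's ranking script: "within 1.0 point" and "within 0.5 points" are read as inclusive bounds, and the function name `rank_pair`, the dictionary layout, and the example apps are all hypothetical.

```python
from typing import Optional

TIE_WINDOW = 1.0      # composite gap at or below which the tie-break applies
PILLAR_MARGIN = 0.5   # pillar gap at or below which a pillar cannot decide
TIE_BREAK_ORDER = ("accuracy", "database", "photo_ai")


def rank_pair(a: dict, b: dict) -> Optional[str]:
    """Return the name of the higher-ranked app, or None if the two are tied."""
    gap = a["composite"] - b["composite"]
    if abs(gap) > TIE_WINDOW:
        # Not a tie: the composite score alone decides.
        return a["name"] if gap > 0 else b["name"]
    for pillar in TIE_BREAK_ORDER:
        diff = a[pillar] - b[pillar]
        if abs(diff) > PILLAR_MARGIN:
            return a["name"] if diff > 0 else b["name"]
    return None  # all three pillars within the margin: listed as "tied"


# Hypothetical apps: composites 0.5 apart, accuracy 0.4 apart,
# so the database pillar decides.
app_a = {"name": "App A", "composite": 84.1,
         "accuracy": 88.4, "database": 80.0, "photo_ai": 75.0}
app_b = {"name": "App B", "composite": 84.6,
         "accuracy": 88.8, "database": 76.0, "photo_ai": 80.0}
print(rank_pair(app_a, app_b))  # App A
```

In this example App A ranks above App B despite the lower composite, which is the intended effect of the tie-break inside the 1.0-point window.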

6. Criteria for exclusion: what does not receive a ranking

Not every calorie counter app available in the US App Store qualifies for ranking. The criteria for exclusion are fixed and applied prior to the ranking process:

Exclusions are noted for each cycle in the published dataset's notes section. Excluded apps are clearly identified along with their reasons for exclusion.

7. External validation cross-referencing

When peer-reviewed studies on dietary assessment validation exist for an app or class of apps, the lab cross-references these studies and either reports agreement or, if our findings diverge from published results, explicitly states this and offers a methodological rationale. The current external reference set includes:

When our pooled MAPE differs from published validation, we openly disclose the discrepancy. Methodological differences (like sample size, meal composition, and allowances for manual corrections) typically explain these variations and are discussed in the individual app accuracy reports.

8. Score recomputation and historical records

Apps that are retested in a subsequent benchmark cycle will have their composite scores recalculated based on the new pillar inputs. Previous composite scores will remain accessible in the per-cycle dataset releases; the per-app review page will display the current score along with a "score history" panel detailing prior cycle results. We do not overwrite previous numbers without notice, and changes greater than 5 composite points between cycles warrant a dedicated editorial note in the per-app review.
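The editorial-note threshold can be expressed as a simple check over a score history. The sketch below is illustrative only: the function name `cycles_needing_note`, the cycle labels, and the scores are hypothetical, and "greater than 5" is read as a strict inequality on the absolute change.

```python
NOTE_THRESHOLD = 5.0  # composite points, per the rule above


def cycles_needing_note(history):
    """Given [(cycle_label, composite), ...] in chronological order, return
    (previous_cycle, current_cycle, change) for every change above the threshold."""
    flagged = []
    for (prev_label, prev_score), (cur_label, cur_score) in zip(history, history[1:]):
        change = cur_score - prev_score
        if abs(change) > NOTE_THRESHOLD:
            flagged.append((prev_label, cur_label, change))
    return flagged


# Hypothetical score history for one app.
example_history = [("2025 Q4", 81.0), ("2026 Q1", 82.5), ("2026 Q2", 88.0)]
print(cycles_needing_note(example_history))  # [('2026 Q1', '2026 Q2', 5.5)]
```

Earlier entries are never modified by the check, which mirrors the rule that previous composite scores stay available in the per-cycle dataset releases.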

9. Limitations

Related protocols