
SCOPE-MOVE evidence explorer for method selection

SCOPE-MOVE

SCOPing Explorer of accelerometer-based prediction models for 24-hour MOVEment behaviour analysis

A validation-level interface for locating and comparing candidate processing pipelines and prediction models for 24-hour movement behaviour analysis across wrist accelerometry studies, supporting evidence-guided selection in surveillance, epidemiology, and digital health research.

Author: Millen J. Theophilus
GitHub repository

Welcome

Find the right evidence route for your method-selection question

Purpose

What SCOPE-MOVE helps you do

SCOPE-MOVE addresses a practical barrier in 24-hour movement behaviour research: locating and comparing candidate accelerometer-based prediction models across heterogeneous criterion-validation studies. It turns the extracted validation-level evidence base into a reusable digital resource for comparing methods, checking study context, and supporting evidence-guided selection.

Your use-case and navigational route

Choose the path closest to your task

Evidence density matrix

Click a cell to open the matching performance view

Primary tool

Interactive validation explorer

Forest view

Selected validation rows

Exploratory model ranking

Externally validated model shortlist

Ranking framework

A transparent score for evidence-guided selection
Eligibility

Only models named in at least one external-validation source field are ranked. Rows marked as newly developed or not previously published are not used to create a model identity.

Harmonisation

Model names are harmonised from the prior-publication field by extracting author-year model tokens and normalising spelling variants such as van Hees / vanHees and algorithm wording. Hildebrand 2014 and Hildebrand 2017 cutpoint references are combined because many external validations cite those cutpoints together.
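The harmonisation step described above can be sketched as follows. The alias table, the token pattern, and the function name are illustrative assumptions, not the app's actual extraction rules:

```python
import re

def harmonise_model_name(prior_publication: str) -> str:
    """Sketch of author-year token extraction with spelling normalisation.

    The alias table and regex below are assumptions for illustration;
    the app's real rule set is derived from the extraction fields.
    """
    text = prior_publication.strip()
    # Normalise known spelling variants before token extraction.
    aliases = {
        r"\bvan\s*Hees\b": "vanHees",
        r"\bHildebrand\s*(?:2014|2017)\b": "Hildebrand 2014/2017",
    }
    for pattern, canonical in aliases.items():
        text = re.sub(pattern, canonical, text, flags=re.IGNORECASE)
    # Extract an author-year token such as "Staudenmayer 2015".
    match = re.search(r"([A-Za-z/]+)\s*((?:19|20)\d{2}(?:/\d{4})?)", text)
    return f"{match.group(1)} {match.group(2)}" if match else text
```

For example, `"van Hees 2013 algorithm"` and `"vanHees 2013"` both map to the same identity, and either Hildebrand cutpoint year maps to the combined `Hildebrand 2014/2017` token.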

Evidence rows

Each eligible model is linked to its external validations and cross-validations when the source field or original study author-year matches the harmonised model identity.

Performance

Performance uses F1 when available. If F1 is not reported, the score falls back to the mean of sensitivity and specificity when available.

Composite score

The ranking score is the equal-weight mean of external-validation breadth, average performance, and device breadth. It is an exploratory shortlist, not a claim of universal superiority.

Model evidence details

Top-ranked models by domain family

Ranked models

Interpretation guide

Definitions, interpretation, and source-link guidance

Definitions and interpretation guide

Definitions and field guide
Prediction model

An accelerometer-based algorithm, threshold, rule, statistical model, machine-learning model, or deep-learning model used to estimate or classify a 24-hour movement behaviour outcome.

24-hour movement behaviours

The app uses this umbrella term for activity intensity, activity type, sleep-wake behaviour, and energy expenditure outcomes derived or estimated from wearable accelerometry.

F1

Harmonic mean of precision and sensitivity. Higher is better.

Sensitivity

True positive rate or recall. Higher is better.

Specificity

True negative rate. Higher is better.

MAPE

Mean absolute percentage error for energy expenditure. Lower is better.

Index measure

The wrist-worn accelerometer model, algorithm, or processing method being evaluated.

Criterion measure

The reference method used to judge validity, such as calorimetry, doubly labelled water, video or direct observation, or polysomnography, depending on the outcome.

Activity intensity

Time spent in physical activity intensity categories such as sedentary, light, moderate, or vigorous, commonly benchmarked against indirect calorimetry and MET-based thresholds.

Activity type

Discrete activity classes such as walking, running, cycling, or sedentary behaviour, commonly benchmarked against direct or video observation.

Sleep-wake

Binary sleep versus wake classification, commonly benchmarked against polysomnography.

Energy expenditure

Estimated energy cost or expenditure outcome, commonly validated against calorimetry or doubly labelled water depending on the target construct.

Traditional model group

Threshold, heuristic rule-based, and general linear model approaches. Typically simpler and more interpretable.

Non-traditional model group

Classical machine learning, deep learning, temporal, ensemble, and other more complex approaches.

Split-sample validation

A dataset is divided into model-development and validation or testing subsets.

Hold-out validation

A two-set split in which one set is used for training and the other for validation or testing.

Cross-validation

Repeated training and testing splits where performance is averaged across folds or repetitions.

Apparent validation

The model is trained and validated on the same training data from the same cohort.

Structured protocol

Prescribed activities or timings, usually in controlled settings.

Unstructured protocol

Naturalistic or free-living routines with participant autonomy.

Hybrid protocol

A mixed design that combines prescribed elements and participant-led routines.

Laboratory environment

Validation data collected in controlled laboratory or clinic-like conditions.

Free-living environment

Validation data collected in home, community, school, or other naturalistic settings.

External validation

Testing on a different dataset collected separately from training data. These rows are marked with a diamond and EXT tag.

Study sample size

The recruited or study-level participant count from the study-characteristics extraction.

Validation participants

The number of participants in a validation or test set. In leave-one-subject-out cross-validation this can equal the number of outer folds.

Device brand

The wearable manufacturer or brand recorded in Study_Characteristics.csv. The Studies tab device landscape uses brands only, not model names.

Sampling rate

The acceleration acquisition frequency. It is a device-processing field, not participant sample size.

Metric origin

How a reported performance metric was produced, for example macro-averaging across classes, micro-averaging pooled counts, per-class reporting, aggregate confusion matrix computation, or unclear origin.

SD origin

The extracted source of standard deviation or variation, such as participants, validation folds, behaviour classes, folds plus classes, external test variation, or not reported.

Model source / prior publication

The extraction field describing whether the model was published previously and, when reported, which prior study or algorithm it came from.

Pooled line

The vertical dashed line in the Primary Tool. The manuscript option uses available manuscript covariates; the simple option uses no covariates.

Above pooled

Rows with estimates greater than the current pooled line. They are highlighted to guide follow-up inspection, not to declare a best model.

DOI linkage

Clicking DOI-linked validation rows opens the source paper in a new tab.

QUADAS-2 domains

Quality domains are patient selection/study design, index measure, criterion measure, and flow/timing.

Signalling questions

SQ1-SQ11 are the quality-tool responses extracted in the quality CSV. The Quality tab summarises Yes, No, Unclear, and NA responses and shows them per study.

Quality signalling-question interpretation guide

The quality assessment uses signalling questions to support each QUADAS-2 domain judgement. Use the guide below to understand what each response means before comparing validation estimates.

  • Yes: supports low concern
  • No: inspect as potential bias
  • Unclear: reporting insufficient
  • NA: not applicable
Data dictionary and raw input files

The data dictionary describes the raw input files used to generate the analyses and app data bundle. App-facing definitions, interpretation notes, and hover-field explanations are presented in the Definitions and field guide panel above.

Open raw-input data dictionary

How to use SCOPE-MOVE
  1. Select a validation performance metric: F1, sensitivity, or specificity.
  2. Filter validations by 24-hour movement behaviour domain, prediction model group, age group, and wearable device brand.
  3. Inspect validation-level estimates and metadata in the forest view and selected-rows table. Rows above the pooled estimate are highlighted and separated by a horizontal dashed line when the current ordering supports a clean split.
  4. Use DOI links to review the original source paper when choosing candidate methods.
  5. Use the Data tab to examine the extraction context behind each view.
How plotted estimates, pooled lines, and cautions are derived
  • Each plotted point is the extracted validation performance metric for a specific validation. The classifier metrics are F1, sensitivity, and specificity.
  • Metrics were extracted as the unweighted mean and standard deviation across behaviour classes when reported.
  • When mean and SD were not reported but class-specific performance was reported, metrics were macro-averaged across classes to obtain an unweighted mean and SD.
  • When a confusion matrix included validation counts for each behaviour class, metrics were computed and macro-averaged across classes to obtain an unweighted mean and SD.
  • Extracted SDs were converted to sampling variances and divided by cross-validation folds when SDs reflected CV, or by number of behaviour classes when SDs reflected class variation. When SDs were unavailable, a lower-bounded behaviour-domain-specific median constant variance was used to stabilise weights.
  • The constructed variance was used to build the displayed 95% confidence interval for each validation estimate.
  • For meta-analysis, classifier metrics are analysed on the logit scale and back-transformed to proportions for display.
  • The manuscript pooled line represents a participant-weighted meta-regression orientation for the selected subset, using the manuscript covariate set where available. The simple pooled line is a random-effects line with no covariates.
  • The manuscript covariate pooled line uses the manuscript covariate set: centred number of behaviour classes, centred validation folds, missingness indicators, test environment, test protocol, and behaviour domain.
  • Covariates that are constant or effectively unavailable in the filtered subset are dropped from that pooled-line calculation and listed under the plot.
  • The simple random-effects line uses the validation table variances and includes no covariates.
  • The dashed pooled line can mask heterogeneity across behaviours, protocols, devices, and populations.
  • Missing metrics reflect reporting and extraction availability, not automatic exclusion by this app.
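The variance-construction and confidence-interval steps above can be sketched as follows. The function names, the `fallback_variance` placeholder, and the delta-method conversion to the logit scale are assumptions for illustration; the app computes the domain-specific fallback constant from the data and may construct intervals differently:

```python
import math

def construct_variance(sd=None, n_folds=None, n_classes=None,
                       sd_origin="cv", fallback_variance=0.01):
    """Sketch of the sampling-variance construction described above.

    `fallback_variance` stands in for the lower-bounded
    behaviour-domain-specific median constant; the real value is
    computed from the extracted data.
    """
    if sd is None:
        return fallback_variance
    if sd_origin == "cv" and n_folds:
        return sd ** 2 / n_folds        # SD reflects cross-validation folds
    if sd_origin == "classes" and n_classes:
        return sd ** 2 / n_classes      # SD reflects behaviour-class variation
    return sd ** 2

def logit_ci(p, variance, z=1.96):
    """95% CI built on the logit scale and back-transformed.

    Converts the proportion-scale variance with a delta-method step
    (an assumption; the app's exact construction may differ).
    """
    se_logit = math.sqrt(variance) / (p * (1 - p))
    logit = math.log(p / (1 - p))
    inv = lambda x: 1 / (1 + math.exp(-x))
    return inv(logit - z * se_logit), inv(logit + z * se_logit)
```

Because the interval is built on the logit scale, the back-transformed bounds always stay inside (0, 1), which matters for estimates near the ceiling.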

Scoping review landscape

Study population, countries, and device landscape

Publication timeline

Publication year on the x-axis, stacked by model complexity

Continents and countries

Continents on the y-axis, country segments on each bar

Device landscape

Wearable brands from Study_Characteristics.csv only

Participant information

Sample size, sex reporting, ethnicity reporting, and health status

Prediction model type landscape

Best-performing model sub-types from Study_Characteristics.csv, one entry per study row

Study index

Transparency and risk of bias

Study-level quality and signalling questions

Domain-level risk

From Quality_Assessment.csv

Signalling questions

Yes, No, Unclear, and NA response mix

Per-study inspection

Quality assessment table

Filtered study rows

Data workbench

Inspect source and derived tables

Dataset