Welcome
Find the right evidence route for your method-selection question
Purpose
What SCOPE-MOVE helps you do
SCOPE-MOVE addresses a practical barrier in 24-hour movement behaviour research: locating and comparing candidate accelerometer-based prediction models across heterogeneous criterion-validation studies. It turns the extracted validation-level evidence base into a reusable digital resource for comparing methods, checking study context, and supporting evidence-guided selection.
Your use-case and navigational route
Choose the path closest to your task
Evidence density matrix
Click a cell to open the matching performance view
Primary tool
Interactive validation explorer
Forest view
Selected validation rows
Exploratory model ranking
Externally validated model shortlist
Ranking framework
Transparent, evidence-guided score
Only models named in at least one external-validation source field are ranked. Rows marked as newly developed or not previously published are not used to create a model identity.
Model names are harmonised from the prior-publication field by extracting author-year model tokens and normalising spelling variants such as van Hees / vanHees and algorithm wording. Hildebrand 2014 and Hildebrand 2017 cutpoint references are combined because many external validations cite those cutpoints together.
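As a hedged illustration of the harmonisation step (the variant map, regular expression, and function name below are assumptions for this sketch, not the app's actual implementation):

```python
import re

# Hypothetical spelling-variant map; the real harmonisation table is larger
# and is not reproduced in this guide.
VARIANTS = {
    "vanhees": "van Hees",
    "van hees": "van Hees",
}

def harmonise_model_token(prior_publication_field):
    """Extract an author-year token such as 'van Hees 2013' and normalise it.

    Returns None when no author-year token is present, mirroring the rule
    that newly developed models do not create a model identity.
    """
    match = re.search(r"([A-Za-z][A-Za-z .\-]+?)\s*\(?((?:19|20)\d{2})\)?",
                      prior_publication_field)
    if not match:
        return None
    author, year = match.group(1).strip(), match.group(2)
    author = VARIANTS.get(author.lower(), author)
    # Hildebrand 2014 and 2017 cutpoint references are cited together,
    # so they collapse to one combined model identity.
    if author.lower() == "hildebrand" and year in {"2014", "2017"}:
        return "Hildebrand 2014/2017"
    return f"{author} {year}"
```

For example, `harmonise_model_token("vanHees 2013")` and `harmonise_model_token("van Hees 2013")` map to the same identity, while a field reading only "newly developed" yields no identity.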
Each eligible model is linked to its external validations and cross-validations when the source field or original study author-year matches the harmonised model identity.
Performance uses F1 when available. If F1 is not reported, the score falls back to the mean of sensitivity and specificity when available.
The ranking score is the equal-weight mean of external-validation breadth, average performance, and device breadth. It is an exploratory shortlist, not a claim of universal superiority.
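A minimal sketch of the fallback and ranking rules described above (function and argument names are illustrative assumptions; the three ranking components are assumed pre-scaled to a common 0-1 range):

```python
def performance_score(f1=None, sensitivity=None, specificity=None):
    """F1 when reported; otherwise the mean of sensitivity and specificity."""
    if f1 is not None:
        return f1
    if sensitivity is not None and specificity is not None:
        return (sensitivity + specificity) / 2
    return None  # no usable performance evidence

def ranking_score(ext_validation_breadth, mean_performance, device_breadth):
    """Equal-weight mean of the three components (exploratory shortlist only)."""
    return (ext_validation_breadth + mean_performance + device_breadth) / 3
```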
Model evidence details
Top ranked models by domain family
Ranked models
Interpretation guide
Definitions, interpretation, and source-link guidance
Definitions and interpretation guide
Progressive disclosure
Definitions and field guide
An accelerometer-based algorithm, threshold, rule, statistical model, machine-learning model, or deep-learning model used to estimate or classify a 24-hour movement behaviour outcome.
The app uses this umbrella term for activity intensity, activity type, sleep-wake behaviour, and energy expenditure outcomes derived or estimated from wearable accelerometry.
Harmonic mean of precision and sensitivity. Higher is better.
True positive rate or recall. Higher is better.
True negative rate. Higher is better.
Mean absolute percentage error for energy expenditure. Lower is better.
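The metrics defined above follow their standard formulas; a self-contained sketch from per-class confusion counts (not the app's extraction code):

```python
def classification_metrics(tp, fp, tn, fn):
    """Per-class metrics from confusion counts. Higher is better for all three."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    # F1: harmonic mean of precision and sensitivity
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}

def mape(actual, predicted):
    """Mean absolute percentage error for energy expenditure. Lower is better."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual) * 100
```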
The wrist-worn accelerometer model, algorithm, or processing method being evaluated.
The reference method used to judge validity, such as calorimetry, doubly labelled water, video or direct observation, or polysomnography, depending on the outcome.
Time spent in physical activity intensity categories such as sedentary, light, moderate, or vigorous, commonly benchmarked against indirect calorimetry and MET-based thresholds.
Discrete activity classes such as walking, running, cycling, or sedentary behaviour, commonly benchmarked against direct or video observation.
Binary sleep versus wake classification, commonly benchmarked against polysomnography.
Estimated energy cost or expenditure outcome, commonly validated against calorimetry or doubly labelled water depending on the target construct.
Threshold, heuristic rule-based, and general linear model approaches. Typically simpler and more interpretable.
Classical machine learning, deep learning, temporal, ensemble, and other more complex approaches.
A dataset is divided into model-development and validation or testing subsets.
A two-set split in which one set is used for training and the other for validation or testing.
Repeated training and testing splits where performance is averaged across folds or repetitions.
The model is trained and validated on the same training data from the same cohort.
Prescribed activities or timings, usually in controlled settings.
Naturalistic or free-living routines with participant autonomy.
A mixed design that combines prescribed elements and participant-led routines.
Validation data collected in controlled laboratory or clinic-like conditions.
Validation data collected in home, community, school, or other naturalistic settings.
Testing on a different dataset collected separately from training data. These rows are marked with a diamond and EXT tag.
The recruited or study-level participant count from the study-characteristics extraction.
The number of participants in a validation or test set. In leave-one-subject-out cross-validation this can equal the number of outer folds.
The wearable manufacturer or brand recorded in Study_Characteristics.csv. The Studies tab device landscape uses brands only, not model names.
The acceleration acquisition frequency. It is a device-processing field, not participant sample size.
How a reported performance metric was produced, for example macro-averaging across classes, micro-averaging pooled counts, per-class reporting, aggregate confusion matrix computation, or unclear origin.
The extracted source of standard deviation or variation, such as participants, validation folds, behaviour classes, folds plus classes, external test variation, or not reported.
The extraction field describing whether the model was published previously and, when reported, which prior study or algorithm it came from.
The vertical dashed line in the Primary Tool. The manuscript option uses available manuscript covariates; the simple option uses no covariates.
Rows with estimates greater than the current pooled line. They are highlighted to guide follow-up inspection, not to declare a best model.
Clicking DOI-linked validation rows opens the source paper in a new tab.
Quality domains are patient selection/study design, index measure, criterion measure, and flow/timing.
SQ1-SQ11 are the quality-tool responses extracted in the quality CSV. The Quality tab summarises Yes, No, Unclear, and NA responses and shows them per study.
Quality signalling-question interpretation guide
The quality assessment uses signalling questions to support each QUADAS-2 domain judgement. Use the guide below to understand what each response means before comparing validation estimates.
Data dictionary and raw input files
The data dictionary describes the raw input files used to generate the analyses and app data bundle. App-facing definitions, interpretation notes, and hover-field explanations are presented in the Definitions and field guide panel above.
How to use SCOPE-MOVE
- Select a validation performance metric: F1, sensitivity, or specificity.
- Filter validations by 24-hour movement behaviour domain, prediction model group, age group, and wearable device brand.
- Inspect validation-level estimates and metadata in the forest view and selected-rows table. Rows above the pooled estimate are highlighted and separated by a horizontal dashed line when the current ordering supports a clean split.
- Use DOI links to review the original source paper when choosing candidate methods.
- Use the Data tab to examine the extraction context behind each view.
How plotted estimates, pooled lines, and cautions are derived
- Each plotted point is the extracted validation performance metric for a specific validation. The classifier metrics are F1, sensitivity, and specificity.
- Metrics were extracted as the unweighted mean and standard deviation across behaviour classes when reported.
- When mean and SD were not reported but class-specific performance was reported, metrics were macro-averaged across classes to obtain an unweighted mean and SD.
- When a confusion matrix included validation counts for each behaviour class, metrics were computed and macro-averaged across classes to obtain an unweighted mean and SD.
- Extracted SDs were converted to sampling variances and divided by cross-validation folds when SDs reflected CV, or by number of behaviour classes when SDs reflected class variation. When SDs were unavailable, a lower-bounded behaviour-domain-specific median constant variance was used to stabilise weights.
- The constructed variance was used to build the displayed 95% confidence interval for each validation estimate.
- For meta-analysis, classifier metrics are analysed on the logit scale and back-transformed to proportions for display.
- The manuscript pooled line represents a participant-weighted meta-regression orientation for the selected subset, using the manuscript covariate set where available. The simple pooled line is a random-effects line with no covariates.
- The manuscript covariate pooled line uses the manuscript covariate set: centred number of behaviour classes, centred validation folds, missingness indicators, test environment, test protocol, and behaviour domain.
- Covariates that are constant or effectively unavailable in the filtered subset are dropped from that pooled-line calculation and listed under the plot.
- The simple random-effects line uses the validation table variances and includes no covariates.
- The dashed pooled line can mask heterogeneity across behaviours, protocols, devices, and populations.
- Missing metrics reflect reporting and extraction availability, not automatic exclusion by this app.
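The variance construction and logit-scale interval steps above can be sketched as follows (function names and the delta-method standard error are assumptions of this sketch, not a transcript of the analysis code):

```python
import math

def sampling_variance(sd, n_folds=None, n_classes=None, fallback_var=None):
    """Convert an extracted SD to a sampling variance for weighting.

    Divide by CV folds when the SD reflects cross-validation, by the number
    of behaviour classes when it reflects class variation, and fall back to
    a lower-bounded, domain-specific median constant when no SD is reported.
    """
    if sd is None:
        return fallback_var
    if n_folds:
        return sd**2 / n_folds
    if n_classes:
        return sd**2 / n_classes
    return sd**2

def logit_ci(p, var, z=1.96):
    """95% CI built on the logit scale and back-transformed to a proportion."""
    logit = math.log(p / (1 - p))
    se = math.sqrt(var) / (p * (1 - p))   # delta-method SE on the logit scale
    lo, hi = logit - z * se, logit + z * se
    inv = lambda x: 1 / (1 + math.exp(-x))
    return inv(lo), inv(hi)
```

Back-transforming keeps both interval bounds inside (0, 1), which a plain normal interval on the proportion scale cannot guarantee for estimates near 0 or 1.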
Scoping review landscape
Study population, countries, and device landscape
Publication timeline
Publication year on the x-axis, stacked by model complexity
Continents and countries
Continents on the y-axis, country segments on each bar
Device landscape
Wearable brands from Study_Characteristics.csv only
Participant information
Sample size, sex reporting, ethnicity reporting, and health status
Prediction model type landscape
Best-performing model sub-types from Study_Characteristics.csv, one entry per study row
Study index
Transparency and risk of bias
Study-level quality and signalling questions
Domain-level risk
From Quality_Assessment.csv
Signalling questions
Yes, No, Unclear, and NA response mix
Per-study inspection
Quality assessment table
Filtered study rows
Data workbench