Academic Foundation

Built on evidence, not intuition

Every measurement, scoring method, and pedagogical design decision in ReasonDx is grounded in peer-reviewed research from medical education, cognitive psychology, and communication science.

NSU KPCAM · Harvard Macy Institute · 30+ primary citations

ReasonDx is built on four intersecting bodies of research: how clinicians reason diagnostically, how cognitive biases cause diagnostic errors, how clinical communication is taught and assessed, and how expert knowledge is structured. Each framework directly shapes a platform feature.

Diagnostic Reasoning
Hypothetico-Deductive Reasoning Model

Clinicians generate a small set of hypotheses early and selectively gather data to test them. ReasonDx's 10-phase structure mirrors this empirically validated process.

Elstein, Shulman & Sprafka (1978)
Medical Problem Solving · Harvard University Press
Cognitive Bias
Dual Process Theory

Dual process theory distinguishes System 1 (fast, intuitive) from System 2 (deliberate, analytical) thinking. ReasonDx requires explicit justification to engage System 2 and to detect when students anchor prematurely on System 1 responses.

Croskerry (2009)
Academic Medicine, 84(8), 1022–1028
Clinical Communication
Calgary-Cambridge Guide & Kalamazoo Consensus

The gold-standard frameworks for medical interviewing and communication assessment, defining open-to-closed question sequencing, empathy behaviors, and the essential elements of every clinical encounter.

Silverman, Kurtz & Draper (2005); Makoul et al. (2001)
Academic Medicine, 76(4), 390–393
Knowledge Structure
Illness Script Theory & Semantic Qualifiers

Expert clinicians organize diagnostic knowledge into illness scripts (enabling conditions, fault, consequences). Richer, contrastive language built from semantic qualifiers (e.g., acute vs. chronic, constant vs. intermittent) predicts diagnostic accuracy. ReasonDx scores articulation quality using these markers.

Schmidt & Rikers (2007) · Med Educ, 41(12)
Bordage & Lemieux (1991) · Acad Med, 66(9)
Diagnostic Error
Cognitive Error Taxonomy

Cognitive factors contribute to 74% of diagnostic errors. Premature closure is the single most common cause. ReasonDx detects anchoring bias and premature closure using operationalizations grounded in this taxonomy.

Graber, Franklin & Gordon (2005)
Arch Intern Med, 165(13), 1493–1499
Health Literacy
Plain Language in Medicine

Provider communication complexity is a modifiable upstream cause of poor health outcomes. The CDC recommends patient materials at a 6th–8th grade reading level. ReasonDx measures the reading level of student–patient communication using validated formulas.

Paasche-Orlow & Wolf (2007) · Am J Health Behav
CDC (2016) · cdc.gov/healthliteracy
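The Flesch-Kincaid grade level cited above is a published formula over sentence length and syllable density. The sketch below shows how such a score can be computed; the syllable counter is a rough vowel-group heuristic for illustration, not ReasonDx's actual implementation.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per run of consecutive vowels, minimum 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Kincaid et al. (1975): 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A short-sentence, monosyllabic reply to a patient scores several grade levels below jargon-dense prose, which is what the 6th–8th grade benchmark is checking.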
Self-Assessment
Confidence Calibration

Self-assessment is systematically inaccurate in medical trainees, and overconfidence is associated with diagnostic error. ReasonDx tracks the calibration of student-rated confidence against objective differential accuracy across sessions.

Eva & Regehr (2005) · Acad Med, 80(S10)
Berner & Graber (2008) · Am J Med, 121(5)
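Comparing a Likert confidence rating against objective accuracy reduces to a simple gap statistic. The function below is an illustrative sketch under assumed conventions (linear 1–5 → 0–1 mapping, binary correctness); the name and mapping are ours, not the platform's documented method.

```python
def calibration_gap(ratings: list[int], correct: list[int]) -> float:
    """Mean confidence (1-5 Likert mapped linearly to 0-1) minus mean accuracy.

    Positive values indicate overconfidence; negative, underconfidence.
    """
    conf = sum((r - 1) / 4 for r in ratings) / len(ratings)
    acc = sum(correct) / len(correct)
    return conf - acc
```

For example, a student who rates mostly 4s and 5s while getting half of the differentials right shows a large positive gap, the overconfidence pattern Berner & Graber (2008) link to diagnostic error.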
Learning Science
Spaced Repetition

Spaced practice produces superior retention compared to massed practice. ReasonDx generates spaced-repetition review cards from each student's identified reasoning gaps, personalized to the specific concepts they missed.

Kerfoot & Brotschi (2009) · Am J Surg, 197(1)
Wozniak (1990) · SM-2 Algorithm
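The cited SM-2 algorithm (Wozniak, 1990) defines how a card's review interval grows with successful recall. This is a compact sketch of the published update rule; how ReasonDx parameterizes or adapts it is not specified here.

```python
def sm2_update(quality: int, reps: int, interval: int, ef: float):
    """One SM-2 review step. quality: 0-5 self-rated recall."""
    # Easiness factor shrinks with poor recall, floored at 1.3.
    ef = max(1.3, ef + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if quality < 3:
        return 0, 1, ef            # failed recall: restart the schedule
    reps += 1
    if reps == 1:
        interval = 1               # first success: review tomorrow
    elif reps == 2:
        interval = 6               # second success: review in six days
    else:
        interval = round(interval * ef)  # then grow geometrically
    return reps, interval, ef
```

Concepts a student keeps missing therefore stay on short intervals, while mastered ones drop out of the daily review queue.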

ReasonDx passively collects behavioral data during normal simulation activity — no additional student effort required. Every construct measured maps to a validated theoretical framework.

Important: All behavioral signals are educational proxies for latent constructs, not clinical assessments. Longitudinal patterns require a minimum of three completed simulations before being reported to any learner.
Construct · Measurement · Domain · Grounding
Differential breadth & accuracy · Phases 1, 4, 6–7 · Reasoning · Hypothetico-deductive model (Elstein et al., 1978); illness script theory (Schmidt & Rikers, 2007)
Anchoring bias · Cross-phase differential comparison · Reasoning · Croskerry (2009); Graber et al. (2005); Kunitomo et al. (2022)
Premature closure · Post-labs differential change · Reasoning · Graber et al. (2005), the single most common cognitive error; Berner & Graber (2008)
Evidence integration · Phase 4 → Final differential · Reasoning · Croskerry (2009) debiasing framework (System 2 hypothesis updating)
Reasoning articulation quality · 0–4 scale per justification response · Reasoning · Semantic qualifier framework (Bordage & Lemieux, 1991); elaborated knowledge (Bordage, 1994)
History-taking completeness · Critical history elicitation · Reasoning · Faulty context generation (Graber et al., 2005 taxonomy)
Language complexity (readability) · Flesch-Kincaid, SMOG, ARI · Communication · Kincaid et al. (1975); McLaughlin (1969); CDC 6th–8th grade benchmark
Language adaptation for patient · Patient vs. attending register · Communication · SEGUE Framework (Makoul, 2001); Calgary-Cambridge Guide (Silverman et al., 2005)
Question type sequencing · Open/closed/leading/clarifying · Communication · Calgary-Cambridge open-to-closed cone; Langewitz et al. (2002) spontaneous talking time
Empathy & rapport behaviors · 7 behavioral signals per turn · Communication · Kalamazoo Consensus Statement (Makoul et al., 2001); CARE Measure (Mercer et al., 2004)
Implicit confidence language · Hedging vs. commitment markers · Metacognition · Epistemic markers in clinical reasoning (Lingard et al., 2003)
Confidence calibration · 5-point Likert vs. accuracy · Metacognition · Eva & Regehr (2005, 2008); Berner & Graber (2008); Sætrevik et al. (2024)
Guideline-grounded debrief · RAG over open-access guidelines · Evidence · Retrieval-Augmented Generation (Lewis et al., 2020); open-access clinical guidelines only

ReasonDx is an educational platform, not a clinical assessment tool. We are committed to transparency about what the platform can and cannot measure.

Known limitations

  1. Text-based simulation fidelity. Real clinical encounters involve time pressure, physical examination, non-verbal communication, and emotional load that text-based simulation cannot fully replicate. Behavioral signals reflect in-simulation performance only.
  2. Proxy measures. All passive behavioral signals are proxies for latent constructs. They are best interpreted as educational signals, not clinical assessments or competency certifications.
  3. AI-generated patient. The patient voice is generated by a large language model (Claude, Anthropic). Responses are realistic but not drawn from real patient data.
  4. Minimum sessions required. Longitudinal cognitive pattern detection requires a minimum of three completed simulations per student. Single-session data is collected but no patterns are reported until this threshold is met.
  5. Self-selected sample. Students who choose to use ReasonDx may differ from those who do not in ways that affect results (motivation, prior clinical exposure, digital literacy).
  6. Open-access sources only. All clinical guidelines used in AI-generated debriefs are CC-BY, CC-BY-NC, or free-access society publications. The platform does not reproduce or paraphrase copyrighted clinical content.

The following are primary citations for the platform's core theoretical frameworks. Full evidence base documentation — including operationalizations, thresholds, and limitations for every construct — is maintained in the platform's research documentation.

Diagnostic Reasoning & Cognitive Bias

Croskerry, P. (2009). A universal model of diagnostic reasoning. Academic Medicine, 84(8), 1022–1028. doi:10.1097/ACM.0b013e3181ace703

Elstein, A. S., Shulman, L. S., & Sprafka, S. A. (1978). Medical Problem Solving: An Analysis of Clinical Reasoning. Harvard University Press.

Graber, M. L., Franklin, N., & Gordon, R. (2005). Diagnostic error in internal medicine. Archives of Internal Medicine, 165(13), 1493–1499. doi:10.1001/archinte.165.13.1493

Berner, E. S., & Graber, M. L. (2008). Overconfidence as a cause of diagnostic error in medicine. American Journal of Medicine, 121(5 Suppl), S2–S23.

Kunitomo, K., Harada, T., & Watari, T. (2022). Cognitive biases encountered by physicians in the emergency room. BMC Emergency Medicine, 22, 148. doi:10.1186/s12873-022-00708-3

Sætrevik, B., Seeligmann, V. T., Frotvedt, T. F., & Bondevik, Ø. K. (2024). Anchoring, confirmation and confidence bias among medical decision-makers. Collabra: Psychology, 10(1), 126223. doi:10.1525/collabra.126223

Knowledge Structure & Articulation

Schmidt, H. G., & Rikers, R. M. J. P. (2007). How expertise develops in medicine: knowledge encapsulation and illness script formation. Medical Education, 41(12), 1133–1139.

Bordage, G., & Lemieux, M. (1991). Semantic structures and diagnostic thinking of experts and novices. Academic Medicine, 66(9 Suppl), S70–S72.

Bordage, G. (1994). Elaborated knowledge: a key to successful diagnostic thinking. Academic Medicine, 69(11), 883–885.

Clinical Communication

Makoul, G. et al. (2001). Essential elements of communication in medical encounters: the Kalamazoo Consensus Statement. Academic Medicine, 76(4), 390–393.

Silverman, J., Kurtz, S., & Draper, J. (2005). Skills for Communicating with Patients (2nd ed.). Radcliffe Publishing.

Mercer, S. W. et al. (2004). The CARE Measure. Family Practice, 21(6), 699–705.

Levinson, W., Gorawara-Bhat, R., & Lamb, J. (2000). A study of patient clues and physician responses. JAMA, 284(8), 1021–1027.

Health Literacy & Readability

Paasche-Orlow, M. K., & Wolf, M. S. (2007). The causal pathways linking health literacy to health outcomes. American Journal of Health Behavior, 31(S1), S19–S26.

Kincaid, J. P. et al. (1975). Derivation of new readability formulas. Naval Air Station Memphis.

McLaughlin, G. H. (1969). SMOG grading: a new readability formula. Journal of Reading, 12(8), 639–646.

Centers for Disease Control and Prevention. (2016). Health Literacy. cdc.gov/healthliteracy

Self-Assessment & Confidence Calibration

Eva, K. W., & Regehr, G. (2005). Self-assessment in the health professions: a reformulation and research agenda. Academic Medicine, 80(10 Suppl), S46–S54.

Wolpaw, T. M., Wolpaw, D. R., & Papp, K. K. (2003). SNAPPS: a learner-centered model for outpatient education. Academic Medicine, 78(9), 893–898.

Moulton, C. A. et al. (2007). Slowing down when you should: a new model of expert judgment. Academic Medicine, 82(S10), S109–S116.