Academic Foundation

Built on evidence, not intuition

Every measurement, scoring method, and pedagogical design decision in ReasonDx is grounded in peer-reviewed research from medical education, cognitive psychology, and communication science.

NSU KPCAM · Harvard Macy Institute · 30+ primary citations

ReasonDx is built on four intersecting bodies of research: how clinicians reason diagnostically, how cognitive biases cause diagnostic errors, how clinical communication is taught and assessed, and how expert knowledge is structured. Each framework directly shapes a platform feature.

Diagnostic Reasoning
Hypothetico-Deductive Reasoning Model

Clinicians generate a small set of hypotheses early and selectively gather data to test them. ReasonDx's 10-phase structure mirrors this empirically validated process.

Elstein, Shulman & Sprafka (1978)
Medical Problem Solving · Harvard University Press
Cognitive Bias
Dual Process Theory

Dual process theory distinguishes System 1 (fast, intuitive) from System 2 (deliberate, analytical) thinking. ReasonDx requires explicit justification to engage System 2 and to detect when students anchor prematurely on System 1 responses.

Croskerry (2009)
Academic Medicine, 84(8), 1022–1028
Clinical Communication
Calgary-Cambridge Guide & Kalamazoo Consensus

The gold-standard frameworks for medical interviewing and communication assessment, defining open-to-closed question sequencing, empathy behaviors, and the essential elements of every clinical encounter.

Silverman, Kurtz & Draper (2005); Makoul et al. (2001)
Academic Medicine, 76(4), 390–393
Knowledge Structure
Illness Script Theory & Semantic Qualifiers

Expert clinicians organize diagnostic knowledge into illness scripts (enabling conditions, fault, consequences). Richer, contrastive language built from semantic qualifiers (e.g., acute vs. chronic, constant vs. intermittent) predicts diagnostic accuracy. ReasonDx scores articulation quality using these markers.

Schmidt & Rikers (2007) · Med Educ, 41(12)
Bordage & Lemieux (1991) · Acad Med, 66(9)
Diagnostic Error
Cognitive Error Taxonomy

Cognitive factors contribute to 74% of diagnostic errors. Premature closure is the single most common cause. ReasonDx detects anchoring bias and premature closure using operationalizations grounded in this taxonomy.

Graber, Franklin & Gordon (2005)
Arch Intern Med, 165(13), 1493–1499
Health Literacy
Plain Language in Medicine

Provider communication complexity is a modifiable upstream cause of poor health outcomes. The CDC recommends patient materials at a 6th–8th grade reading level. ReasonDx measures the reading level of student–patient communication using validated formulas.

Paasche-Orlow & Wolf (2007) · Am J Health Behav
CDC (2016) · cdc.gov/healthliteracy
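The Flesch-Kincaid grade level cited above is a published formula over sentence length and syllable density. The sketch below shows how such a score can be computed; the syllable counter is a rough vowel-group heuristic for illustration, not ReasonDx's actual implementation.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per run of consecutive vowels, minimum 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Kincaid et al. (1975): 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A short-sentence, monosyllabic reply to a patient scores several grade levels below jargon-dense prose, which is what the 6th–8th grade benchmark is checking.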
Self-Assessment
Confidence Calibration

Self-assessment is systematically inaccurate in medical trainees, and overconfidence is associated with diagnostic error. ReasonDx tracks the calibration of student-rated confidence against objective differential accuracy across sessions.

Eva & Regehr (2005) · Acad Med, 80(S10)
Berner & Graber (2008) · Am J Med, 121(5)
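Comparing a Likert confidence rating against objective accuracy reduces to a simple gap statistic. The function below is an illustrative sketch under assumed conventions (linear 1–5 → 0–1 mapping, binary correctness); the name and mapping are ours, not the platform's documented method.

```python
def calibration_gap(ratings: list[int], correct: list[int]) -> float:
    """Mean confidence (1-5 Likert mapped linearly to 0-1) minus mean accuracy.

    Positive values indicate overconfidence; negative, underconfidence.
    """
    conf = sum((r - 1) / 4 for r in ratings) / len(ratings)
    acc = sum(correct) / len(correct)
    return conf - acc
```

For example, a student who rates mostly 4s and 5s while getting half of the differentials right shows a large positive gap, the overconfidence pattern Berner & Graber (2008) link to diagnostic error.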
Learning Science
Spaced Repetition

Spaced practice produces superior retention compared to massed practice. ReasonDx generates spaced-repetition review cards from each student's identified reasoning gaps, personalized to the specific concepts they missed.

Kerfoot & Brotschi (2009) · Am J Surg, 197(1)
Wozniak (1990) · SM-2 Algorithm
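The cited SM-2 algorithm (Wozniak, 1990) defines how a card's review interval grows with successful recall. This is a compact sketch of the published update rule; how ReasonDx parameterizes or adapts it is not specified here.

```python
def sm2_update(quality: int, reps: int, interval: int, ef: float):
    """One SM-2 review step. quality: 0-5 self-rated recall."""
    # Easiness factor shrinks with poor recall, floored at 1.3.
    ef = max(1.3, ef + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if quality < 3:
        return 0, 1, ef            # failed recall: restart the schedule
    reps += 1
    if reps == 1:
        interval = 1               # first success: review tomorrow
    elif reps == 2:
        interval = 6               # second success: review in six days
    else:
        interval = round(interval * ef)  # then grow geometrically
    return reps, interval, ef
```

Concepts a student keeps missing therefore stay on short intervals, while mastered ones drop out of the daily review queue.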

ReasonDx passively collects behavioral data during normal simulation activity — no additional student effort required. Every construct measured maps to a validated theoretical framework.

Important: All behavioral signals are educational proxies for latent constructs, not clinical assessments. Longitudinal patterns require a minimum of three completed simulations before being reported to any learner.
Construct · Measurement · Domain · Grounding
Differential breadth & accuracy · Phases 1, 4, 6–7 · Reasoning · Hypothetico-deductive model (Elstein et al., 1978); illness script theory (Schmidt & Rikers, 2007)
Anchoring bias · Cross-phase differential comparison · Reasoning · Croskerry (2009); Graber et al. (2005); Kunitomo et al. (2022)
Premature closure · Post-labs differential change · Reasoning · Graber et al. (2005), the single most common cognitive error; Berner & Graber (2008)
Evidence integration · Phase 4 → Final differential · Reasoning · Croskerry (2009) debiasing framework (System 2 hypothesis updating)
Reasoning articulation quality · 0–4 scale per justification response · Reasoning · Semantic qualifier framework (Bordage & Lemieux, 1991); elaborated knowledge (Bordage, 1994)
History-taking completeness · Critical history elicitation · Reasoning · Faulty context generation (Graber et al., 2005 taxonomy)
Language complexity (readability) · Flesch-Kincaid, SMOG, ARI · Communication · Kincaid et al. (1975); McLaughlin (1969); CDC 6th–8th grade benchmark
Language adaptation for patient · Patient vs. attending register · Communication · SEGUE Framework (Makoul, 2001); Calgary-Cambridge Guide (Silverman et al., 2005)
Question type sequencing · Open/closed/leading/clarifying · Communication · Calgary-Cambridge open-to-closed cone; Langewitz et al. (2002) spontaneous talking time
Empathy & rapport behaviors · 7 behavioral signals per turn · Communication · Kalamazoo Consensus Statement (Makoul et al., 2001); CARE Measure (Mercer et al., 2004)
Implicit confidence language · Hedging vs. commitment markers · Metacognition · Epistemic markers in clinical reasoning (Lingard et al., 2003)
Confidence calibration · 5-point Likert vs. accuracy · Metacognition · Eva & Regehr (2005, 2008); Berner & Graber (2008); Sætrevik et al. (2024)
Guideline-grounded debrief · RAG over open-access guidelines · Evidence · Retrieval-Augmented Generation (Lewis et al., 2020); open-access clinical guidelines only

ReasonDx is an educational platform, not a clinical assessment tool. We are committed to transparency about what the platform can and cannot measure.

Known limitations

  1. Text-based simulation fidelity. Real clinical encounters involve time pressure, physical examination, non-verbal communication, and emotional load that text-based simulation cannot fully replicate. Behavioral signals reflect in-simulation performance only.
  2. Proxy measures. All passive behavioral signals are proxies for latent constructs. They are best interpreted as educational signals, not clinical assessments or competency certifications.
  3. AI-generated patient. The patient voice is generated by a large language model (Claude, Anthropic). Responses are realistic but not drawn from real patient data.
  4. Minimum sessions required. Longitudinal cognitive pattern detection requires a minimum of three completed simulations per student. Single-session data is collected but no patterns are reported until this threshold is met.
  5. Self-selected sample. Students who choose to use ReasonDx may differ from those who do not in ways that affect results (motivation, prior clinical exposure, digital literacy).
  6. Open-access sources only. All clinical guidelines used in AI-generated debriefs are CC-BY, CC-BY-NC, or free-access society publications. The platform does not reproduce or paraphrase copyrighted clinical content.

The following are primary citations for the platform's core theoretical frameworks. Full evidence base documentation — including operationalizations, thresholds, and limitations for every construct — is maintained in the platform's research documentation.

Diagnostic Reasoning & Cognitive Bias

Croskerry, P. (2009). A universal model of diagnostic reasoning. Academic Medicine, 84(8), 1022–1028. doi:10.1097/ACM.0b013e3181ace703

Elstein, A. S., Shulman, L. S., & Sprafka, S. A. (1978). Medical Problem Solving: An Analysis of Clinical Reasoning. Harvard University Press.

Graber, M. L., Franklin, N., & Gordon, R. (2005). Diagnostic error in internal medicine. Archives of Internal Medicine, 165(13), 1493–1499. doi:10.1001/archinte.165.13.1493

Berner, E. S., & Graber, M. L. (2008). Overconfidence as a cause of diagnostic error in medicine. American Journal of Medicine, 121(5 Suppl), S2–S23.

Kunitomo, K., Harada, T., & Watari, T. (2022). Cognitive biases encountered by physicians in the emergency room. BMC Emergency Medicine, 22, 148. doi:10.1186/s12873-022-00708-3

Sætrevik, B., Seeligmann, V. T., Frotvedt, T. F., & Bondevik, Ø. K. (2024). Anchoring, confirmation and confidence bias among medical decision-makers. Collabra: Psychology, 10(1), 126223. doi:10.1525/collabra.126223

Knowledge Structure & Articulation

Schmidt, H. G., & Rikers, R. M. J. P. (2007). How expertise develops in medicine: knowledge encapsulation and illness script formation. Medical Education, 41(12), 1133–1139.

Bordage, G., & Lemieux, M. (1991). Semantic structures and diagnostic thinking of experts and novices. Academic Medicine, 66(9 Suppl), S70–S72.

Bordage, G. (1994). Elaborated knowledge: a key to successful diagnostic thinking. Academic Medicine, 69(11), 883–885.

Clinical Communication

Makoul, G. et al. (2001). Essential elements of communication in medical encounters: the Kalamazoo Consensus Statement. Academic Medicine, 76(4), 390–393.

Silverman, J., Kurtz, S., & Draper, J. (2005). Skills for Communicating with Patients (2nd ed.). Radcliffe Publishing.

Mercer, S. W. et al. (2004). The CARE Measure. Family Practice, 21(6), 699–705.

Levinson, W., Gorawara-Bhat, R., & Lamb, J. (2000). A study of patient clues and physician responses. JAMA, 284(8), 1021–1027.

Health Literacy & Readability

Paasche-Orlow, M. K., & Wolf, M. S. (2007). The causal pathways linking health literacy to health outcomes. American Journal of Health Behavior, 31(S1), S19–S26.

Kincaid, J. P. et al. (1975). Derivation of new readability formulas. Naval Air Station Memphis.

McLaughlin, G. H. (1969). SMOG grading: a new readability formula. Journal of Reading, 12(8), 639–646.

Centers for Disease Control and Prevention. (2016). Health Literacy. cdc.gov/healthliteracy

Self-Assessment & Confidence Calibration

Eva, K. W., & Regehr, G. (2005). Self-assessment in the health professions: a reformulation and research agenda. Academic Medicine, 80(10 Suppl), S46–S54.

Wolpaw, T. M., Wolpaw, D. R., & Papp, K. K. (2003). SNAPPS: a learner-centered model for outpatient education. Academic Medicine, 78(9), 893–898.

Moulton, C. A. et al. (2007). Slowing down when you should: a new model of expert judgment. Academic Medicine, 82(S10), S109–S116.