Performance of Large Language Models in Numerical Versus Semantic Medical Knowledge: Cross-Sectional Benchmarking Study on Evidence-Based Questions and Answers

To cover cases in which the respondent does not know the answer, an “I do not know” (IDK) option was added to every question as a possible answer. Each Q and A is based on numerical data derived from Kahun’s knowledge graph, including minimum, maximum, and midvalues, estimating the connections between medical entities. We used statistical methods, including the median and the median absolute deviation (MAD), to categorize answers into meaningful ranges based on their calculated midvalues.

Eden Avnat, Michal Levy, Daniel Herstain, Elia Yanko, Daniel Ben Joya, Michal Tzuchman Katz, Dafna Eshel, Sahar Laros, Yael Dagan, Shahar Barami, Joseph Mermelstein, Shahar Ovadia, Noam Shomron, Varda Shalev, Raja-Elie E Abdulnour

J Med Internet Res 2025;27:e64452
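
The excerpt above mentions categorizing answers into ranges using the median and the median absolute deviation (MAD) of the midvalues. A minimal sketch of that idea follows. It is not the authors' procedure: the function names, the band width `k`, the labels `low`/`typical`/`high`, and the sample values are all assumptions made for illustration, since the excerpt does not state the actual cutoffs.

```python
from statistics import median


def median_abs_deviation(values):
    """Return the median of absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def categorize(midvalues, k=1.0):
    """Label each midvalue relative to the band median +/- k * MAD.

    The band width k and the three labels are hypothetical choices,
    not taken from the study.
    """
    center = median(midvalues)
    mad = median_abs_deviation(midvalues)
    lower, upper = center - k * mad, center + k * mad

    labels = []
    for v in midvalues:
        if v < lower:
            labels.append("low")
        elif v > upper:
            labels.append("high")
        else:
            labels.append("typical")
    return labels


# Hypothetical midvalues (e.g. percentages); not data from the study.
sample = [5, 10, 12, 15, 60]
print(median_abs_deviation(sample))
print(categorize(sample))
```

The MAD is used here instead of the standard deviation because a single extreme value (such as 60 in the sample) barely shifts it, so the band boundaries stay stable when the data contain outliers.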