Visualizing Large Sets Of Medical Education Content To Identify Gaps and Redundancies

May 22, 2018 8:00 AM – 9:15 AM

Pedro Teixeira, Vanderbilt University
Scott Drake, ScholarRx
Tao Le, ScholarRx

Medical education and curriculum content balloons over time as documents, assessment items, and references are added to content repositories such as learning management systems (LMSs). While this provides a wealth of information for students and faculty, the content also becomes difficult to navigate and explore. For curriculum planners, quickly visualizing large bodies of content is a challenge in curriculum gap and redundancy analysis. Representing the semantic content of a piece of text quantitatively requires a high-dimensional vector, but high-dimensional vectors are difficult to represent in the two- and three-dimensional spaces humans can perceive.

In this presentation, we will first review the process for converting text-based content into high-dimensional vectors. We will then visualize these vectors for exploration using widely available, open-source machine learning tools.
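As a rough illustration of what converting text into a fixed-length vector can look like, the sketch below hashes each word of a flashcard into one of 50 buckets and normalizes the result. This is a deliberately simple stand-in (a hashed bag of words), not the method used in the presentation, which the abstract does not specify. The names `embed`, `cosine`, and `DIM`, and the sample card texts, are hypothetical and chosen only for this example.

```python
import hashlib
import math
import re

DIM = 50  # assumed dimensionality, chosen to match the low end of the 50-100 range


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, dim=DIM):
    """Map text to a unit-length vector by hashing each token into a bucket.

    This is a hashed bag-of-words stand-in, not a learned semantic embedding.
    """
    vec = [0.0] * dim
    for token in tokenize(text):
        # hashlib gives a stable hash across runs (built-in hash() is salted).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        vec[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical flashcard texts, invented for illustration only.
cards = [
    "Which vitamin deficiency causes scurvy?",
    "Scurvy results from a deficiency of which vitamin?",
    "Name the four chambers of the heart.",
]
vectors = [embed(card) for card in cards]
```

Calling `cosine(vectors[0], vectors[1])` gives a similarity score between two cards. A learned embedding would capture meaning rather than just shared words, but the output has the same shape: one fixed-length numeric vector per content item.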

TensorFlow is an open-source library of tools from Google that simplifies large-scale machine learning across different computing environments. Using TensorFlow and the included Embedding Projector, we visualize 50- to 100-dimensional vectors calculated from ScholarRx text content. We have applied this to 12,751 short Q&A flashcards. We use the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm to reduce dimensionality while preserving proximity between semantically similar content items. One can then quickly navigate and search clusters of related content for curriculum analysis. In the presentation, we will also demonstrate these visualization methods with other types of curricular content, including PDFs, PowerPoints, and multiple-choice questions, and discuss potential applications in curriculum analysis.
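One common way to get vectors into the Embedding Projector is to load them as tab-separated files: one file with a vector per row, and a metadata file with a label per row in the same order. The sketch below writes both, assuming a single-column metadata file with no header row. The function name `write_projector_files` and the sample data are hypothetical, and this is not necessarily the workflow used by the presenters.

```python
import csv
import io


def write_projector_files(vectors, labels, vectors_file, metadata_file):
    """Write vectors and their labels as tab-separated text.

    vectors_file gets one row per vector, values separated by tabs.
    metadata_file gets one label per line, in the same row order.
    """
    if len(vectors) != len(labels):
        raise ValueError("each vector needs exactly one label")

    writer = csv.writer(vectors_file, delimiter="\t", lineterminator="\n")
    for vec in vectors:
        writer.writerow([f"{value:.6f}" for value in vec])

    for label in labels:
        # Tabs and newlines inside a label would break the one-row-per-item layout.
        clean = label.replace("\t", " ").replace("\n", " ")
        metadata_file.write(clean + "\n")


# Hypothetical two-dimensional vectors and labels, for illustration only.
sample_vectors = [[0.1, 0.2], [0.3, 0.4]]
sample_labels = ["card one", "card two"]

vec_buf = io.StringIO()
meta_buf = io.StringIO()
write_projector_files(sample_vectors, sample_labels, vec_buf, meta_buf)
```

Here in-memory buffers stand in for real files; passing handles from `open("vectors.tsv", "w", newline="")` and `open("metadata.tsv", "w")` would write to disk instead. Once loaded, t-SNE can be selected in the projector to lay the points out in two or three dimensions.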

RESULTS: We will review the outcomes of the various solutions used, describe how faculty and students interacted with these technologies, and share preliminary data on the effectiveness of this novel approach.