I thought I’d return to the accuracy question and go into a bit more detail on how we determine the overall accuracy of a phonetic search model based on the optimum trade-off between precision and recall. It all boils down to understanding the DET (Detection Error Tradeoff) chart. Here’s what one of these looks like:
In most charts, “up and to the right” is the way you want to go. A DET is somewhat flipped from this paradigm, where “down and to the left” would represent a perfect world. But as we’ve discussed before, there’s no perfect world in search…it’s all about trade-offs. So let’s dive into the details on this chart so you can understand it better. First, what does this chart really show?
This chart shows the practical search results for five different search expressions in a typical Nexidia search. Each search expression is made up of a certain number of phonemes. The shortest expression (fewest phonemes) is shown at the top in the orange line, while the longest expression (most phonemes) is shown in pink at the bottom. The Y-axis measures the miss rate (the percentage of true occurrences the search fails to find, so lower means higher recall), while the X-axis measures false alarms per hour of audio, which reflects the precision of the search. (For a refresher on precision vs. recall, view my earlier post here.)
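To make the axes concrete, here is a minimal sketch of how one curve on a DET chart can be computed. It assumes (hypothetically; this is not Nexidia’s actual data format or API) that each search hit carries a confidence score and a reviewer-verified true/false label, and that you know the total number of true occurrences in the audio and its length in hours. Lowering the score threshold one step at a time admits more hits, and each step yields one (miss rate, false alarms per hour) point.

```python
from typing import List, Tuple


def det_points(
    hits: List[Tuple[float, bool]], n_true: int, audio_hours: float
) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: return (threshold, miss_rate_pct, false_alarms_per_hour)
    for each distinct score threshold, from strictest to loosest."""
    # Sort by descending score so lowering the threshold admits hits in order
    ordered = sorted(hits, key=lambda h: h[0], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Admit every hit tied at this score together
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        # Miss rate is 100% minus recall; n_true counts ALL true occurrences,
        # including any the search never returned
        miss_rate = 100.0 * (1 - tp / n_true)
        fa_per_hour = fp / audio_hours
        points.append((threshold, miss_rate, fa_per_hour))
    return points
```

Plotting the miss rate against false alarms per hour for each point traces one line on the chart; moving down the list of thresholds walks the curve down and to the right.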
So what is this chart telling us? It is a dramatic, real-world demonstration that for any given search expression, you can maximize recall (capturing the most true positives) only at the expense of precision (accepting more false hits). The yellow line represents a search term with 8 phonemes, a typical two- to three-syllable word. Following this line all the way down to the right, you can see that you can achieve almost 90 percent recall if you are willing to live with about 10 false alarms per hour of content. That’s not a bad trade-off in a compliance situation, especially when the review tool lets you quickly and easily listen to and disposition results.
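Choosing an operating point like "about 10 false alarms per hour" amounts to picking the score threshold with the highest recall that stays within that false-alarm budget. The sketch below assumes the curve is available as a list of hypothetical (threshold, miss-rate %, false-alarms-per-hour) tuples ordered from strictest to loosest threshold; the function name and data shape are illustrative, not a real product API.

```python
from typing import List, Optional, Tuple


def best_recall_within_budget(
    points: List[Tuple[float, float, float]], max_fa_per_hour: float
) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: return (threshold, recall_pct) with the highest recall
    whose false-alarm rate fits the budget, or None if no point fits."""
    best = None
    for threshold, miss_rate, fa_per_hour in points:
        if fa_per_hour <= max_fa_per_hour:
            recall = 100.0 - miss_rate
            # Strict ">" keeps the earlier (stricter) threshold on ties,
            # which returns the same recall with no more false alarms
            if best is None or recall > best[1]:
                best = (threshold, recall)
    return best
```

A reviewer could call this once per search expression with the same budget (say, 10 false alarms per hour) to get a threshold for each term.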
As with almost any search engine, the more relevant information you give it to search for, the better your results. In this case, the bottom pink line represents a search expression of 20 phonemes (a typical three- or four-word phrase) and shows that you can get over 95% recall with just one false alarm per hour, and almost 99% recall with only 10 false alarms per hour.
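Because the chart reports false alarms per hour rather than precision directly, it can help to convert one into the other. A minimal derivation, assuming (this is an assumption; the chart does not report it) that the phrase truly occurs $\lambda$ times per hour of audio, with recall $R$ and $F$ false alarms per hour over $T$ hours:

$$
P = \frac{TP}{TP + FP} = \frac{R\lambda T}{R\lambda T + FT} = \frac{R\lambda}{R\lambda + F}
$$

With a purely hypothetical $\lambda = 2$ occurrences per hour, the two operating points of the 20-phoneme search work out to:

$$
P_{R=0.95,\,F=1} = \frac{0.95 \cdot 2}{0.95 \cdot 2 + 1} = \frac{1.9}{2.9} \approx 0.66,
\qquad
P_{R=0.99,\,F=10} = \frac{0.99 \cdot 2}{0.99 \cdot 2 + 10} = \frac{1.98}{11.98} \approx 0.17
$$

Since $\lambda$ appears in the formula, the same point on the chart implies lower precision for a rare phrase than for a common one.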
There are two key points worth repeating. First, because the underlying phonetic index has captured ALL the spoken content in each recording, it offers the most accurate representation possible of what people actually said in the file. Second, because of the many variables behind the differences we experience in human speech (accents, background noise, etc.), reviewers can use this knowledge of precision vs. recall to craft a search strategy that delivers the level of results their goals require.