We get many people asking why we still transcribe manually, using (and paying) highly trained transcribers rather than just purchasing some voice recognition software that could convert speech to text in the same way that an iPhone can produce a dictated text message.
Manual Transcription – a throwback to the Stone Age?
We always try to explain that whilst voice recognition software has come on in leaps and bounds in recent years and may work well for basic dictation, it is still nowhere near good enough for serious projects where there is more than one speaker and the interview environment is not perfect. We haven’t tested automated transcription software for a while, so we thought we should make sure that what we are telling people is still correct!
Automated Transcription – The Test
We have two interviews that we use for testing transcribers who apply for a role with us. Both are two-speaker interviews with no strong accents.
One recording is crystal clear with no background noise; the other has a small amount of background noise from start to finish, but the speech can still be heard quite easily above this noise.
The clear interview is very straightforward to transcribe and we expect all applicants to achieve 99-100% accuracy with it. The interview with the background noise is not terribly difficult. It requires more concentration, but every word can be heard clearly when listened to carefully.
Manual Transcription – 99-100% Accuracy
Our experienced transcribers can transcribe this interview with 100% accuracy. We expect new applicants to transcribe this interview with a minimum of 99% accuracy if they are to be offered a role. We feel this second interview is a good test for applicants because it replicates well the average interview that we receive from clients.
Software – NCH Express Scribe & Temi
For our test we decided to have both interviews transcribed using the most common home automated transcription software (NCH Express Scribe) and also to upload both interviews to Temi, one of the most popular and highest rated online automated transcription services.
Temi is a speech recognition company who say the following about themselves: “Temi is changing how people extract value out of their digital files. With the explosion of personal and online media, we believe there is tremendous value in this content, just waiting to be unlocked. We started building a better speech recognition service combined with user-friendly tools.”
Express Scribe uses the SAPI speech-to-text engine that is usually already installed on your computer if you are running Windows. If not, then Dragon Dictate or similar will be required.
Automated Transcription Results
Interview 1 – Clear and Crisp
When we put the clear interview through the NCH Express Scribe automated speech-to-text, the software got 40 words correct out of 598, which is an accuracy level of just under 7%. We were genuinely surprised at how poorly the NCH software performed: it produced a completely unusable document, and that was with a crystal clear interview.
NCH Clear Recording Transcript: NCH-Clear-Interview
When we uploaded the same interview to Temi, it produced a transcription that had 466 words correct out of 598, which is just under 78% accurate.
Temi Clear Recording Transcript: Temi-Clear-Interview
Whilst way below our manual transcription levels of 99%+ accuracy for this interview, the Temi document was much better than the NCH results and could feasibly be used as a starting point for creating a useful transcription. It is worth bearing in mind, though, that the Temi transcription was just one block of text, so as well as having to make the corrections, it would need to be “knocked into shape” in terms of formatting and attributing text to particular speakers. This would clearly become even more problematic with any additional speakers.
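We do not describe above exactly how the correct words were counted, so the following is only a rough sketch of one way such a count could be automated, not the method used for this test. It assumes you have the human-made transcript and the software output as plain text, and it uses Python's standard `difflib` module to find the words that match in the same order. The function name `count_correct_words` and the sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def count_correct_words(reference: str, hypothesis: str) -> tuple[int, int]:
    """Count reference words that the software output reproduces in order.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; punctuation attached to a word is treated as part of it.
    Returns (matching_words, total_reference_words).
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    # autojunk=False stops difflib from ignoring very frequent words
    # in long transcripts.
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return correct, len(ref_words)


# Hypothetical example sentences, not taken from the test interviews.
reference = "we record every interview in a quiet room"
hypothesis = "we record every into view in a quite room"
correct, total = count_correct_words(reference, hypothesis)
print(f"{correct} of {total} words correct ({100 * correct / total:.0f}%)")
```

On the made-up sentences this reports 6 of 8 words correct. A count like this says nothing about formatting or speaker attribution, which still have to be checked by a person.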
Interview 2 – Background Noise
When we put the second interview, the one with clear speech but background noise, through both tests, we got the following results.
The NCH software got 7 words correct out of 598, which is an accuracy level of marginally over 1%. Temi got only 25 words correct on this second interview, which is an accuracy level of just above 4%, and once more the output was one solid block of text.
NCH Hard to Hear Transcript: NCH-Hard-to-Hear-Interview
Temi Hard to Hear Transcript: Temi-Hard-to-Hear-Interview
We knew from the first test that NCH would not be able to transcribe this interview but we were genuinely very surprised at how badly Temi performed. Temi do say on their website that the service is unsuitable for recordings with background noise.
As mentioned earlier, this interview is more challenging due to the background noise but every single word is audible above that background noise and accuracy levels of 98% and above are easily achievable by a competent transcriber.
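For anyone who wants to check the percentages quoted for both interviews, the arithmetic is simply correct words divided by total words. The short sketch below uses only the word counts reported in this article; the function name `accuracy_percent` is just an illustrative choice.

```python
# Word counts as reported in this article; both results are quoted out of 598 words.
TOTAL_WORDS = 598

correct_words = {
    ("NCH Express Scribe", "clear"): 40,
    ("Temi", "clear"): 466,
    ("NCH Express Scribe", "background noise"): 7,
    ("Temi", "background noise"): 25,
}


def accuracy_percent(correct: int, total: int) -> float:
    """Return the share of correctly transcribed words as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


for (tool, recording), correct in correct_words.items():
    pct = accuracy_percent(correct, TOTAL_WORDS)
    print(f"{tool:<20} {recording:<18} {correct:>3}/{TOTAL_WORDS}  {pct:5.1f}%")
```

Rounded to one decimal place this gives 6.7% and 77.9% for the clear recording, and 1.2% and 4.2% for the recording with background noise, which matches the figures quoted above.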
Summary – Does Automated Transcription Work?
In conclusion, the transcription software that you can use at home still appears to be all but useless for most serious transcription work. NCH Express Scribe is not fit for purpose. Temi was much better on the clear recording.
However, the second interview we tried is very important to us when testing applicants for transcriber roles because, although most of the work we receive has reasonably clear speech, it invariably has some form of background noise. Not many of the interviews we receive are carried out in a silent environment. This means that Temi would be of little or no use for the vast majority of work that comes in to us.
Temi is certainly useful for some transcription work, particularly single-speaker dictation in a nice quiet environment. It is also possibly useful for the occasional short interview if carried out in a very quiet environment. However, in addition to carefully listening through the entire audio file to find and correct mistakes, one would also need to sort out the format of the document and attribute text to speakers, so it is difficult to see that this would be particularly useful for anything but a one-off interview.
AI transcription would certainly seem far too time-consuming for anyone who has a number of interviews that need transcribing or who uses transcription services with any kind of regularity.