Anyone can turn on an audio recorder and ask questions into thin air. Anyone can listen to that recording and think they are getting responses from ghosts and spirits. Driven by their belief in ghostly phenomena, they are quite capable of making something mysterious out of completely explainable audio samples.
The main problems here are a lack of knowledge, confirmation bias and poor analysis techniques. Together these elements comprise the part of ghost hunting known as Electronic Voice Phenomenon, or EVP.
However, properly analyzing that audio to determine if there is really anything unusual is another matter. That is because you do not use your ears to do the analysis; you use your eyes.
I’ll explain but I have to define a few terms first.
When it comes to the analysis of voices, scientists have already been there. It is a science called linguistics. Linguistics is the scientific study of language. Many topics fall under this umbrella. For this article I’m going to be using Phonetics.
Phonetics is a branch of linguistics that comprises the study of the sounds of human speech and one of the predominant ways this is accomplished is through the analysis of spectrograms.
A spectrogram is a visual representation of the spectrum of frequencies in a sound or other signal as they vary with time or some other variable. Spectrograms are sometimes called spectral waterfalls, voiceprints, or voicegrams. I’m not going to go into a lot of detail on how this is done in this article. If you’re curious, follow the link below.
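For readers who want to see the idea in code, here is a minimal sketch of how a spectrogram can be computed: chop the audio into short overlapping frames and measure how much of each frequency is in each frame. It uses only the Python standard library and a slow, naive DFT so that it runs anywhere; real analysis tools use a fast Fourier transform. The function name, frame size and test tone are my own assumptions for illustration, not anything from a specific tool.

```python
import cmath
import math


def spectrogram(samples, frame_size=256, hop=128):
    """Return a list of frames, each a list of magnitudes per frequency bin."""
    # A Hann window tapers the edges of each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, keeping only the non-negative frequency bins.
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical test signal: a 1000 Hz tone sampled at 8000 Hz for 0.1 s.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]
spec = spectrogram(tone)

# Each bin spans sample_rate / frame_size = 31.25 Hz, so 1000 Hz is bin 32.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin * sample_rate / 256)
```

Each inner list is one vertical slice of the picture: time runs across the frames, frequency runs up the bins, and the magnitude is the brightness.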
A linguist looks at the spectrogram to identify something called formants. A formant is each of several prominent bands of frequency that determine the phonetic quality of a vowel.
To identify a vowel in a spectrogram, linguists use the formants F1 and F2. In the vowels, F1 can vary from 300 Hz to 1000 Hz. The lower it is, the closer the tongue is to the roof of the mouth. The vowel /i:/ as in the word ‘beet’ has one of the lowest F1 values – about 300 Hz; in contrast, the vowel /A/ as in the word ‘bought’ has the highest F1 value – about 950 Hz.
F2 can vary from 850 Hz to 2500 Hz; the F2 value is proportional to the frontness or backness of the highest part of the tongue during the production of the vowel.
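To make the F1 and F2 relationship concrete, here is a small sketch that turns a pair of formant readings into rough tongue height and frontness scores. It uses only the ranges quoted above and a simple linear scale, which is my own simplification; the function name is hypothetical and real vowel classification is more involved than this.

```python
# Formant ranges quoted in the article, in Hz.
F1_RANGE = (300.0, 1000.0)
F2_RANGE = (850.0, 2500.0)


def vowel_position(f1, f2):
    """Map (F1, F2) to rough (height, frontness) scores between 0 and 1."""
    def scale(value, low, high):
        # Clamp to the range, then rescale linearly to 0..1.
        clamped = min(max(value, low), high)
        return (clamped - low) / (high - low)

    # Low F1 means the tongue is high (close to the roof of the mouth),
    # so height is the inverse of the scaled F1.
    height = 1.0 - scale(f1, *F1_RANGE)
    # High F2 means the highest part of the tongue is toward the front.
    frontness = scale(f2, *F2_RANGE)
    return height, frontness


# An F1 of 300 Hz with an F2 at the top of the range: a high, front vowel.
print(vowel_position(300, 2500))
```

A reading near (1.0, 1.0) is a high front vowel like the one in ‘beet’, while a reading with a height near 0 is an open vowel with the tongue low in the mouth.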
So to properly analyze an audio file to determine what is being said, I’m going to look at the spectrogram of the audio and read the F1 and F2 formants. I’m using my eyes for the analysis, not my ears.
There is even a “monthly mystery spectrogram” webpage where viewers try to figure out what is being said by looking at the spectrogram.
Hold tight, there is one more scientific principle I have to explain before I get to my latest rant. It is something called signal-to-noise ratio. Signal-to-noise ratio (SNR) is the measurement used to describe how much desired sound is present in an audio recording, as opposed to unwanted sound (noise). This unwanted input could be anything from electronic static in your recording equipment to external sounds from the noisy world around us, such as the rumble of traffic or the murmur of voices in the background.
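SNR is usually quoted in decibels. As a minimal sketch, assuming you have one stretch of audio that is mostly signal and another that is only background noise (for example, a quiet gap in the recording), it can be estimated from their RMS levels. The function names are my own.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal, noise):
    """SNR in decibels: 20 * log10(signal RMS / noise RMS)."""
    return 20 * math.log10(rms(signal) / rms(noise))


# Hypothetical example: the signal is ten times the amplitude of the noise.
signal = [1.0, -1.0] * 100
noise = [0.1, -0.1] * 100
print(round(snr_db(signal, noise), 1))
```

An amplitude ratio of ten to one works out to 20 dB. At 0 dB the signal and the noise are equally loud, and below that there is more noise than signal.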
This is where the problem begins with recording EVP. To demonstrate I conducted my own little test to show how the signal to noise ratio affects the spectrogram.
The phrase “The quick brown fox jumped over the lazy dog” was spoken next to the microphone. The same phrase was repeated at 10-foot intervals up to the final distance of 60 feet from the recorder. To keep the amplitude constant I used a “boom box” with the sound level turned up halfway (to replicate the average decibel range of a speaking voice).
The spectrogram above was recorded with the speaker next to the recorder. All looks good. The formants are clearly visible, as well as the higher frequencies. A linguist would have no problem figuring out what I was saying. Now let’s start messing with the signal-to-noise ratio. To weaken the signal I’m going to move back 10 feet. This is what the spectrogram looks like now.
The structure of the formants has faded but it is still readable. Let’s decrease the signal again by moving back yet another 10 feet.
At this distance the higher frequencies are starting to vanish. Look at the difference with the word “jumped”. Only the lower and mid range frequencies remain. So let’s move the boom box back yet another 10 feet. The source of the sound is now 30 feet away.
Well this sucks. Huge parts of the speech are being lost. There is more noise than signal, so parts of my voice are being lost because the high and midrange frequencies do not travel as far as the lower ones.
The spectrogram with the speaker 40′ away from the recorder. Things are looking worse. The lower frequencies are still visible because they propagate further. When speech is corrupted by stationary noise, it creates missing features in the spectrogram. But wait, there’s more!
This is the spectrogram of the recording with the speaker 50′ away from the recorder. Only a few formants remain but the ability to decipher the words is practically impossible. Too much information has been lost.
The speaker is now 60′ away from the recorder. Only two small segments remain. Now if you listen to the audio file recorded at this distance you can still hear my voice. It is weak and slightly distorted; you can tell that it is saying something, but it is hard to determine exactly what.
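To put rough numbers on how fast the signal weakens with distance, here is a sketch based on the textbook inverse-square law. It assumes a small source in open space with no reflections, and a reference distance of 1 foot standing in for “next to the microphone”; both are my assumptions, not measurements from the test. It only models the overall drop in level, not the extra loss of the higher frequencies seen in the spectrograms.

```python
import math


def level_drop_db(distance_ft, reference_ft=1.0):
    """Drop in sound level relative to the reference distance (free field)."""
    return 20 * math.log10(distance_ft / reference_ft)


for d in (10, 20, 30, 40, 50, 60):
    print(f"{d} ft: about {level_drop_db(d):.1f} dB quieter than at 1 ft")
```

Under these assumptions every doubling of distance costs about 6 dB of signal, while the background noise at the recorder stays the same, so the signal-to-noise ratio falls by the same amount.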
So hopefully I have given you a basic understanding of phonetics, formants and signal-to-noise ratio. I hope you enjoyed those appetizers because we are now ready for the main course.
So one day I was browsing through YouTube when I stumbled across this video. Sorry for the link, but embedding was disabled for this video.
After watching this, the first thing that entered my mind was “Who are these ‘experts’ that confirmed this was EVP?” Fortunately, the YouTuber mentioned this in the video’s description. I have provided it below:
UPDATE 05/13/10 – It’s been a few months since I’ve uploaded this and have gotten a few messages about it. Well, now I’ve FINALLY got a reply from one of the experts that I have forwarded the evp to. Here is his reply:
Thank you for submitting your evp to the Paranormal Task Force. I am sorry that it took so long for us to get back to you but things are extremely hectic.
I was able to listen to the evp that you submitted and in my opinion it is an evp. This is based on a couple of things. First, the Hz range is around 200 Hz. A healthy human ear can hear speech at about 500Hz so this is well below that point.
Second, there is a lot of noise associated with the voice when it is talking. I could not clean this up as I could if it were simple hissing and popping from the recording device or if it were background noise. When I attempted to eliminate (with hissing and popping filters) I was also eliminating the voice along with it. The school of thought here is that the energy a spirit uses can cause it to sound like an AM radio station with the static.
In my opinion I would classify this as a Class “C” evp based on the amount of software used to clean up the recording to make for easier listening. Also it is difficult to tell for sure what us being said. I think that there are three words being spoken. The first word is “I”, the second possibly “spy” and the third sounds like a two syllable word that I can’t make out.
Again thank you for submitting this to PTF and it was our pleasure to help you.
So let’s start with this statement:
“I was able to listen to the evp that you submitted and in my opinion it is an evp. This is based on a couple of things. First, the Hz range is around 200 Hz. A healthy human ear can hear speech at about 500Hz so this is well below that point.”
With that statement every bit of this team’s credibility went out the window.
First of all, that information is wrong. The range of human hearing is generally considered to be 20 Hz to 20 kHz. The truly disturbing thing, though, is that he is basing his results on the belief that the “voice” is in the 200 Hz range. This is a popular bit of pseudoscience in the ghost hunting community, based on a lack of understanding of several elements of acoustics and sound recording.
In the “expert’s” opinion, since the “voice” falls below the threshold of hearing, it must be the voice of a ghost. That reasoning clearly shows two things.
First, he has no knowledge of the actual range of the human voice, typically 64 Hz (male bass) to 2050 Hz (2.05 kHz, female soprano).
Second, our “expert” does not know how to read a spectral view of the recording and has no knowledge of signal-to-noise ratio. This is where his error lies. Let’s look at the spectral view of the recording.
You can clearly see the brighter segments at the bottom of the spectral view. These are the lower frequencies that have propagated well. Since they are the most prominently visible, the paranormal researcher thinks that the recording is in the 200 Hz range. However, the midrange frequencies are still there (circled) but they have been corrupted by stationary noise, creating missing features in the spectral view. This is what I was replicating in my experiment with the boom box. The lower frequencies propagate while the mid-range and higher frequencies fade and eventually vanish.
When this audio is opened with phonetic analysis software (Praat) you can clearly see the frequency range in this recording is around 1901 Hz.
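The point that a bright low band does not mean the midrange is empty can be checked with numbers instead of by eye. Here is a rough sketch that adds up the energy in two frequency bands of a signal. The test signal is synthetic (a strong 200 Hz tone plus a much weaker 1900 Hz tone), not the recording from the video, and the function name and band edges are my own choices.

```python
import cmath
import math


def band_energy(samples, sample_rate, low_hz, high_hz):
    """Sum of squared DFT magnitudes for bins between low_hz and high_hz."""
    n = len(samples)
    first = math.ceil(low_hz * n / sample_rate)
    last = math.floor(high_hz * n / sample_rate)
    total = 0.0
    for k in range(first, last + 1):
        acc = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        total += abs(acc) ** 2
    return total


# Synthetic stand-in: strong 200 Hz component, weak 1900 Hz component.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 200 * t / sample_rate)
           + 0.1 * math.sin(2 * math.pi * 1900 * t / sample_rate)
           for t in range(400)]

low = band_energy(samples, sample_rate, 100, 300)
mid = band_energy(samples, sample_rate, 1500, 2300)
print(f"low band is about {low / mid:.0f} times stronger, but mid is not zero")
```

In a picture, a band that is a hundred times weaker is easy to overlook next to the bright one, but it is still part of the recording.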
Ok, there is something there but is it a voice? Let’s look for formants so we can determine what the vowels are. After all, the person who recorded the audio and the paranormal researcher that analyzed it have different views on what is being said. It would be quite useful if we could identify some vowels to determine who is correct.
Oh snap! Error message: No formant contour is visible. That, ladies and gentlemen, means that this audio is not a voice. It is just noise. However our paranormal researcher shouldn’t feel so bad. He was actually looking at the cause of his own misperception, those bright bands of lower frequencies around 200Hz.
Those frequencies are in the ranges of the F1 and F2 formants. Your mind hears these and thinks that you are hearing a vowel. It then attempts to “fill in” the missing data. The end result is that you hear a voice where there actually is none. Different people hearing the same audio file will often hear different words.
The name for this phenomenon is pareidolia.
I believe that the sound recorded here is a car or motorcycle accelerating from a stop light (or sign) outside of the building. The signal loss and frequency range really seem to suggest it. Take a listen to only the low frequencies our paranormal investigator discovered.
The ghost hunter’s message to his client brought out another major problem that is heavily endorsed by the paranormal community. It is their concept of audio analysis. Here is a classic example from the YouTube video’s description.
“Second, there is a lot of noise associated with the voice when it is talking. I could not clean this up as I could if it were simple hissing and popping from the recording device or if it were background noise. When I attempted to eliminate (with hissing and popping filters) I was also eliminating the voice along with it.”
These should have been obvious clues that he was simply looking at noise. When ghost hunters come across low frequency audio bits, their first reaction is to clean up the audio so that the “voice” can be better understood. This can be a huge mistake.
The techniques used by ghost hunters and other paranormal enthusiasts compound the problem by applying noise reduction and other filters in an attempt to hear the voice more clearly. Noise reduction is not a panacea; the more noise there is, the harder it is to remove without affecting non-noise components. This process can destroy essential elements of the speech which increases the probability of pareidolia when attempting to identify words which you were unable to do in the first place. Most importantly though, that IS NOT how you analyze a voice!
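Here is a toy sketch of why that happens, using a spectral gate, one of the simplest forms of noise reduction: any frequency bin below a threshold is wiped out. The numbers are made up for illustration, and simply adding the noise floor to the voice magnitudes is a crude assumption, but the effect is the same one described above.

```python
def spectral_gate(magnitudes, threshold):
    """Zero every frequency bin whose magnitude is below the threshold."""
    return [m if m >= threshold else 0.0 for m in magnitudes]


# Hypothetical magnitudes for four frequency bins of one frame.
noise_floor = 1.0
voice = [8.0, 1.2, 0.9, 0.0]                  # voice-only level per bin
recorded = [v + noise_floor for v in voice]   # crude: levels simply add

# A gate set high enough to silence the noise-only bin...
cleaned = spectral_gate(recorded, threshold=2.5)
print(cleaned)  # [9.0, 0.0, 0.0, 0.0]
```

The gate removes the noise-only bin, but it also removes the two weak voice bins that were sitting just above the noise. Only the strongest low-frequency bin survives, which leaves your brain even more gaps to fill in.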
So for you paranormal investigators out there who are doing this kind of crap: stop pretending to be something that you are not.