Just a week ago, I wrote a blog about Robocalls and how they could become a potent Cyberthreat. At the present time, they are more of an annoyance than a real risk. But in that piece, I also forgot to mention another type of call-related attack, one which makes use of the principles of Social Engineering.
This one is called “V-Phishing”, also known as “Vishing”. In this case, rather than sending out Emails, the Cyberattacker calls you on the phone instead, attempting to con you into giving away your Personally Identifiable Information (PII).
But, as also mentioned, as the American public becomes much more aware of and attuned to Phishing-based Emails, the Cyberattacker is now shifting strategies to use voice-based scams instead. In fact, this kind of Cyberattack even has its own technical term: “Voice Fraud”. Imagine that? I had never heard of it until this morning as I started to write this blog.
This is no laughing matter, and it should be taken very seriously. As far as I know, there have not been too many headlines about it that I have come across, but there is strong potential that this could make waves starting next year. This is according to the latest market intelligence report conducted by Pindrop. Their report is entitled “Voice Intelligence Report”, and it can be downloaded at this link:
Here are some of the key findings, and they really are quite alarming:
* There are 90+ voice fraud attempts occurring every minute here in the United States;
* Voice fraud has increased by 350% from 2014 to 2018;
* Insurance claim voice fraud has increased by 248%, with the average payout being well over $500,000;
* In 2018, over 446 million PII records were hijacked as a result of voice fraud;
* The top industries that are most at risk for voice fraud are:
* Financial (primarily Banking and Credit Unions);
* Credit card issuers.
There are two variants of Voice Fraud that are expected to be further exploited heading into 2020, and they are as follows:
1) Deepfakes:
This can be defined specifically as follows:
“Deepfakes refer to manipulated videos, or other digital representations produced by sophisticated artificial intelligence, that yield fabricated images and sounds that appear to be real.”
In other words, imagine that you are watching a TV program with your favorite actor or actress in it. Through the use of Artificial Intelligence (AI) and very sophisticated video manipulation, a computer-based mockup of this person can be created that looks like the real thing. The term “Deepfake” is actually a combination of two other terms, “Deep Learning” and “Fake”; Deep Learning is itself a subset of AI. In other words, the AI system can literally examine photos and videos of the target individual and, over a short period of time, create an almost exact replica of that person.

While this does have its entertainment value, it also carries grave security risks. This actually took place back in the 2016 Presidential Elections, when Deepfakes of both Trump and Clinton were circulated. But since then, AI has advanced rapidly, and it is very much feared that it will be used to a great extent in the 2020 Presidential Elections. One of the biggest risks is that a Cyberattacker could create various Deepfakes of the two leading candidates and send out videos over Social Media asking for monetary donations. But don’t ever fall for this, as not only will your money be sent to a fraudulent bank account, but your other financial information will be at grave risk as well. Of course, Deepfakes can also be used for other forms of Phishing campaigns.

It is also important to note that Deepfakes make use of a very sophisticated type of algorithm known as “Generative Adversarial Networks”, or “GANs” for short. This kind of algorithm is designed to look for the flaws and weaknesses in current methods of forgery and to overcome them.
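To make the adversarial idea behind GANs a bit more concrete, here is a minimal, purely illustrative sketch in Python. The “real” data is just numbers drawn from a Gaussian, the generator is a simple affine map of noise, and the discriminator is logistic regression with hand-derived gradients. Every detail here (the distributions, learning rate, and step count) is an assumption made for illustration; real deepfake systems use deep neural networks trained on images and audio, not a one-dimensional toy like this:

```python
import math
import random

random.seed(0)

# Toy 1-D GAN: "real" samples come from a Gaussian, the generator G(z) = a*z + b
# maps uniform noise to fakes, and the discriminator D(x) = sigmoid(w*x + c)
# tries to tell them apart. Purely illustrative numbers throughout.
REAL_MEAN, REAL_STD = 4.0, 0.5
LR, STEPS = 0.05, 2000

a, b = random.random(), random.random()  # generator parameters
w, c = random.random(), random.random()  # discriminator parameters

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

for _ in range(STEPS):
    x_real = random.gauss(REAL_MEAN, REAL_STD)
    z = random.uniform(-1.0, 1.0)
    x_fake = a * z + b

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    w += LR * ((1 - d_real) * x_real - d_fake * x_fake)
    c += LR * ((1 - d_real) - d_fake)

    # Generator step: adjust a, b to make D(fake) move toward 1 (i.e. fool D).
    d_fake = sigmoid(w * x_fake + c)
    grad = (1 - d_fake) * w  # gradient of log D(fake) w.r.t. x_fake
    a += LR * grad * z
    b += LR * grad

# After training, the generator's output distribution should drift toward
# the real one -- the same "learn to beat the detector" dynamic the blog
# describes for forgery.
fake_mean = sum(a * random.uniform(-1, 1) + b for _ in range(1000)) / 1000
print(fake_mean)
```

The point of the sketch is the alternation: the discriminator learns to flag fakes, and the generator then learns to defeat that improved detector, which is exactly the “look for flaws in current methods of forgery and overcome them” behavior described above.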
2) Synthetic Voices:
This can be defined as follows:
“A synthetic voice is an artificially produced version of human speech. Speech synthesis is just another form of information output where a computer reads words to you out loud in a real or simulated voice, played through the device’s speaker; this is often called text-to-speech (TTS).”
Simply put, this is where a Cyberattacker tries to mimic the voice of a real individual by means of the same type of AI algorithms that are used to create Deepfakes. As this new threat variant evolves over time, it is quite likely that it will be used to launch large-scale Vishing Attacks at unprecedented levels. What is even scarier about this form of technology is that it can recreate very realistic-sounding voice samples in just a matter of seconds. In fact, according to the report, 24,000 samples can be created in just one second. Wherever there is a sound source, a Synthetic Voice can be created. But so far, it seems that this threat vector has not yet emerged into the forefront – though it will over time. Many Cybersecurity experts believe that it is still in the prototyping phase, and that soon the Cyberattacker will test the voice samples they have created by contacting call centers, in order to determine whether the security protocols there can detect the fake voices or not.
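The “24,000 samples in one second” figure most likely refers to an audio sample rate of 24 kHz, a common rate for synthesized speech. Assuming that reading (an assumption on my part, not something the report spells out), this short Python sketch shows what one second of audio at that rate amounts to, using a plain sine tone as a stand-in for a synthesized voice:

```python
import math

SAMPLE_RATE = 24_000  # samples per second, matching the rate cited in the report
FREQ = 220.0          # arbitrary test-tone frequency (Hz); real TTS output is far more complex

def make_tone(seconds=1.0):
    """Generate raw sine-wave samples (floats in [-1, 1]) at SAMPLE_RATE."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * FREQ * i / SAMPLE_RATE) for i in range(n)]

samples = make_tone(1.0)
print(len(samples))  # prints 24000: one second of audio at 24 kHz
```

In other words, the number is not 24,000 separate voice clips per second; it is the resolution at which one second of convincing audio can be synthesized.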
My Thoughts On This
The report from Pindrop also elaborates on the birth of a sort of new revolution, which has been coined the “Conversational Economy”. This is where everything that we interact with on a daily basis will be driven by our voice. For example, our Smartphones, our mobile apps, our cars, and just about every other “smart”-based device that we use will be tied to our linguistics.
Part of what is fueling this craze is the Internet of Things (IoT), and the advancements being made with Virtual Personal Assistants (VPAs) such as Alexa, Siri, and Cortana.
But this is a double-edged sword. For example, the companies that are the early adopters of this new revolution will of course be the ones to gain the most, in terms of both revenue and new customers.
But at the same time, with all of this interconnectedness, the attack surface for the Cyberattacker increases that much more. At the present time, there is no way to secure or even encrypt voice-based transactions, and the vendors that are coming out with new products in this area treat this as a last priority.
To me, this is all very scary. I am not a fan at all of this connectivity, and I will personally never adopt it. I know people who have tried to turn their dwelling into the so-called “Smart Home”, but they too are now starting to see the security pitfalls associated with it. When examining the above two threat variants, I am actually much more fearful of the Deepfakes.
It has come to the point where it is almost impossible to tell whether the person you are seeing in a video is real or not. There are subtle clues, but it takes a trained eye to spot them. If you look closely around the eyes, the lips, the eyebrows, and even the nose of the person in the video, there are some fuzzy and distorted areas around these particular features that will give it away as being a fake.
But there is a counter to this – known as “Voice Recognition”. This is a Biometric Technology that can confirm the identity of an individual based upon the unique characteristics in their voice, such as the inflections, tempo, enunciations, etc.
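As a rough sketch of how such a voice-biometric check might work, a system could reduce a recording to a vector of voice features (inflection, tempo, enunciation, and so on) and compare it against an enrolled “voiceprint” using a similarity score. The feature vectors and the threshold below are entirely made up for illustration; real systems use far richer features and statistical models:

```python
import math

def cosine_similarity(a, b):
    """Similarity of two feature vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical enrolled voiceprint and two probe recordings' feature vectors.
enrolled          = [0.62, 0.11, 0.85, 0.40]
same_speaker      = [0.60, 0.14, 0.82, 0.43]
different_speaker = [0.10, 0.90, 0.05, 0.70]

THRESHOLD = 0.95  # illustrative acceptance threshold, not a real system's value

print(cosine_similarity(enrolled, same_speaker) >= THRESHOLD)       # prints True
print(cosine_similarity(enrolled, different_speaker) >= THRESHOLD)  # prints False
```

The interesting design question for the Conversational Economy is where that threshold sits: too lax and a good Synthetic Voice slips through, too strict and legitimate callers get locked out.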
So far, this modality has not been adopted on a large scale yet, but it could very well be in a short period of time, given the birth of the Conversational Economy and all of the security pitfalls that it brings along with it.