Voicemod tools up with $14.5M to ride the generative AI (sonic)boom

The first thing we ask Voicemod’s CEO and co-founder, Jaime Bosch, when he picks up the phone to discuss a new funding round is not something we’re used to asking, but our question may become the norm in the generative AI future that’s fast-flying at us: Is this your real voice?

Bosch’s startup has been playing with audio effects for almost a decade, working in the field of digital signal processing (DSP), where its early focus was on creating fun ‘sound emoji’ effects and reactions for gamers to spice up their voice chats. And gamers do remain its main user-base (for now). But the audio field is being charged by developments in AI, which Voicemod’s team is hoping will lead to whole new use-cases and many more users for its tools.

So where DSP technology was about applying effects to a person’s (real) voice, developments in artificial intelligence are enabling startups like Voicemod to offer tools to create entirely synthesized (unreal) voices. And even the ability for users to ‘wear’ these voices in real-time, so they can speak with a voice that isn’t theirs. Think of it as the audio equivalent of a Snapchat lens or TikTok’s viral teenage filter or Reface’s celebrity face-swaps.

AI voice can also enable voice-shifting into another person’s (real) voice. And not just for talking about the weather or shooting the shit. But for what’s known as sing-to-sing voice conversion. Meaning you could get to sing in someone else’s voice, supercharging your karaoke game, say, by singing Bohemian Rhapsody as literally the voice of Freddie Mercury. Or even switching between Mercury, May and Taylor, for the full mock opera effect if you have enough trained AI models (and microphones) on hand. Mamma-mia!


Artificial intelligence makes all this possible, even if legal and ethical questions may give pause for thought about rushing to unleash real-time voice-shifting upon a world that still relies a lot upon fixed identities. (Banks pushing customers to record ‘a unique voiceprint’ to use as a password definitely need to sit tf up and start listening.)

Voicemod acquired another audio effects startup last year, called Voctra Labs, whose technology Bosch says it’s working to combine with its own to create an amped up hybrid platform. The combo has already allowed it to expand what it offers, launching a text-to-song feature last December which lets you turn your own lyrics into a vocal composition using generative AI. He tells us more is on the way, including the aforementioned sing-to-sing feature.

Voctra’s tech may be familiar as it was involved in the development of a voice clone of musician Holly Herndon which appeared in a viral TED Talk last year, in which her AI voice could be heard duetting with another musician (Pher)’s real voice in real-time. Which, well, if you haven’t already seen it, is quite the visual-audio spectacle, as well as being a mouthful to explain. It’s also a taster of what Voicemod has coming to a keyboard near you.

“We’re definitely going to release more products and more ways for people to express themselves with the generative AI technology,” Bosch tells us. “Not all Voctra Labs’ technologies are related to music, but they have a lot of technology related to singing, from this text-to-song technology to sing-to-sing technology in real time. So we have a lot of new projects and new products upcoming.

“We’re going to strengthen our speech-to-speech AI real-time technology, because we’re basically merging our technology with their technology. We’re basically creating a hybrid technology that will be better than ours, or there’s a combination of both… [So their sing-to-sing technology will be] combined with our DSP technology, that we could use to do autotune. So we could potentially help artists with their voice and on the tone. And so this is, this is gonna be really, really interesting.”

As well as offering direct-to-consumer/creator audio tools, Voicemod offers its technologies via SDK and APIs for third parties to integrate into their own products, from games and apps to hardware. So it’s set up to distribute its tech across the gamer-creator ecosystem and have demand come find it.

Generative AI-powered disruption in audio of course mirrors (in a non-exact fairground ‘crazy mirror’ sort of a way) developments we’re seeing happen elsewhere: Visually, to graphics and illustration, as a result of deep learning and the advent of prompt-based image generation interfaces (such as DALL-E and Stable Diffusion). Also to the written word, via the large language models that underpin generative AI chatbots like ChatGPT that can produce song lyrics or a whole essay on demand. And, indeed, in the case of musical composition, where Google recently showed off a prompt-based generative AI music composer which can apparently produce arrangements that match the musical vibe you describe (although it said it’s not releasing that particular generative AI model; surely someone else will).

It’s clear that AI is bending the rules of what it’s possible for a single person to create. And, well, as with freedom, the open concept, this is both thrilling and terrifying. Because it’s what you do with it that counts.

The coming years are going to be all about finding out what people do with such powerful AI tools at their fingertips.


Voicemod is positioning itself to ride this wave by building a toolbox for creators to survive and thrive in a reality-bending future and across a range of use-cases. Hence it’s talking in terms of sonic identity and voice avatars for the social metaverse (at the future-gaze-y end) but also just helping you sound your sparkling best on a work Zoom call. So a kind of audio make-up, as it were. Apply as needed.

“Now suddenly everyone can become a creator,” predicts Bosch of the generative AI boon. “Everyone can come, basically, with no skill set. Or with no learnings on how to actually craft these audios. They will be able to actually create these pieces of music. Songs. And this eventually evolves into, probably, even voices. So the ability to create voices.”


“This could potentially be something really viral for platforms like TikTok, or YouTube Shorts or Instagram… And this could eventually evolve into things like karaoke, for example. And be, I don’t know, part of game consoles, or things like that, for people to use this to entertain. And, if we go a step further, and it’s the technology getting better and better as we predict it will be, this could potentially be a professional tool for people who want to create music. Or for people who want to create voices for movies or voices for game characters.

“We have a strong belief in user-generated content, and we’re building tools for our users to start creating sounds and creating voices. And we will be putting technology in the hands of the users to create these [sounds]. And, eventually in the future, hopefully, they’ll go even to a professional level.”

So while, currently, synthesizing a whole voice does still involve a team of sound engineers and designers at the startup, Bosch suggests generative AI will put that power in the hands of the individual, and that it’ll happen soon: “in the near future”.

“I don’t know if we’ll be prompting (now we’re in this wave of everything is done by prompts); I’m not sure if that will be the way or it will be more tools that will have AI technology embedded and we have user experiences that will make things a lot easier,” he adds. “But definitely what I see from generative AI in the audience but also in the management part is that suddenly everyone can become a creator, which I think is really interesting.”

The birth of AI voice may not sound like great news for the employment prospects of sound engineers and designers (albeit, tech advances may simply create new requirements that just shift where their expertise is needed). But Bosch reckons that voice actors, at least, will still have a key role to play: emoting for AI. Since robot voices aren’t good at getting the pitch and intonation, or indeed emotion, right. It’s a voice clone without a soul, basically. (Or as Nick Cave might put it, AI voice lacks ‘its own blood, its own struggle, its own suffering’; it lacks humanness.)

“I think that you will always need a human factor in your sample with these voices,” suggests Bosch. “You can have the best voice, of even a famous person, but what really comes is the impression. You still need a human to do the cadence on the words. You still need a human to do the rhythm, the tone. So [it’s not just that] I can speak normally and I’ll sound like a famous person; no, you don’t, you still need to act a little bit. So… I think human factor for expression is important.”

Might generative AI not be able to learn to emote as well, with the right human data-sets, and further dial up its mimicry in order to make us laugh or cry or love or hate on-demand too?

“Yeah. Well, we will see,” responds Bosch. “I’m not sure. I mean, as of today, for me AI is a tool to be used by humans. But yeah, we don’t know where this is going to evolve.”

Voicemod for Desktop (Image credits: Voicemod)

Voicemod is gearing up for whatever phonic craziness lies ahead with a fresh tranche of funding. The 2014-founded startup has been revenue generating for years, via pro versions of its tools. Its main product, Voicemod for Desktop, has had more than 40 million downloads to date, while Bosch says it has 3.3 million monthly active users. But it has just closed $14.5 million in expansion funding, following an $8M Series A back in summer 2020. Leadwind, the growth fund of Madrid-based Kfund, led the round, with participation from Minifund (Eros Resmini, former CMO at Discord) and Bitkraft Ventures.

“We’re super excited by what generative AI can do to all creative industries and more specifically audio, especially when it comes to enhancing and augmenting the job that creative people already do,” Jaime Novoa, partner at Kfund, tells ClassyBuzz. “In the past few months there’s been an explosion in generative AI in general and more specifically in audio but we think this is a phenomenon that’s just starting.

“What a lot of the cool technologies being launched to market lack are concrete and scalable business models attached to them, and Voicemod differentiates itself from the pack by having built a product used by millions of people every day and with significant revenue traction. We’re super excited about what Jaime and the rest of the Voicemod team have in the pipeline and what’s to come.”

Voicemod says the extra funds will be used to enhance the development of its real-time AI voice identity capabilities, and dial up its proposition for Gen Z, gamers, content creators, and professionals of all skill levels wanting tools to help them express themselves vocally in digital spaces.

Per Bosch, part of the reason it’s taking more funding now relates to the acquisition of Voctra Labs. Beyond that, he says it’s about making the most of the opportunities sparking off the Cambrian explosion in generative AI tools.

“We’re in the middle of a great revolution in AI,” he says. “We want to be well funded in order to be able to develop technology but also to be able to deliver technology to users. So I think one of our competitive advantages is that we already have the market and the traction and we basically are able to put this in the hands of the users. And I want to make sure to have enough runway, also because of market conditions, to be able to put all of this in place. So it will be mainly focused… on building the next generation AI technology and putting it in the hands of the users and also building these creation tools for the users to create content.”

The first new tool will be landing next month, with a launch of Voicemod’s desktop product on macOS (currently it’s PC only). The goal is to evolve into a multi-platform product spanning all devices. “We’re also working on a creation tool mobile app that hopefully will see the light towards the beginning of next quarter. And, and yeah, some more stuff to come, hopefully,” Bosch adds.

He also tells us the startup is working on a watermarking technology which it hopes to launch in Q2 this year, to give platforms a way to spot AI-generated voices in the wild.

Such a feature is likely to be an important tool to counter all the possible negative use-cases (scams, fraud, manipulation, abuse, bullying, trolling and so on) one could imagine humans coming up with for voice-shifting tools that let you sound exactly like someone you’re not.

“It’s an algorithm to watermark the audio,” explains Bosch. “Moderation is hard because it really changes depending on the area… on which are the platforms where the audio is used. So we believe that the channel is the one that should own that moderation and what we’re doing is we will be providing this watermarking system in order for them to be able to know if the audio is created via synthetic voice or is created by a real voice.”

“Every single new technology can be used for the good or for the bad,” he adds. “So we’re of course putting some technology, some tools, in place to be able to have more control around a misuse of this technology.”

On questions of licensing for training data, IP issues here are currently a grey area, as the law hasn’t caught up with developments in AI (let alone generative AI). That means startups operating in the space have to consider whether to take advantage of total legal freedom to do whatever they want (and hope expensive penalties don’t come clanging down on them in short order), or tread more carefully and thoughtfully. (Other startups in the space include the likes of Voice AI, Koe and ElevenLabs.)

Bosch claims Voicemod is taking the latter approach, using (paid) voice actors to build up data-sets to train and hone its AI models. If it wants to use some original content he says the team will go to the IP provider and negotiate, and figure out what kind of licensing terms they’d be up for. (The generative AI boom is also a crazy-thrilling time to be an IP lawyer, clearly.)

“We’re basically pioneering here,” he adds. “So a lot of things are without laws yet so we were trying to stick to our values, basically, and try to do the right thing. That’s our approach on the data [side]. But yeah, you’re completely right: there’s no ‘legal attachment’ to your voice, as of today… We own our fingerprint. You don’t own, like, whatever the fingerprint of your voice [is]. As of today.

“It sounds a little bit like science fiction but maybe, in the future, we will ‘own’ something related to our voice.”

For the record, Bosch was talking to me with his actual voice. The company’s real-time voice-shifting technology doesn’t yet work over mobile. But he says that’s coming too. So buckle up: The synthesized future is gonna be a screaming wild ride.
