How do I do voice clip anonymization?

Hey guys. Here are the details. I’m helping a friend with his project and he needs to anonymize a few interviews. I asked around about a program that could do such a thing and was directed here. Now, the thing is that this anonymization must be secure and irreversible (it’s a sociology experiment and the personal information of the participants must be kept strictly confidential). Is this possible? It would be a great help if there are any research papers on voice anonymization that could be cited for the process.

Probably the easiest and safest way would be to replace the original voice with someone else speaking the words and inform the audience about it.

Just try some pitch or frequency shifting

Here are a few plugins that might help you out:

Depending on the security needed, I strongly recommend you follow up on your thought of finding papers on the topic. Ardour with plugins can certainly do a type of masking, and for the most part you can make it fairly irreversible, but I would want to be certain of the privacy implications before I started on such a project. That is a topic I know nothing about, so I would want to research it first. If you do find some good reading on the topic, can you post it back here for others to read?


Hi and thanks for the replies. He can’t replace the original data, although they usually work from transcripts.

I’ve tried searching for papers on voice anonymization and I couldn’t find anything useful. Maybe I’m using the wrong keywords, I don’t know. I really can’t believe that no one does any research on this topic. With voice anonymization used by reporters and in films, I kinda thought that there would already be a standardized process.

it really is a question of how “secure” your anonymization has to be.

the anonymization in films you are talking about is mostly done with pitch shifting and distortion. depending on your case, this may already be enough or may be far from secure.
pitch shifting only changes the pitch, so the way a person speaks stays the same: pauses, stuttering, unusual pronunciation and so on. so with a fair amount of work it can still be possible to find out who the speaker is.
you can make this harder by adding distortion to the voice.
for a really secure result you have to go the way c.l.a. proposed: re-record the interview with another person or have it read by a text-to-speech synthesizer. the only things that can give any hints then are the choice of words and the structure of the sentences.
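
To make the pitch shifting plus distortion idea concrete, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not a vetted anonymization tool: the function name `shift_and_clip` and its parameters are made up for this example, it only handles 16-bit PCM WAV files, and it uses the crudest possible pitch change (rewriting the sample rate, which also changes the speed of the speech). The clipping step loses information, but the rate change can be undone by anyone, and the speaking style is left untouched.

```python
import array
import sys
import wave

def shift_and_clip(src_path, dst_path, semitones=-4.0, clip_fraction=0.25):
    """Hypothetical helper: change playback rate and hard-clip a 16-bit WAV."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = src.readframes(params.nframes)

    # WAV data is little-endian; array("h") uses the machine's byte order.
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()

    # Distortion: hard-clip every sample to a fraction of full scale.
    limit = int(32767 * clip_fraction)
    clipped = array.array("h", (max(-limit, min(limit, s)) for s in samples))
    if sys.byteorder == "big":
        clipped.byteswap()

    # "Pitch shift": keep the samples, declare a different sample rate.
    # This changes pitch AND tempo together.
    new_rate = int(round(params.framerate * 2.0 ** (semitones / 12.0)))
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(new_rate)
        dst.writeframes(clipped.tobytes())
    return new_rate
```

Calling something like `shift_and_clip("interview.wav", "masked.wav", semitones=-4.0)` (file names are placeholders) would write a lower, slower, clipped copy. A real editor or plugin would normally use a pitch shifter that keeps the duration unchanged.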

i don’t get your first sentence:

“He can’t replace the original data, although they usually work from transcripts.”

you will have to “replace” the data anyway, be it by altering the original or by copying in a new recording. or do you want it to be realtime?

The security needed here is at a reasonable level, meaning that it shouldn’t be reversible by spending 5 minutes in Ardour fiddling with the sound file, but it doesn’t have to resist analysis by the police, for example. From what I’ve been reading, even if you completely change the sound of the voice clip you are still left with all the other aspects that you mentioned, which can’t be cleaned with algorithms, so total security with the original data seems very hard.

What I meant by replacing the data is that he can work on the data to make them anonymous but not replace the interviews entirely. Sometimes they go back to the recordings to pick up on interview flaws and find problems with the experimental design, so if the transcript is read by a different speaker, all the evidence of these problems in the original interview vanishes. But when the quality of the data is not in question, they usually use the transcripts to avoid wasting time.

I did find a paper on this, written from exactly the perspective we needed: interview anonymization.
This will do for our purpose although I was surprised that there aren’t any technical papers on it as well. Thanks for the help everyone.

By the way, one of the methods mentioned for anonymization in the paper is pitch shifting. How do you reverse that?

You can’t necessarily reverse most of the methods that would be discussed, but you can counteract them with reasonable effectiveness. With pitch shifting, for example, you would just pitch shift back by an equivalent amount. You will certainly hear some artifacts, but the voice would be fairly discernible, and I wouldn’t consider it that good for anonymization purposes on its own. Combine it with other things that aren’t quite as easy to counteract (distortion was mentioned as an example) and you have something more difficult to undo and less natural sounding.
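
To illustrate the "shift back by an equivalent amount" point, here is a small self-contained Python sketch. The assumptions are mine: it uses a pure test tone instead of speech, and a naive resampling shift with linear interpolation (the hypothetical `resample` helper below), not the algorithm any particular plugin uses. Real pitch shifters that preserve duration leave more artifacts behind, but the principle of applying the inverse ratio is the same.

```python
import math

def resample(samples, ratio):
    """Read the input `ratio` times faster, using linear interpolation.

    ratio > 1 raises the pitch (and shortens the clip); ratio < 1 lowers it.
    """
    n_out = int(len(samples) / ratio)
    out = []
    for i in range(n_out):
        pos = i * ratio
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out

sample_rate = 8000
# One second of a 150 Hz tone standing in for a voice (an assumption).
original = [math.sin(2 * math.pi * 150 * n / sample_rate)
            for n in range(sample_rate)]

ratio = 2.0 ** (4 / 12)                      # "anonymize": up 4 semitones
shifted = resample(original, ratio)
restored = resample(shifted, 1.0 / ratio)    # attacker: down 4 semitones

# Compare the restored signal with the original, sample by sample.
n = min(len(original), len(restored))
max_error = max(abs(a - b) for a, b in zip(original[:n], restored[:n]))
print(f"largest difference after shifting back: {max_error:.4f}")
```

The printed difference should be small compared with the tone's amplitude of 1.0, which is the sense in which a plain pitch shift is easy to counteract even if it is not perfectly reversible.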


If it’s important to anonymise these recordings, I’d very strongly recommend not relying on what someone on the Internet tells you!

Use Audacity (free online).
Hi, if anyone is still interested, you can use Audacity to anonymize interview data. Here is a guide to censoring songs, which can be used for interviews as well:

i would try to use a synthetic voice like espeak.
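
A rough sketch of how that could be scripted from Python, assuming the `espeak` command-line tool is installed and the transcript is a plain text file. The helper names, the default voice and the speaking rate are placeholders chosen for this example; the flags used are `-v` (voice), `-s` (speed in words per minute), `-f` (read text from a file) and `-w` (write a WAV file instead of playing).

```python
import shutil
import subprocess

def build_espeak_command(transcript_path, wav_path, voice="en", words_per_minute=150):
    """Build the argument list for rendering a transcript to a WAV file."""
    return [
        "espeak",
        "-v", voice,                   # voice name
        "-s", str(words_per_minute),   # speaking rate
        "-f", transcript_path,         # text file to read
        "-w", wav_path,                # output WAV file
    ]

def synthesize(transcript_path, wav_path):
    """Run espeak if it is available; raise a clear error otherwise."""
    if shutil.which("espeak") is None:
        raise RuntimeError("espeak was not found on PATH")
    subprocess.run(build_espeak_command(transcript_path, wav_path), check=True)
```

With this approach none of the original audio ends up in the output, so only the wording of the transcript can still hint at the speaker.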