A journalist from The New York Times recently spent six months with some of our engineers here in Toronto documenting the team's journey creating the world's most realistic video and audio deepfake. The documentary captures the spirit of Dessa in a lot of ways—the push for scientific innovation, our fun-loving engineering team, and our refusal to settle for anything less than perfect.
Though the documentary has been edited down to 30 minutes, this adventure has been over a year in the making. The short film accurately captures the team's engineering capabilities, but in its brevity, omits key details about our team's work on deepfake detection and the company-wide debates over the ethics of releasing this technology and its implications for AI policy. Here's the story you saw:
June 2018: Joseph Palermo (engineering), Hashiam Kadhim (engineering), and Ragavan Thurairatnam (co-founder) sit down at a team dinner and hatch a plan: to create a conversational bot which sounds like someone famous. This is the idea that will eventually evolve into a plan to create the world's best deepfake.
Autumn 2018: Joe and Hash come up with a hack to use YouTube subtitles and Mechanical Turk workers to create the training dataset which will be the basis of the audio deepfake experiments.
December 2018: Rayhane Mama (engineering) is recruited to join Dessa after being scouted on GitHub for his popular implementation of DeepMind's Tacotron-2. He moves across the world from a city in northern Africa to Toronto, Canada.
February 2019: Joe spends his weekends going through Mechanical Turk transcription results instead of going outside. It's cold out there anyway.
May 2019: RealTalk is released on social media. Everybody has an opinion, from highly critical to overly optimistic. It's covered in Vice, The Verge, Slate, and Gizmodo. Alyssa Kuhnert (communications) teams up with Pippin Lee (engineering) to write the announcement. Pippin builds fakejoerogan.com, an experiment to test how well the public can detect fake audio.
June 2019: David Barstow, four-time Pulitzer Prize-winning journalist, reaches out to discuss making a documentary about deepfakes. The CIA reaches out about licensing the technology without limits. We turn them down.
Summer 2019: Ray and Ragavan spend many late nights iterating on the perfect deepfake. Austin Mackillop (engineering) shaves his head to become Joe Rogan's body double for the documentary. Vince Wong (co-founder) shaves his head in solidarity.
October 2019: Sam Shi and Sachin Rana (engineering) join the deepfakes team. They, along with Ray, spend four weeks developing a deepfake detector with Google's FaceForensics dataset and find that it doesn't work on deepfakes “in the wild” due to differences in distribution. They augment the training dataset for the detector with videos from the internet.
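The failure mode the team hit here—a detector tuned on a curated dataset degrading on in-the-wild samples—can be illustrated with a toy sketch. Everything below is synthetic and hypothetical (the feature, the numbers, the threshold rule are all invented for illustration), not Dessa's actual detector:

```python
import numpy as np

# Toy illustration of distribution shift: a simple threshold "detector"
# fit on curated data degrades when the fake distribution shifts.
# All features and numbers here are synthetic, not from any real model.

rng = np.random.default_rng(0)

# Imagine a 1-D "artifact score": fakes score high, reals score low.
real_train = rng.normal(0.0, 1.0, 1000)
fake_train = rng.normal(3.0, 1.0, 1000)

# Fit the decision threshold on the curated training data.
threshold = (real_train.mean() + fake_train.mean()) / 2

def accuracy(reals, fakes, thr):
    """Fraction of samples the threshold detector labels correctly."""
    correct = np.sum(reals < thr) + np.sum(fakes >= thr)
    return correct / (len(reals) + len(fakes))

# "In the wild" fakes: different codecs and compression shift the feature.
real_wild = rng.normal(0.0, 1.0, 1000)
fake_wild = rng.normal(1.0, 1.0, 1000)  # shifted distribution

in_dist = accuracy(real_train, fake_train, threshold)
wild = accuracy(real_wild, fake_wild, threshold)
print(f"curated-data accuracy: {in_dist:.2f}")  # high
print(f"in-the-wild accuracy:  {wild:.2f}")     # much lower

# The remedy mirrors what the team did: fold wild examples into training
# so the decision boundary is re-fit on the broader distribution.
fake_aug = np.concatenate([fake_train, fake_wild])
threshold_aug = (real_train.mean() + fake_aug.mean()) / 2
```

The augmented threshold sits lower than the original one, trading a few false positives on curated data for far better recall on the shifted fakes—the same trade the team made by mixing internet videos into the training set.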
November 2019: The documentary is released on The Weekly. Stephen Piron (co-founder) has access to the American FX broadcast, and we all watch it for the first time alongside the rest of the world through a Google Meet screen share on Sunday night. The reactions pour in. We're proud of what this team has accomplished.
There is, however, an entire side of the story which was not captured. Here's what didn't make it into the documentary—partly due to time constraints, but also because the solutions are not nearly as attention-grabbing.
May 2019: The company debates the choice to amplify Joe Rogan and his questionable politics. Dessa forms an ad-hoc crack team made up of Pippin, Alyssa, Helen Ngo (engineering), and Matthew Killi (CCO) to start the conversation around the policy implications.
We make the decision to not release the RealTalk code, inspired by the thoughtful discussion happening around the staged release of OpenAI's GPT-2 models. We take criticism for this decision, but know it to be the right one.
June 2019: The RealTalk work inspires much internal discussion about its implications, set against several inbound requests from companies looking to profit from the technology. We turn them all down, unwilling to release this technology into the world without properly exploring those implications.
Pippin, Sachin, and Michael Jia (engineering) create the first iteration of the audio deepfakes detector, which is featured in Axios.
Joe, Pippin, and Helen spend a weekend responding to Jack Clark's open call for contributions to OpenAI's congressional testimony on deepfakes before the House Intelligence Committee.
July 2019: Company-wide roundtable on ethics and our responsibilities as technologists. Everyone from the Engineering to Finance teams weighs in on our collective responsibility. We know that we don't have all (or really, any) of the answers, but it's a start. We come away from the experience humbled, inspired, and knowing that we have lots of work to do.
We meet with the Labour department of a European government to discuss the implications of this technology and to share thoughts on how we go forward with building machine learning applications that affect millions of lives.
September 2019: Pippin and Helen take on a side project working on misinformation and fake news detection.
October 2019: Joe gives a TEDxToronto talk on deepfakes, democracy, and our responsibilities as technologists and makers, after months spent consulting with experts in the field of misinformation.
November 2019: Our team continues to explore the ethical implications of deepfake technology and misinformation. We release a deepfake detector. Pippin shares the results of the Faux Rogan fake audio test, finding that human detection accuracy plateaus around 72.6%, but with one hopeful note: it seems that detection accuracy can be learned. Read his post here.
December 2019: Dessa has an upcoming meeting with policymakers in Washington, where we will discuss the implications of deepfakes and detection on upcoming world events. Interested in contributing? We have an open call for feedback and we'd love to hear from you on these outstanding questions:
What should policymakers know about the current capabilities of deepfake technology (and conversely, the state of deepfake detection)?
What are concrete ways to educate the general public about the truth value of a potential deepfake upon its release?
Is automated deepfake detection a worthwhile investment?
What are the responsibilities of policymakers, technology platforms, and individual ML practitioners in regards to deepfakes and their implications?