What are you researching as part of your doctorate?
My research focuses on various aspects of the use of artificial neural networks to generate new audiovisual content. Its main objective is to address the problem of so-called 'deepfakes', i.e. audiovisual materials generated with the use of artificial intelligence in which the identity of the person being attacked is fabricated (e.g. the face of the person appearing in the recording is replaced with the face of someone completely different, who did not consent to the publication of their image, or a speech sample is synthesised to sound deceptively like the voice of the person in question). The rapid development of AI methods in recent years has already made generated content indistinguishable to the human eye (or ear) from actual recordings.
How is such fake material distinguished from real material?
It is rather ironic that the development of AI methods, which has led to the current situation, also seems to be the best remedy for it. Although a human is in most cases unable to determine whether a piece of material is real, a properly trained AI-based model should be able to give us such an answer. Of course, specific solutions depend on specific attack methods, and these are constantly evolving. They are becoming better and more advanced, which makes it difficult to know what stage of progress we are at in terms of preventing attacks. Artificial intelligence research is very dynamic: just two to three years ago, work in the field of image generation focused on specific applications, such as face generation (incidentally, AI was already able to generate photorealistic images of faces that did not exist in reality); today, models that generate an image of any scene based solely on a short text description are in widespread use. Before long, it will probably be possible to generate realistic video clips using the same method. It is this dynamic that makes research in this area so fascinating, while also posing a host of security challenges.
For what purposes are 'deepfakes' most commonly used?
When it comes to ethically questionable uses, political or propaganda material (for example, generated recordings of speeches by heads of state or important politicians) and pornographic content lead the way. However, I would like to raise awareness of the fact that not only video content but also audio content can pose a serious problem. Telemarketing calls already bear the hallmarks of harassment in many cases. Now imagine a situation where, instead of a robotic voice on the other end of the phone, we hear fully natural speech generated by AI. What's more, the AI does not play automated, pre-determined and pre-recorded messages, but speaks to us freely thanks to a language model similar to the recently very popular ChatGPT, and does not allow itself to be sidetracked by questions that are not relevant to the advertised offer. This is not an optimistic outlook, but I would nevertheless not want to spread a defeatist vision or present content-generating models only in terms of threats.
And are there any positive examples?
Such models find a wide range of applications, primarily in the entertainment industry: they allow, for example, the creation of realistic character models bearing the likeness of a selected person in computer games, or "virtual extras" in films. Thanks to methods for manipulating the content of images, it is possible to edit video materials without specialist skills or large amounts of money. Similarly, completely new, non-existent content can be generated without the need to record it, using only a description of the scene in question written in natural language, so that not even programming or software skills are required. The same applies to audio generation: the ability to transfer the voice of a specific actor to another recording, or to synthesise speech with parameters deceptively similar to his or her voice, saves time and resources. One example is the so-called "docudramas": where there would normally be a need to bring an actor into the studio to re-record some dialogue, AI can successfully solve this problem. The applications of this technology appear to be extremely broad, although in principle they could be reduced to simplifying the creative process and removing unnecessary technical barriers.
What does the work of an AI researcher look like?
Probably less exciting than one might expect. For every second of an AI-generated video clip, there are hundreds of hours of analysing the literature; finding (or in some cases creating or recording), cataloguing and pre-processing the massive amounts of data used to train the models; optimising and supervising the training process itself; and analysing the results. Skills in programming, data analysis and teamwork are essential. In such a fast-moving discipline, it is also extremely important to be able to continuously update and expand your knowledge, as new solutions and methods appear month by month and even day by day.
Interview by Agnieszka Garcarek-Sikorska