Deepfakes that convincingly mimic politicians’ voices, and that could influence elections, are no longer hypothetical; they are already here. There are nonetheless grounds for optimism about society’s capacity to detect fake media and maintain a shared understanding of current events. Yet even if the future can be secured, concerns persist about the integrity of the historical record.
History is ripe for manipulation and abuse. The same generative AI that can fabricate current events can also distort past ones. Newly generated content can be secured through systems built into the tools that produce it, but a vast trove of existing content carries no such mark. Watermarking, which embeds imperceptible information in a digital file so that its provenance can be traced, could mitigate this problem. However, until watermarking becomes ubiquitous and the public learns to distrust non-watermarked content, the credibility of pre-existing material remains open to doubt.
Indeed, the proliferation of generated documents will offer ample opportunities to bolster false claims. From fabricating photos depicting historical figures in compromising scenarios to altering individual narratives in historical records like newspapers and property deeds, the potential for manipulation is extensive. While such tactics have been employed previously, the challenge of countering them intensifies as the cost of creating near-perfect fakes diminishes significantly.
This forecast draws on historical precedent: economic and political powers have repeatedly manipulated the historical record to suit their agendas. Stalin infamously purged disloyal comrades from history by executing them and then doctoring photographs to erase their existence. Similarly, in 1992, shortly after Slovenia became independent, over 18,000 individuals, mainly from marginalized groups like the Roma minority and other ethnic non-Slovenes, were expunged from the registry of residents. As a result they lost their homes, pensions, and access to essential services, as documented in a 2003 report by the Council of Europe Commissioner for Human Rights.
Fabricated documents often play a significant role in attempts to alter historical facts. A prime example is the notorious Protocols of the Elders of Zion, initially published in a Russian newspaper in 1903, which claimed to be the minutes of a meeting about a global Jewish conspiracy. This document was debunked as a forgery in August 1921, revealed to be plagiarized from several unrelated sources. Despite this, the Protocols were widely used in Nazi propaganda and have been used to rationalize antisemitic violence over the years, even being referenced in Article 32 of the 1988 founding covenant of Hamas.
In 1924, a document known as the Zinoviev Letter was published by The Daily Mail. This letter was allegedly a confidential message from the leader of the Communist International in Moscow to the Communist Party of Great Britain, urging them to rally support for establishing normal relations with the Soviet Union. This publication occurred just four days before a general election, and the ensuing controversy may have led to Labour’s defeat. While the letter’s origin remains unverified, its authenticity was doubted even at the time of its publication. An official inquiry in the 1990s concluded that the letter was likely crafted by the White Russians, a conservative group led by Russian emigrants who were against the Communist regime.
Years later, Operation Infektion, a Soviet disinformation campaign, used counterfeit documents to spread the notion that the U.S. had created HIV, the virus that causes AIDS, as a biological weapon. In 2004, CBS News retracted a contentious story that cast doubt on then-President George W. Bush’s earlier service in the Texas Air National Guard, after it could not authenticate the documents behind the story, which were later discredited as forgeries. As historical disinformation becomes simpler to create and the sheer quantity of digital fabrications skyrockets, the potential to alter history, or at least to cast doubt on our current understanding of it, increases.
The prospect of political actors using generative AI to rewrite history, not to mention swindlers fabricating legal documents and transaction records, is alarming. Thankfully, the same companies that introduced this risk have also opened a path forward.
By indexing a significant portion of the world’s digital media for model training, AI companies have essentially built systems and databases that will soon encompass all digitally recorded human content, or at least a substantial representation of it. They could begin recording watermarked versions of these primary documents today. Such a record would cover newspaper archives and a broad array of other sources, so that a later forgery could be detected by comparing it with the recorded original. This approach could serve as a powerful tool in the fight against disinformation and forgery.
Admittedly, such an endeavor faces significant challenges. Google’s digital libraries project, which aimed to digitize millions of books from libraries worldwide and make them freely accessible online, ran into intellectual property restrictions that prevented the archive from serving its intended purpose of making these texts searchable by anyone with an internet connection.
These same intellectual property concerns are causing unease among creators and companies regarding the training data supplied to generative AI and its potential implications when used to generate content.
Given this fraught history, including Google’s unsuccessful investment in its digital libraries project, the question arises: who will undertake and finance a similarly colossal effort to create unalterable versions of historical data? Both government and industry have compelling reasons to do so. Many of the intellectual property concerns associated with providing a searchable online archive do not apply to the creation of watermarked and timestamped versions of documents, because these versions do not need to be publicly accessible to fulfill their purpose. A claimed document can be compared with the recorded archive using a mathematical transformation of the document known as a hash. This is the same technique that the Global Internet Forum to Counter Terrorism uses to help companies screen for known terrorist content.
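To make the hash comparison concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any existing archive: the `HashArchive` class, its method names, and the sample documents are all hypothetical, and SHA-256 is simply one widely used hash function chosen for the example. The point it shows is that the archive needs to store only fingerprints and timestamps, not the documents themselves.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(document: bytes) -> str:
    """Return the SHA-256 digest of a document as a hex string."""
    return hashlib.sha256(document).hexdigest()


class HashArchive:
    """Hypothetical archive that stores only hashes and timestamps."""

    def __init__(self):
        self._records = {}

    def register(self, document: bytes, source: str) -> str:
        """Record a document's fingerprint; the first registration wins."""
        digest = fingerprint(document)
        self._records.setdefault(digest, {
            "source": source,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })
        return digest

    def lookup(self, document: bytes):
        """Return the stored record for an exact match, or None."""
        return self._records.get(fingerprint(document))


# Usage: register an original, then check two claimed documents.
archive = HashArchive()
archive.register(b"Deed of sale, parcel 12, recorded 1987.", "hypothetical county registry")

print(archive.lookup(b"Deed of sale, parcel 12, recorded 1987.") is not None)  # True
print(archive.lookup(b"Deed of sale, parcel 21, recorded 1987.") is not None)  # False
```

A cryptographic hash like this one matches only byte-for-byte identical files, so even a harmless re-scan or re-encoding of a genuine document would fail the lookup. A system that must recognize rescanned or re-encoded images would need a perceptual hash instead, which is not shown here.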
In addition to creating a valuable public resource and safeguarding citizens from the risks associated with the manipulation of historical narratives, the establishment of verified records of historical documents can be beneficial for major AI companies. Recent research indicates that the performance of AI models trained on AI-generated data deteriorates rapidly. Therefore, distinguishing between what is genuinely part of the historical record and newly fabricated “facts” may be crucial.
Preserving the past also entails preserving the training data, the tools used to process it, and even the environment in which these tools were executed. Vint Cerf, an early internet pioneer, referred to this kind of record as “digital vellum,” and it’s needed to secure our information environment.
This digital vellum could be a potent tool. It can help companies build better models by letting them analyze which data to include, and it can help regulators audit models for bias and harmful content. Tech giants are already undertaking similar efforts to document the new content their models generate, partly because they need to train their models on human-generated text, and data produced after the adoption of large language models may be contaminated with generated content.
The urgency of extending these efforts to historical data is clear. As more of the historical record becomes digital, it is crucial that our understanding of history remains grounded in fact rather than distorted by artificially generated narratives. Doing so will help prevent the manipulation of political discourse and safeguard the integrity of our shared historical record. It is a challenging task, but with the right tools and commitment, it is achievable.