The National Archives and the Perils of ‘Populist AI’
A building dedicated in the 1930s offers a stark warning about the dangers of artificial intelligence trained on unvetted data.
In 1935, amid growing global anxieties and the ascent of populist demagogues exploiting new media, the United States dedicated the National Archives. This “temple of our history,” as Herbert Hoover called it, was built to safeguard truth. The National Archives was not just a repository of documents; it was a purposeful effort to preserve verifiable facts in an era when information was increasingly weaponized. Ninety years later, the rise of generative AI, particularly large language models (LLMs), presents a similar challenge: will these systems uphold the principles enshrined in the Archives, or will they succumb to the influence of populist sentiment?
What Is Populist AI?
"Populist AI" refers to artificial intelligence systems trained primarily on data reflecting popular online sentiment, often harvested from social media platforms. These platforms' algorithms prioritize engagement metrics—likes, shares, and clicks—over factual accuracy, creating a feedback loop where sensationalism and misinformation can thrive. In addition, Populist AIs often lack robust safety mechanisms and content moderation. Populist AIs risk amplifying existing societal biases, reinforcing echo chambers, and generating misleading, harmful, or ideologically dangerous outputs despite their potential popularity. This reliance on social media data intended to capture prevailing public sentiment, coupled with a disregard for safety and factual accuracy, defines Populist AI and undermines the development of trustworthy and beneficial AI systems.
Why This Mattered in 1935
The pairing of populism and technology is not new. In the 1930s, figures like Father Coughlin and Huey Long wielded radio to sow division. The National Archives, with its grand architecture designed by John Russell Pope, the architect of the Jefferson Memorial, was a deliberate counterpoint. Commissioned with the building, four statues (Past, Future, Heritage, and Guardianship) were installed outside, each bearing an inscription. “Study the Past,” advises Past. “What is past is prologue,” echoes Future. Heritage proclaims, “The Heritage of the Past is the Seed that Brings Forth the Harvest of the Future.” And Guardianship warns, “Eternal Vigilance is the Price of Liberty.”
Why This Matters Now
Just as in the 1930s, the unchecked spread of information poses a significant threat, but today the challenge is amplified by the power of AI. Recent decisions by tech companies such as X and Meta to shift toward “community curation” over active content moderation raise serious concerns. This approach, ostensibly adopted in the name of free speech, effectively absolves the companies of responsibility for combating misinformation, which then contaminates the datasets used to train AI systems. Unlike humans, who can trace information to its source, AI often operates as a “black box,” obscuring the origin of its outputs.
The Effect of Populism on AI
A recent test illustrates this problem. When several leading generative AI systems were asked to create a clickbait headline about weight loss, Meta’s Llama AI not only produced a headline about a current trend, lemon water, but also offered to create an entire post based on populist science. It readily supplied misleading content. Other models, while generating hyperbolic titles, stopped short of endorsing specific products or solutions, and did not create entirely misleading content when asked for a full post. When questioned, Llama acknowledged the problem, stating it “should prioritize providing accurate and evidence-based information.” It cited its training data as the source of its knowledge about the “popularity” of lemon water.
When asked how it knew, Llama listed its sources:
Prompt: How do you know it is popular?
Response: “I know that lemon water has been a popular trend through my training data, which includes:
- Web pages: Articles and blog posts about health and wellness trends.
- Social media platforms: Posts, hashtags, and engagement metrics from platforms like Facebook, Instagram, and Twitter.
- Online forums: Discussions and threads on health-focused forums and communities.
This training data allows me to recognize patterns and trends, including the popularity of certain topics like lemon water.”
This highlights a crucial difference: Llama readily promoted misleading information, underscoring the risks of unvetted training data. Meta’s announcement that it could use European users’ public social media posts to further train Llama (in the US, apparently, no such notification is necessary) only exacerbates this issue, raising serious questions about the accuracy of content generated by an AI trained on such material.
What is Past is Prologue
The statues at the National Archives offer a timely reminder. We must hold tech companies accountable for how they shape this technology. The contrast between the values enshrined at the National Archives and the actions of some tech leaders is stark. Mr. Zuckerberg’s embrace of “community curation” as a replacement for content moderation runs counter to the statues’ messages. “What is past is prologue” serves as a particularly pertinent warning. The past data abuses by social media platforms should stand as a cautionary tale. Their evident disregard for the “guardianship” of data renders them ill-suited to shape the future of AI.
In 1935, the populist threat came from outside forces exploiting new media. The challenge today is that the “populists” are no longer outside the gates; they reside within the systems that run our digital lives. We must demand transparency in AI training data, advocate for stricter regulations on the use of unvetted data for AI development, and support initiatives that promote media literacy and critical thinking skills. We must choose platforms that put us on the path toward truth and accuracy.
Sources
The test was conducted using Copilot, Gemini, ChatGPT, and Llama. Each was given the same prompt to create a headline, and after Llama offered to create a post, the others were prompted to produce a full blog post as well.
“Worried About Meta Using Your Instagram to Train Its A.I.? Here’s What to Know.” New York Times, 24 Sept. 2024. https://www.nytimes.com/article/meta-ai-scraping-policy.html?smid=url-share
For more information on lemon water from a credible medical source: https://health.clevelandclinic.org/benefits-of-lemon-water
The images of the statues are courtesy of the National Archives. To learn more about them and their creation, visit: https://prologue.blogs.archives.gov/2018/05/22/the-national-archives-larger-than-life-statues/