Even for its “safe” models, the safety measures can easily be fine-tuned away or bypassed. No other major American AI lab is reckless enough to distribute model weights as Meta does; instead, they provide structured access (i.e., cloud-based APIs) to help prevent misuse.
Safety measures for large language models (LLMs) like Llama 2 aren’t just about whether they spew hateful content. Next-generation LLMs are also increasingly being developed as autonomous agents that act without a human in the loop.
Researchers have found that LLMs can be used to scale spear phishing campaigns, suggest cyberattacks, synthesize dangerous chemicals, or help plan biological attacks. Over the coming years, they will only become more capable of malicious activities. That’s why AI safety must be non-negotiable. Yet Meta is apparently planning to release model weights for even more advanced AIs, from which safety measures can likewise be removed. Leadership at Meta has said they aren’t worried about bad actors gaining access to malicious AIs, even AIs with human-level or superhuman capabilities.
Meta released LLaMA (Large Language Model Meta AI) on February 24, 2023 on a “case-by-case” basis to researchers in academia, government, and industry, under noncommercial access terms. However, there is some indication that Meta granted access broadly to students using an .edu email address, not just researchers. LLaMA is a family of pretrained models with no safety training, making it easy to elicit objectionable responses.
Just one week later, on March 3, LLaMA leaked as a downloadable torrent to 4chan. In response, Meta commented, “we believe the current release strategy allows us to balance responsibility and openness.” Cybersecurity researcher Jeffrey Ladish tweeted, “Well Meta's 65 billion parameter language model just got leaked to the public internet, that was fast. Get ready for loads of personalized spam and phishing attempts.”
In June 2023, U.S. senators Blumenthal and Hawley wrote a bipartisan letter to Mark Zuckerberg to “demand answers & warn of misuse after ‘leak’ of Meta's AI model.” They wrote, “By purporting to release LLaMA for the purpose of researching the abuse of AI, Meta effectively appears to have put a powerful tool in the hands of bad actors to actually engage in such abuse without much discernable forethought, preparation, or safeguards.” Meta did not respond publicly.
One month later, in July 2023, Meta publicly released the model weights of Llama 2. This was followed by Code Llama in August 2023, trained specifically on code. Both Llama 2 and Code Llama are available in two varieties: pretrained (with no safety fine-tuning) and fine-tuned (for instruction-following and safety), in a range of model sizes.
The irreversible proliferation of LLaMA and the release of Llama 2 violate the NIST AI Risk Management Framework, which includes as a safety standard: “Processes and procedures are in place for decommissioning and phasing out of AI systems safely.”
In September 2023, Senator Schumer held a closed-door meeting with tech executives and most senators to discuss AI governance. Tristan Harris from Center for Humane Technology “told the room that with $800 and a few hours of work, his team was able to strip Meta’s safety controls off LLaMA 2 and that the AI responded to prompts with instructions to develop a biological weapon,” the Washington Post reported.
Meta plans to build an AI as powerful as GPT-4 and release it as “open source,” despite the safety risks. Yann LeCun, Chief AI Scientist of Meta AI, anticipates that future AI will eventually have superhuman capabilities in all domains. Even for superintelligent AI, he believes in releasing base models openly, apparently with minimal regulation. This includes releasing powerful AI to bad actors who could develop “evil AI,” but he trusts that the good AIs will be sufficient to protect society. This apparently disregards offense–defense asymmetries: a single lab could conceivably unleash a bioengineered pandemic, yet it cost tens of billions of dollars to develop safe and effective vaccines like those for COVID-19.
Business Insider describes the exchange with the Center for Humane Technology in more depth:
“During the session, Zuckerberg attempted to downplay Harris' statement that Llama 2 can tell users how to make anthrax, saying anyone who was looking for such a guide could find out how to make anthrax on YouTube, according to both of the senators present. Harris rejected the argument, saying such guides do not come up on YouTube, and even if they did, the level of detail and guidance provided by Llama 2 was unique to such a powerful generative AI model. It's also largely an open-source model, meaning it's freely available to use and adapt.
“‘It was one of the only moments in the whole thing that was like, ‘Oh,’’ one of the senators present said, describing the exchange as having caught people's attention. ‘Twenty-four out of the 26 panelists there basically said exactly the same thing over and over: ‘We need to protect AI innovation but with safeguards in place.’’
“Llama 2's power is well-known inside Meta. Its ability to turn up detailed instructions for creating a biological weapon like anthrax is to be expected, two people familiar with the company said.”
Researchers have found that language models, when deployed without sufficient safety measures, can provide detailed step-by-step instructions to help plan a pandemic. Other research, which worried the Senate, found that present-day AI can help fill in steps for bioweapons production that cannot be found on Google or in textbooks, and that in a few years, it is on-track to enable “many more actors to carry out large-scale biological attacks.”
Meta’s lack of concern for social responsibility extends well beyond its release of Llama models. Meta is the #1 tech company on the AI Incident Database, an index of current harms from AI. Meta’s Galactica model, released in November 2022, was widely critiqued for writing “racist and inaccurate scientific literature.” In September 2021, Meta downplayed concerns about its social media platform’s negative impact on mental health, despite internal research confirming those harms. In December 2021, Meta was sued for £150 billion for failing to moderate content inciting violence, contributing to the genocide of the Rohingya minority.
Read more expert concerns: