One of the hardest concepts to viscerally grasp about the development of artificial superintelligence is how far above human capabilities and understanding a system trained on a significant fraction of human knowledge may end up. Already the current iterations of AI, the generative pre-trained transformers (GPTs) dubbed Large Language Models (LLMs), have shown abilities that suddenly skyrocket once a model reaches a certain size, for reasons the companies building the models cannot explain. Researchers often don't know about an emergent ability until they prompt the model and get back a response an order of magnitude beyond what they expected. But if positive abilities can be emergent, so can risks.
We regularly share posts from Gary Marcus’ Substack, Marcus on AI, because he points out in exquisite detail how LLMs can make comical errors (and thus are not to be trusted while they continue to “hallucinate” randomly) and are trained on stolen intellectual property (which means the people training them are also not to be trusted, particularly when it comes to safety).
Still, “hallucinations” don’t mean LLMs aren’t already capable of hazardous and scary things right now, even before OpenAI’s GPT-5 is released.
Here’s a fun emergent capability of GPT-4: it can hack websites with no human intervention. This blurb is from the abstract of a February 2024 research paper:
LLM Agents can Autonomously Hack Websites
Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang
In recent years, large language models (LLMs) have become increasingly capable and can now interact with tools (i.e., call functions), read documents, and recursively call themselves.…In this work, we show that LLM agents can autonomously hack websites, performing tasks as complex as blind database schema extraction and SQL injections without human feedback. Importantly, the agent does not need to know the vulnerability beforehand. This capability is uniquely enabled by frontier models that are highly capable of tool use and leveraging extended context. Namely, we show that GPT-4 is capable of such hacks, but existing open-source models are not. Finally, we show that GPT-4 is capable of autonomously finding vulnerabilities in websites in the wild. Our findings raise questions about the widespread deployment of LLMs.
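To make the “tool use” and “recursively call themselves” part concrete, here is a toy sketch of the kind of loop such an agent runs. This is not the paper’s actual agent: the tool is a harmless stub and a fake model stands in for GPT-4. The point is simply the structure: the model picks an action, a harness executes it, and the result is fed back into the model’s context until the model decides it is done.

```python
# Toy sketch of an autonomous agent loop: model proposes an action,
# the harness runs a tool, the result goes back into the model's context.
# `fake_model` and `fetch_page` are stand-ins, not anything from the paper.

from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # growing context the model reads back

def fetch_page(url: str) -> str:
    """Stub tool: a real agent would issue an HTTP request here."""
    return f"<html>contents of {url}</html>"

TOOLS = {"fetch_page": fetch_page}

def fake_model(state: AgentState) -> dict:
    """Stand-in for an LLM call: returns either a tool invocation or a final answer."""
    if not state.history:
        return {"tool": "fetch_page", "args": {"url": "https://example.com"}}
    return {"done": True, "answer": "finished exploring"}

def run_agent(goal: str, max_steps: int = 5) -> str:
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        action = fake_model(state)                        # the model decides the next step itself
        if action.get("done"):
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])  # the harness executes the chosen tool
        state.history.append((action, result))            # the result feeds back into the context
    return "step limit reached"

if __name__ == "__main__":
    print(run_agent("summarize https://example.com"))
```

What the authors document is that, with a strong enough model driving a loop like this, no human needs to be watching any of the steps.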
Emphasis and italics were added, because academics tend to “bury the lede,” as we say in the news biz, i.e., to understate or not put the important part up front.
Isn’t it great that AI can now do “set it and forget it” hacking! It’s as convenient as making pot roast in your crock pot: start it when you leave for the day, and when you get home, dinner is ready. Except you’re not making dinner; your autonomous AI agent is damaging the digital infrastructure we all depend on.
This is exactly the kind of unaddressable safety nightmare that has the big AI developers refusing to seriously engage with safety research and regulation. If they started pulling on the thread of all the bad things that can be done with unexplainable and uncontrollable AI, then the whole tapestry of superintelligent AI development unravels, and we never get to AI with the planet-sized brain certain parts of Silicon Valley want to develop and turn loose on an unsuspecting world.
AI safety expert Dr. Roman Yampolskiy, professor of computer engineering and computer science at the Speed School of Engineering at the University of Louisville, has a lot to say on the topic of unforeseen AI risk in his forthcoming book, “AI: Unexplainable, Unpredictable, Uncontrollable.”
Stay tuned. You’ll be hearing from Dr. Yampolskiy in The Technoskeptic.