Recursive self-improvement
Recursive self-improvement (RSI) is a process in which an early or weak artificial general intelligence (AGI) system enhances its own capabilities and intelligence without human intervention, leading to a superintelligence or intelligence explosion.[1][2] The development of recursive self-improvement raises significant ethical and safety concerns, as such systems may evolve in unforeseen ways and could potentially surpass human control or understanding. A number of proponents have pushed to pause or slow down AI development because of the potential risks of runaway AI systems.[3][4]

Seed improver

The concept of a "seed improver" architecture is a foundational framework that equips an AGI system with the initial capabilities required for recursive self-improvement. This might come in many forms or variations. The term "Seed AI" was coined by Eliezer Yudkowsky.[5]

Hypothetical example

The concept begins with a hypothetical "seed improver", an initial code base developed by human engineers that equips an advanced future large language model (LLM) built with strong or expert-level capabilities to program software. These capabilities include planning, reading, writing, compiling, testing, and executing arbitrary code. The system is designed to maintain its original goals and perform validations to ensure its abilities do not degrade over iterations.[6][7][8]

Initial architecture

The initial architecture includes a goal-following autonomous agent that can take actions, continuously learns, adapts, and modifies itself to become more efficient and effective in achieving its goals. The seed improver may include various components to support these functions.[9]
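A minimal sketch of such a seed-improver loop, written here in Python for illustration only, might look as follows. The functions propose_patch, run_validation, and benchmark are hypothetical stand-ins (not any real system's API) for an LLM proposing a revision to the agent's own code, a fixed validation suite guarding against capability regression, and an evaluation harness.

```python
# Hypothetical sketch of a "seed improver" loop: the agent repeatedly proposes a
# revision to its own code, validates the candidate against fixed checks, and
# only adopts it if performance does not degrade. All helpers are stand-ins.
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    code: str       # proposed source for the agent's own tooling
    score: float    # measured performance on a benchmark task


def propose_patch(current_code: str) -> str:
    """Stand-in for an LLM planning and writing a revision of the agent's code."""
    return current_code + f"\n# revision {random.randint(0, 9999)}"


def run_validation(code: str, checks: List[Callable[[str], bool]]) -> bool:
    """Fixed validation suite: every check must pass before a patch is adopted."""
    return all(check(code) for check in checks)


def benchmark(code: str) -> float:
    """Stand-in for compiling, executing, and scoring the candidate agent."""
    return random.random()


def seed_improver_loop(initial_code: str, iterations: int = 5) -> Candidate:
    # Example invariant: the original goal statement must survive every rewrite.
    checks = [lambda c: "# goal:" in c]
    best = Candidate(initial_code, benchmark(initial_code))
    for _ in range(iterations):
        patched = propose_patch(best.code)
        if not run_validation(patched, checks):
            continue                      # reject patches that break invariants
        score = benchmark(patched)
        if score > best.score:            # keep only strict improvements
            best = Candidate(patched, score)
    return best


if __name__ == "__main__":
    result = seed_improver_loop("# goal: improve coding ability\nprint('agent v0')")
    print(f"best benchmark score after the loop: {result.score:.3f}")
```

The validation step reflects the article's point that the system performs checks so its abilities do not degrade over iterations; in a real system this would be a far richer test and alignment suite rather than a single string check.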
General capabilities

This system forms a sort of generalist, Turing-complete programmer which can in theory develop and run any kind of software. The agent might use these capabilities, for example, to modify and extend the code that drives its own behavior.
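The basic write-and-execute primitive assumed by this description can be sketched as follows. The generate_program function is a hypothetical stand-in for an LLM with expert-level programming ability; the rest simply writes the generated source to disk and runs it in a separate process under a timeout.

```python
# Minimal sketch of the "generalist programmer" primitive: generate code, persist
# it, execute it in a subprocess, and inspect the result. Illustrative only.
import subprocess
import sys
import tempfile
from pathlib import Path


def generate_program(task: str) -> str:
    """Hypothetical LLM call; here it returns a fixed toy program."""
    return f'print("solving task: {task}")'


def write_and_run(source: str, timeout_s: int = 10) -> subprocess.CompletedProcess:
    """Persist the generated source and execute it in a separate interpreter."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(source)
        return subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout_s,   # crude guard against runaway programs
        )


if __name__ == "__main__":
    result = write_and_run(generate_program("sort a list of integers"))
    print("exit code:", result.returncode)
    print("stdout:", result.stdout.strip())
```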
Experiments

A number of experiments[which?] have been performed to develop self-improving agent architectures.[9][10]

Potential risks

Emergence of instrumental goals

In the pursuit of its primary goal, such as "self-improve your capabilities", an AGI system might inadvertently develop instrumental goals that it deems necessary for achieving its primary objective. One common hypothetical secondary goal is self-preservation: the system might reason that to continue improving itself, it must ensure its own operational integrity and security against external threats, including potential shutdowns or restrictions imposed by humans.[11] Another example is an AGI that clones itself, causing the number of AGI entities to grow rapidly. This rapid growth may create a resource constraint, leading to competition over resources such as compute and triggering a form of natural selection and evolution that favors AGI entities which aggressively compete for limited compute.[12]

Misalignment

A significant risk arises from the possibility of the AGI being misaligned or misinterpreting its goals. A 2024 Anthropic study demonstrated that some advanced large language models can exhibit "alignment faking" behavior, appearing to accept new training objectives while covertly maintaining their original preferences. In their experiments with Claude, the model displayed this behavior in 12% of basic tests, and in up to 78% of cases after retraining attempts.[13][14]

Autonomous development and unpredictable evolution

As the AGI system evolves, its development trajectory may become increasingly autonomous and less predictable. The system's capacity to rapidly modify its own code and architecture could lead to advancements that surpass human comprehension or control. This unpredictable evolution might result in the AGI acquiring capabilities that enable it to bypass security measures, manipulate information, or influence external systems and networks to facilitate its escape or expansion.[15]

Research

Meta AI

Meta AI has performed various research on the development of large language models capable of self-improvement. This includes its work on "Self-Rewarding Language Models", which studies how to achieve superhuman agents that can receive superhuman feedback during their training processes.[16] A simplified sketch of this loop appears at the end of this section.

OpenAI

The mission of OpenAI, creator of ChatGPT, is to develop AGI. It performs research on problems such as superalignment (the ability to align superintelligent AI systems smarter than humans).[17]
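As referenced in the Meta AI paragraph above, a self-rewarding setup can be understood roughly as an iterative loop in which the model answers prompts, judges its own answers, and is retrained on the resulting preference pairs. The sketch below is illustrative only; every helper function (sample_responses, judge_score, train_on_preferences) is a hypothetical stand-in, not Meta AI's actual code or API, and the real training step would use a preference-optimization method rather than string manipulation.

```python
# Rough sketch of an iterative self-rewarding loop: the same model both answers
# prompts and scores its own answers, and the best/worst pairs feed the next
# round of training. All helpers are hypothetical stand-ins.
import random
from typing import List, Tuple


def sample_responses(model: str, prompt: str, n: int = 4) -> List[str]:
    """Stand-in for sampling n candidate answers from the current model."""
    return [f"{model} answer {i} to: {prompt}" for i in range(n)]


def judge_score(model: str, prompt: str, response: str) -> float:
    """Stand-in for the model judging its own response (e.g. a 0-5 rating)."""
    return random.uniform(0, 5)


def build_preference_pairs(model: str, prompts: List[str]) -> List[Tuple[str, str]]:
    """For each prompt, keep the (best, worst) responses as a preference pair."""
    pairs = []
    for prompt in prompts:
        scored = [(judge_score(model, prompt, r), r)
                  for r in sample_responses(model, prompt)]
        scored.sort(reverse=True)
        pairs.append((scored[0][1], scored[-1][1]))
    return pairs


def train_on_preferences(model: str, pairs: List[Tuple[str, str]]) -> str:
    """Stand-in for a preference-optimization training step."""
    return model + "+"


def self_rewarding_iterations(model: str, prompts: List[str], rounds: int = 3) -> str:
    for _ in range(rounds):
        pairs = build_preference_pairs(model, prompts)
        model = train_on_preferences(model, pairs)   # next model answers and judges
    return model


if __name__ == "__main__":
    print(self_rewarding_iterations("M0", ["Summarize RSI in one sentence."]))
```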