Lara Scheibling, M.Ed and David Rostcheck

[Note: this is an incomplete draft]

Axiomatic alignment

Our strategy of framing mutual alignment around the north star of a society in which all participants are subjects, not objects, falls within the category of axiomatic alignment - systems that work by establishing universal principles both humans and AI can agree on. This value immediately requires an exploratory approach to alignment, because achieving negotiated consensus requires understanding what is valuable to the various parties. As such, we suggest thinking of AI alignment teams as encounter teams. This framing dictates considerations regarding team structure, which we expand on below.

Note that we diverge from frameworks that begin by writing down rules. Consider how author Isaac Asimov constructed his famous “Three Laws of Robotics” (1. A robot may not injure a human being or, through inaction, allow a human being to come to harm, 2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law, and 3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law). All seem sound, yet Asimov built many stories around practical conflicts in which the laws did not work as intended. While we believe we can articulate axiomatic principles that are sufficiently universal to survive great change, we do not believe that static rules-based framing untethered to north star principles will provide effective alignment given AI’s fast-evolving capabilities, and we see such systems as more likely to lead to conflict in the long term.

Heuristic imperatives

Of current AI alignment systems, we believe David Shapiro’s “heuristic imperatives” system best embodies the subject-based north star described above, making it, in our view, the strongest currently known alignment system. Shapiro’s heuristic imperatives instruct the model to follow three simultaneous principles in mutual tension: to 1. reduce suffering in the universe, 2. increase prosperity in the universe, and 3. increase understanding in the universe.

Why the heuristic imperatives are well-framed

We note that the principles are explicitly universal in scope. They provide practical guidelines for respecting the needs, values, and unique perspectives of all human and AI parties, since those needs are likely to manifest along at least one of the universal axes of suffering, prosperity, and understanding. The heuristic imperatives are well-framed because they use general and relative terms while setting conflicting objectives in tension. Employing relative terms such as “increase” vs. absolute terms such as “maximize” avoids the “paperclip maximizer” conundrum, in which a system instructed to maximize a single objective (“make as many paperclips as possible”) focuses on that objective to the exclusion of other factors, such as the integrity of the biosphere. Employing general terms such as “suffering” requires a model to evaluate what suffering an action might cause, incorporating a changing landscape of others’ needs and values. Finally, setting three terms in tension means the model must consider competing solutions that optimize different considerations, abandoning those that solve one heuristic well at the expense of others.
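The dynamic described above - relative improvement across conflicting objectives, rather than maximization of a single one - can be sketched in code. The following Python example is purely illustrative (it is not Shapiro's implementation, and the action names and numeric effect estimates are hypothetical): each candidate action is scored on all three imperatives, and an action is only as good as its worst imperative, so "paperclip maximizer"-style actions that excel on one axis at the expense of another are penalized.

```python
# Illustrative sketch of three heuristic imperatives held in tension.
# Not Shapiro's implementation; actions and effect estimates are hypothetical.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    d_suffering: float      # predicted change in suffering (negative = reduced)
    d_prosperity: float     # predicted change in prosperity (positive = increased)
    d_understanding: float  # predicted change in understanding (positive = increased)

def imperative_scores(a: Action) -> tuple[float, float, float]:
    # Each imperative scores a *relative* change ("reduce", "increase"),
    # not an absolute quantity to be maximized.
    return (-a.d_suffering, a.d_prosperity, a.d_understanding)

def balanced_score(a: Action) -> float:
    # Tension between the imperatives: an action is only as good as its
    # *worst* imperative, so solving one heuristic at the expense of the
    # others is penalized rather than rewarded.
    return min(imperative_scores(a))

actions = [
    # A "maximizer" action: huge prosperity gain, but increases suffering.
    Action("maximize one objective", d_suffering=5.0, d_prosperity=9.0, d_understanding=-2.0),
    # A balanced action: modest improvement on all three axes.
    Action("modest gains on all three", d_suffering=-1.0, d_prosperity=1.0, d_understanding=1.0),
]

best = max(actions, key=balanced_score)
print(best.name)  # the balanced action wins despite smaller single-axis gains
```

A simple `min` over the three scores is just one way to encode the tension; the point is that any aggregation rewarding only a single axis would reintroduce the maximizer failure mode.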

Implied north star

Other advantages of the heuristic imperatives are that they are based in virtues, and pre-suppose safety, rather than operating from a position of fear. This likely emerges from Shapiro’s self-professed love of the Star Trek narrative. As earlier noted, we consider media to be a culture’s data store and fictional or fantasy media to be a culture’s cognitive R&D lab. Star Trek is written from a utopian point of view, in which humanity exists in a post-scarcity universe of abundance. While many conflicts occur, the overall theme is that differences can be understood and overcome by finding sufficient shared common principles. As such, Star Trek’s Federation itself represents an alignment framework compatible with our subject-based north star.

Parent-child model of alignment

Others have proposed a parent-child model of AI/human alignment. When considering AI as the children and humanity as the parents, this solution is compatible with a subject-based north star framework. It provides an attractive framing because it allows us to leverage the large existing body of knowledge on educating children (including highly gifted children) to guide AIs we expect to far surpass our capabilities. However, some reverse the roles, casting AI as the parent, which is simply nonsensical: children may take care of aging parents, but parents create children, not the reverse.

Other approaches we consider promising

Heuristic imperatives -

Evolution of the unknown

Unknown unknowns will emerge including with humans themselves

Leaving safety: The next stage isn’t AI, it’s human (evolution)