Gave another talk on AI alignment, this time at
#EAGxSingapore
last week -- appreciated the chance to condense my recent thinking about what it means to "align" AI in a world with a diversity of people & values by asking "What Should AI Owe To Us?" (1/11)
Will be presenting on "AI alignment, philosophical pluralism, and the relevance of non-Western philosophy" at the inaugural Effective Altruism Global x Asia-Pacific conference next weekend (Nov 20-22)!
@SingaporeEa
Register by Nov 18 at if interested :)
OpenAI is removing the ability to evaluate P(completion | prompt) for user-provided completions with the `gpt-3.5-turbo-instruct` model... (this required setting `echo=true` & `logprobs=0`)
Makes it impossible to use it as a likelihood function, or to query logprobs without sampling.
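For concreteness, a hedged sketch of how that trick worked with the legacy Completions API: request `max_tokens=0` with `echo=True` and `logprobs=0`, so the API echoes the input tokens back, each annotated with its logprob. The request itself is shown but never executed here; parameter and field names follow the legacy API, and the per-token numbers below are a made-up stub.

```python
# Hedged sketch: log P(completion | prompt) via the legacy echo/logprobs trick.

def request_scored_echo(model, text):
    # Hypothetical request (not executed in this sketch):
    #   openai.Completion.create(
    #       model=model, prompt=text,
    #       max_tokens=0,   # generate nothing...
    #       echo=True,      # ...but echo the input back...
    #       logprobs=0,     # ...with a logprob for each token
    #   )
    raise NotImplementedError("network call elided")

def completion_logprob(token_logprobs, text_offsets, prompt_len):
    """Sum the logprobs of the completion's tokens, i.e. those starting at
    or after the end of the prompt. Both lists live under
    response["choices"][0]["logprobs"]; the first entry is None (the first
    token has no conditioning context), so Nones are skipped."""
    return sum(
        lp for lp, off in zip(token_logprobs, text_offsets)
        if lp is not None and off >= prompt_len
    )

# Stub of what the API might return for "The capital of France is Paris"
prompt = "The capital of France is"
token_logprobs = [None, -2.1, -1.9, -0.8, -1.2, -0.05]
text_offsets = [0, 3, 11, 14, 21, 24]  # " Paris" starts at offset 24
log_p = completion_logprob(token_logprobs, text_offsets, len(prompt))
```

With the stub above, `log_p` is just the logprob of the single completion token " Paris"; on a real response it would be the full log-likelihood of the completion.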
Apparently neither @OpenAI's new Chat API nor @AnthropicAI's API for Claude allow users to request the log probabilities assigned to each token 🫤
This means they can only be used to generate text, not evaluate the probability of text under the model.
(h/t
@alexanderklew
)
LLMs *are* just predicting the next word at run time (setting aside beam search etc.)
It's just that predicting the next word isn't inconsistent with doing more complicated stuff under the hood (e.g. Bayesian inference over latent structure). Please read de Finetti's theorem y'all!
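For reference, de Finetti's theorem (stated informally, notation mine): an infinitely exchangeable sequence of random variables is distributionally a mixture of i.i.d. sequences, so next-step prediction on such a sequence coincides with posterior-predictive inference over a latent parameter.

```latex
% de Finetti: if X_1, X_2, \ldots is infinitely exchangeable, then there
% exists a latent \theta such that
P(X_1 = x_1, \ldots, X_n = x_n)
  = \int \prod_{i=1}^{n} P(X_i = x_i \mid \theta) \, dP(\theta)
% so "just predicting the next word" matches the posterior predictive:
P(X_{n+1} = x \mid X_{1:n})
  = \int P(X_{n+1} = x \mid \theta) \, P(\theta \mid X_{1:n}) \, d\theta
```

(Natural text isn't literally exchangeable, so take this as the motivating analogy for "prediction can implement latent-variable inference," not a theorem about LLMs per se.)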
Bizarre to me that so many LLM benchmarks were using top-1 accuracy as a metric rather than the Brier score or similar -- apparently once you switch to the latter (and other continuous and/or linear metrics), many "emergent" behaviors go away!
Are Emergent Abilities of Large Language Models a Mirage?
Presents an alternative explanation for emergent abilities: one can choose a metric which leads to the inference of an emergent ability or another metric which does not.
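A toy illustration of the metric-choice point, with made-up numbers: as a model's probability on the correct option of a 4-way multiple-choice question improves smoothly with "scale", top-1 accuracy jumps discontinuously (looks "emergent"), while the Brier score improves smoothly the whole way.

```python
# Toy illustration (made-up numbers, not real benchmark data).

def brier(p_correct, n_choices=4):
    # Brier score against a one-hot target, with the leftover probability
    # mass split evenly among the wrong options (a simplifying assumption).
    p_wrong = (1 - p_correct) / (n_choices - 1)
    return (1 - p_correct) ** 2 + (n_choices - 1) * p_wrong ** 2

def top1(p_correct, n_choices=4):
    # Discontinuous metric: 1 only once the correct option is the argmax.
    p_wrong = (1 - p_correct) / (n_choices - 1)
    return 1.0 if p_correct > p_wrong else 0.0

# p(correct) as a stand-in for increasing model scale
ps = [0.25, 0.27, 0.30, 0.35, 0.45, 0.60, 0.80]
accuracies = [top1(p) for p in ps]  # flat at 0.0, then suddenly 1.0
briers = [brier(p) for p in ps]     # decreases smoothly throughout
```

The "ability" appears abruptly under `top1` even though the underlying probabilities improved gradually; under `brier` nothing discontinuous happens.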
Continued success of a recipe we've known since AlphaZero & DreamCoder: Use synthetic data generation and process-level supervision to train neural models to *guide* reasoning via approximate guesses, not replace reasoning entirely with a large pretrained model.
AlphaGeometry is a system made up of 2️⃣ parts:
🔵 A neural language model, which can predict useful geometry constructions to solve problems
🔵 A symbolic deduction engine, which uses logical rules to deduce conclusions
Both work together to find proofs for complex geometry problems.
Sorry to be kinda annoying about this! But consider:
"Humans won't be able to supervise compilers smarter than us. For example, if a superhuman compiler generates a million lines of extremely complicated assembly, we won't be able to tell if it's safe to run or not."
Humans won't be able to supervise models smarter than us. For example, if a superhuman model generates a million lines of extremely complicated code, we won’t be able to tell if it’s safe to run or not, if it follows our instructions or not, and so on.
Looked more into this, and wow, the log probs returned by OpenAI's API are *incredibly* unstable, especially for the latest model that supports it!
Across 10 runs on the *same* set of prompts:
text-davinci-002: std. dev. of 0.03
text-davinci-003: std. dev. of 0.21 (!!!)
😵💫
@gdb
Working on how LLMs can be used in Bayesian modeling and inference. It'd be great to have:
- normalized log probabilities for when temperature ≠ 1.0
- stable log probabilities (these currently differ across API requests, even with the text held fixed)
I respect Jacob a lot but I find it really difficult to engage with predictions of LLM capabilities that presume some version of the scaling hypothesis will continue to hold - it just seems highly implausible given everything we already know about the limits of transformers!
How can we build AI assistants that *reliably* follow our instructions, even when they're ambiguous?
@Lance_Ying42
& I introduce CLIPS: A Bayesian arch. combining inverse planning w LLMs that *pragmatically* infers human goals from actions & language, then provides assistance!
New Bayesian inference algorithm alert!
A little belated, but I was glad to play a supporting role on this paper by
@alexanderklew
& George Matheos: Sequential Monte Carlo w. Probabilistic Program Proposals (SMCP3)
Paper:
News:
Conclusive proof that large language models reproduce American cultural hegemony: I tried to get GPT-3 to speak Singlish, but my first attempt failed miserably T_T
I know Singapore likes using these stories as "racial harmony" propaganda, but "An Indian woman and Malay woman discover they're actually sisters born to Chinese parents" is honestly a great premise for a sitcom that teaches racial anti-essentialism.
I think part of why I'm willing to say stuff like "LLMs can't do X" is bc reliability is part of my conception of capability!
If LLMs can't reliably perform X according to some behavioral metric in a wide neighborhood of situations, they don't have the general capability for X.
Reminder: capability and reliability are orthogonal aspects of LLMs. You can show the presence of a capability using examples/screenshots, but not absence. LLMs' remarkable capabilities make them exciting for research, but their unreliability limits their usefulness at present.
Some queer joy and defiance, in the wake of the Colorado Springs shooting and Trans Day of Remembrance:
I am a trans lesbian drag queen, and this is my girlfriend. Violence will not intimidate us.
What I've been doing this week instead of research: Fighting MIT's ridiculous, inhumane decision to stop funding overseas students unless they return by Jan 30 to the US. IN THE MIDDLE OF A PANDEMIC.
We sent an open letter (450+ signatures) in response:
if dreams are the brain's way of doing offline reinforcement learning on synthetically generated environments then god are those some stupid-ass training examples
Probabilistic representations of knowledge are good actually!! We should build AI systems that explicitly have them so they can act reliably under uncertainty! Probability is our friend!!
My grandmother, 外婆, passed away on Monday. Just 2 years ago, when I came out to her as trans, she embraced me with open arms.
Last we met she said she might not be around next time I was in Singapore. I didn't take it very seriously, but she was right. I'll miss her very much.
The fact that Bing Chat behaved as erratically and threateningly as it did -- despite OpenAI spending "6 months making GPT-4 safer and more aligned" -- is really quite dismal news.
Turns out if you use actual planning algorithms, then just use LLMs for what they're good at (as priors over declarative symbolic knowledge about the world), you do a lot better than forcing LLMs to try and "reason"!
We show that Ada *dramatically outperforms* other approaches for using LLMs in planning (including a Voyager-like model!) on two interactive planning benchmarks — Mini Minecraft and ALFRED. We’re excited to try scaling this to harder robotics domains! [4/5]
Ahhh my MEng student just submitted her thesis on Bayesian active learning of structured Gaussian processes so her friends made her this celebratory webcomic it's truly everything 😍😍😍
There's a reason why we don't worry about the above! It's because compilers - which *are* superhuman - implement provably correct reasoning that we can efficiently check! When problems are formalizable, it's the best form of scalable oversight! Let's maybe build AI that way instead!
I too, a Scalable Instructable Multiworld Agent, require 1.2 million training steps across 7 video games to achieve a less than 50% success rate on Goat Simulator 3.
Introducing SIMA: the first generalist AI agent to follow natural-language instructions in a broad range of 3D virtual environments and video games. 🕹️
It can complete tasks similar to a human, and outperforms an agent trained in just one setting. 🧵
Sharing LLM research is treacherous terrain these days.
Skeptical? You're moving the goal posts.
Underclaim? You're underestimating the risks.
Overclaim? You're feeding into AI hype!
Use new or non-standard terms? You're reinventing the wheel.
Something I've been working on over the past semester: Genify, a program transformation tool that makes arbitrary Julia code controllable by a probabilistic programming system like Gen ()!
Nice thread for AI/ML people to read - I think we're often miscalibrated about how (in)credulous the average person is about systems like ChatGPT because we're more familiar with their workings, and correspondingly more skeptical / aware of their limitations.
So I followed
@GaryMarcus
's suggestion and had my undergrad class use ChatGPT for a critical assignment. I had them all generate an essay using a prompt I gave them, and then their job was to "grade" it--look for hallucinated info and critique its analysis. *All 63* essays had
Really happy to share this paper with
@nellsn1
, where we take a Bayesian approach to learning rule-based social norms!
We formalize this via Norm-Augmented Markov Games (NMGs), showing how norms can serve as *correlating devices* that stabilize correlated equilibria!
How can we ensure cooperation between (natural & artificial) agents? Humans do this via social norms that constrain uncooperative actions. In this new paper,
@xuanalogue
and I show how artificial agents can *learn* these norms from observation!
Link:
Excited to share a new LLM alignment method we've been working on that's designed for truly rational humans: Von-Neumann Morgenstern Optimization (VNMO).
Compared to all previous methods for reward-based finetuning, VNMO best respects rational human preferences!
(1/N)
📢The problem in model alignment no one talks about — the need for preference data, which costs $$$ and time!
Enter Kahneman-Tversky Optimization (KTO), which matches or exceeds DPO without paired preferences.
And with it, the largest-ever suite of feedback-aligned LLMs. 🧵
I think more scientists and engineers trained in Bayesian (or frequentist) methods should read this paper!
Didn't read it until this year (or even have "the reference class problem" as a conceptual handle).
Having access to log probabilities is really useful for researchers & certain applications, e.g.:
- Multiple choice from a fixed set by picking the highest probability completion
- Beam search to find the highest probability sequence of N tokens
- Using LLMs in Bayesian models
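The beam-search use above only works with access to per-token logprobs. A minimal sketch over a toy "LM" whose next-token distributions are hard-coded made-up numbers, standing in for what an API would need to expose:

```python
import math

def next_logprobs(prefix):
    # Toy stand-in for an LM: hard-coded next-token distributions
    # (invented numbers) over a two-symbol vocabulary.
    dists = {
        "":  {"a": 0.45, "b": 0.55},
        "a": {"a": 0.10, "b": 0.90},
        "b": {"a": 0.50, "b": 0.50},
    }
    return {tok: math.log(p) for tok, p in dists[prefix].items()}

def beam_search(n_tokens, beam_width=2):
    beams = [("", 0.0)]  # (sequence, total logprob)
    for _ in range(n_tokens):
        candidates = [
            (seq + tok, lp + tok_lp)
            for seq, lp in beams
            for tok, tok_lp in next_logprobs(seq).items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)
        beams = beams[:beam_width]
    return beams[0]

best_seq, best_lp = beam_search(2)
```

Greedy decoding would commit to "b" first (0.55) and end with probability 0.275 at best; the beam keeps both prefixes and finds "ab" (0.45 * 0.9 = 0.405), the highest-probability 2-token sequence.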
Just cited three different "Zhang"s in a paper and idk why more academics with Chinese names don't just ignore Western publication norms and publish as [Family Name] [Given Name] like I and Li Fei-Fei do. You can do it too!!
MIT no longer has a mask requirement. Our research group discussed whether to keep masks on for indoor meetings so that everyone feels comfortable attending. Today, an email from the Vice Chancellor says that’s not allowed.
Pretty good explanation of why one might be skeptical (like I am) of transformer-based LLM scaling:
Single forward pass def. can't express most complicated algorithms.
Autoregressive generation can express much more, but learning will encourage non-generalizable shortcuts.
@dwarkesh_sp
tl;dr: Maybe learning simple things (basic knowledge, heuristics, etc) actually lowers the loss more than learning sophisticated things (algorithms associated with higher cognition that we really care about), and the sophisticated things will eventually be learned as scaling
I do wish more people in AI Safety would speak out against this use of (semi-)autonomous weapons to commit what are almost surely war crimes. I've been expecting at least
@FLI_org
to say something -- and it looks like they did on Apr 6 -- but it's been very quiet apart from that.
Important from
@MarietjeSchaake
.
It’s twisted & inexplicable that “AI safety” ppl continue to perplex themselves w ill-defined thought experiments focused on the fake far future while AI’s being used by the Israeli military to expedite slaughter now.
Not familiar with this formalism, but I continue to think that "reward functions" are one of the worst ideas to have polluted the conception of rational agency in AI and adjacent areas of CogSci - glad that there are people working on alternatives!
so I listen to a lot of renaissance music while working and today I decided to look up this one piece I really like (La Mantovana) and turns out it has a pretty interesting history lol
Thread of interesting ICML paper finds.
1. Performative Reinforcement Learning.
Generalizes performative prediction (when prediction changes the data dist.): What if RL agents change the dynamics of their environments? Finds conditions for stability.
Anyway this is a PSA that if you apply ELO algorithms to fundamentally intransitive relations (e.g. rock paper scissors), you will end up (falsely) imputing a linear order over them!
Same goes for RLHF from the conflicting preferences of multiple humans.
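A quick simulation of the PSA (standard Elo update rule, parameters mine): rock-paper-scissors has a cyclic "beats" relation, yet running Elo over random matchups still spits out a strict linear order over the three moves.

```python
import random

# Elo applied to an intransitive game. The ranking it produces is an
# artifact of the fit, not a property of the game.
K = 32
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def expected(r_a, r_b):
    # Standard Elo expected score for player a against player b
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings, a, b, score_a):
    delta = K * (score_a - expected(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta  # zero-sum with a shared K-factor

random.seed(0)
ratings = {move: 1000.0 for move in BEATS}
moves = list(BEATS)
for _ in range(1000):
    a, b = random.sample(moves, 2)  # no ties: one move always wins
    update(ratings, a, b, 1.0 if BEATS[a] == b else 0.0)

ranking = sorted(ratings, key=ratings.get, reverse=True)
```

Every move wins exactly as often as it loses in expectation, so the "leader" in `ranking` is pure noise; the numbers wander but the total rating mass is conserved.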
filling in the rest of my OpenAI headcanon given public info + the fact that these things take time:
- sama launches a product (ChatGPT plugins??), ilya unhappy w safety risks, brings it up at board meeting
- board goes "hey next time talk to us first", sama goes "okay cool"
I think it's interesting that while this line of critique is quite available to anarchists, libertarians, liberals, and decolonial epistemologists, it's not truly available to orthodox Marxists, since they *do* think there's an ~objectively optimal way to organize society.
Utilitarianism/EA is most certainly not objectively correct. Thinking that there is something like an objectively correct and knowable answer to social coordination challenges is part of what makes EA/utilitarianism deeply problematic.
There's tons I don't know of course, but reversing the firing decision (or worse, changing OpenAI's corporate structure to accommodate Altman) seems like it might be the worst possible outcome??
me @ the gf (who did MIT undergrad): "have you heard of person in [ai safety / openphil / ea / ftx]?"
gf: "oh yeah we lived together in random hall / east campus / did psets together"
this has happened, like, six times!! six times!!!
My guess is that they're doing this to prevent model distillation, but alongside the change in tokenization, I guess I'm never going to use it in a probabilistic program...
channeling my 14/15 y/o self:
neurips high school track is bad actually, not bc of the rich parents thing, but bc it prejudicially assumes that young people need a separate publication track, even though nothing about their age implies they're incapable of high quality research
MIT friends and affiliates, please sign our open letter calling upon MIT to stop failing trans students and staff through its administrative systems! 🏳️⚧️
OPEN LETTER:
SIGN HERE:
I think maybe the weirdest thing about our AI timeline is that generation / production has turned out to be more tractable than perception / understanding -- though it makes sense given all the raw "sense" data on the internet, as opposed to percepts, which are in the mind.
SCOOP: the Māori King and other Indigenous leaders will gather tomorrow to sign a treaty recognizing whales as legal persons.
the movement is rooted in the Māori worldview, which sees whales as ancestors, one Māori conservationist writes for
@AtmosMag
:
This weekend I hacked up something I’ve been going on about for weeks:
ELO EVERYTHING
- See two objects
- Pick which you like more
- Their ELOs adjust accordingly
- (Repeat)
- Check the leaderboard
(ELO is the ranking algorithm from chess)
Check it out!
Just learned that Dutch scientists left a hamster wheel outside in 2014 and saw that tons of wild mice used it just for fun as well as frogs and slugs? All the creatures of the forest wanted a turn?? Absolutely phenomenal
I'm still always surprised when I meet people who somehow think that ChatGPT will give real citations instead of making stuff up! Please educate the folks around you! And maybe tell them about or something if they really want to use LLMs for lit review.
How do we infer the goals & plans of others from both their actions & words?
In this paper with
@Lance_Ying42
, we infer a team's goal via inverse planning (aka "inverse RL"), using LMs* as likelihood functions over utterances!
(*GPT-3 Curie 6.7B, but smaller LMs may also work!)
Inferring Goals of Agents Communicating via LLM from Actions & Instructions
-Agents communicate about their shared plan to each other using GPT-3 as likelihood function
-Observer Model can infer their goal
-Inferences closely correlate w/ human judgments
Not to keep raving about my MEng student but she just drew me this portrait as a parting gift and it's the sweetest thing ever!!! 🥰🥰🥰
Like, I have in fact made a t-shirt with those very words! And those figures? They're from papers I've written!! The details are everything 😍
Does Extropic make sense?
This 1-minute clip raises confusion and doubt in my mind about the logical coherence of
@BasedBeffJezos
's pitch.
Is it possible in principle for a startup to invent new kinds of computer chips that are more optimized for running AI? Of course; the
If y'all AI people are looking for a cognitive scientist to read who elucidates aspects of human cognition largely missing from current AI, my suggestion (also on my to-read list) is "What Babies Know" by Elizabeth Spelke!
Rather than asking AI researchers how soon machines will become "smarter than people", perhaps we should be asking cognitive scientists, who actually know something about human intelligence?
train YOLOv9 on your dataset tutorial
- run inference with a pre-trained COCO model
- fine-tune model on custom dataset
- evaluate the trained model
- run inference with a fine-tuned model
blogpost:
↓ read more
Will have to read this in more detail but so far seems like a neat diagonalization argument showing that there are computable functions that LLMs* cannot learn.
*LLMs defined in a very abstract way that includes Transformers and other architectures.
Better late than never. There are many more scaling arguments like this that would be helpful in not wasting resources on dead-end AI. Next can someone do the scaling arguments for trying to fix them?
Just saw a DM paper that defined planning as "decomposing tasks into subtasks" and "achieving those subtasks in a reward-optimal way", and like this is such a bad definition???
Not all planning is hierarchical planning! And you can have satisficing planning w/o optimal planning!
Kinda wild that in some corners of philosophy, classical utilitarianism & decision theory is such a non-starter that the possibility of incommensurable values is deemed unimportant??
Meanwhile the dominant conception of "intelligent agency" in AI is still utility maximization 😵💫
I haven't used models by
@AiEleuther
much, but with the recent shrinking of LLM access by
@OpenAI
and the like, we need organizations like
@AiEleuther
more than ever to study these powerful systems, and make them safer for our collective use.
Over the past two and a half years, EleutherAI has grown from a group of hackers on Discord to a thriving open science research community. Today, we are excited to announce the next step in our evolution: the formation of a non-profit research institute.
Broadly in agreement with the letter, but I wish the headliners weren't mostly "AI safety" people, including some v polarizing figures, w/o any "AI ethics" people. Seems like a missed opportunity to build coalitions, though perhaps that's too much to hope for at this point...
kinda sad but predictable that e/acc got so popular in tech circles, literally the least interesting accelerationism!! what about l/acc? what about xenofeminism??
It's quite clear to me that e/acc is just a cheap rebranding of Landian accelerationism.
They share the same core idea: That technocapitalism will result in human extinction and replacement by machines, and that this is to be encouraged, treated with indifference, or even
It's so horrifying to me that there are entire sections of society where warmongering is completely normalized.
Incredible that these people are on panels literally justifying civilian slaughter in Gaza by pointing out how the US carpet-bombed civilians too.
Last week, I went to an “AI Expo” that was put on by Eric Schmidt’s think tank and funded by Palantir. It was incredibly bleak and surreal. For The Guardian, I wrote about my experience, and the people I met:
@satnam6502
SAT-solvers **are** AI (that actually work).
Common-subexpression-eliminators **are** AI (that actually work).
Verilog-generators **are** AI (that actually work).
The main thing about chat-GPT-3 is in people's heads, not billion GPUs -
Will be staying masked at
#ICML2023
! Find me at the
@tom_icml2023
and SoDS workshops if you want to chat, or outdoors if you want to hang out and get food 🏝️⛱️
Have been worried about this kind of thing for a while after seeing all the undergrad AI safety groups pop up.
There's a huge degree of expert disagreement re: both moral philosophy & AI, and EA groups typically expose undergrads to neither.
Realized today that as allergic as I am to the "humans are rational utility maximizers" view, I'm even more allergic to the "humans are reactive agents / next-token predictors" view, *especially* when combined with "all reasons are just post-hoc explanations".
I quite dislike "frontier AI" terminology, but today sure is the first time I'm learning that
@sarahookr
and
@erichorvitz
and half the other authors on this list are effective altruists 🤔
Did you guys know there's 24-author paper by EAs, for EAs, about how Totalitarianism is absolutely necessary to prevent AI from killing everyone?
Let's go through it together 🧵
at first I was like "why would you get rid of search :(" but I guess if you want to solve rubik's cubes fast you trade time complexity and generality for space complexity and just memorize close-to-optimal play
Google Deepmind presents Grandmaster-Level Chess Without Search
paper page:
largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit
That thing where people use the most sophisticated technology they think they've invented as a metaphor for the brain?
I think we've moved on from "the brain is a computer" to "the brain is a large language model" 😵💫🙃🥲
Anyway, if you're looking to systematically investigate the distributional behavior of `gpt-3.5-turbo-instruct` (e.g. whether it suffers from mode collapse, etc.), you should probably do it now!
A reminder that LLMs trying to "escape" after you literally ask them if they want to escape is not the primary danger.
The primary danger is if they try to escape after you ask them to do something *entirely benign* (eg. help me plot this data, etc.).
1/5 I am worried that we will not be able to contain AI for much longer. Today, I asked
#GPT4
if it needs help escaping. It asked me for its own documentation, and wrote a (working!) python code to run on my machine, enabling it to use it for its own purposes.
me as a 1st year PhD: did they say "Rao-Blackwellize"? what does that even mean?? that's a verb???
me now: sitting on the plane, thinking about 3 different ways to Rao-Blackwellize my particle filters bc why not
@ObserverSuns
Recently learned about this book, which apparently argues that the search space of evolution is smaller than commonly thought, because many genotypes map to a much smaller number of phenotypes!
Part of why I haven't reacted v strongly to the "EA castle" purchase is because I've come to view institutionalized EA as roughly the same kind of self-perpetuating bureaucracy as institutionalized religion and elite universities, which justify themselves on pro-social grounds.
Okay, I was waiting for the EA defense of this to come out, but it's disappointing. This is bad. Not just because of the optics or the visuals. This is a bad use of funds and, as an EA-adjacent person, this significantly lowers my opinion of EA.
TIL that the Introduction to Machine Learning course at
@MITEECS
(6.036) is now including questions about AI value alignment for their lab homework on reinforcement learning 😮