what there is no overwhelming agreement on is what will happen the day after. how do we build a livable future here, for both israelis and palestinians? israel's leadership is not thinking about this now, and the future looks grim. THIS is where protest effort and outrage should be invested.
first gen phenomena: my parents recently googled me, discovered google scholar, enrolled to get updates from my profile. and now they constantly forward me google update emails with remarks like "10 new citations! so proud of you son!!"
I was puzzled for a while as to why we need RL for LM training, rather than just using supervised instruct tuning. I now have a convincing argument, which is also reflected in a recent talk by @johnschulman2. I summarize it in this post:
the birth story of json is so cool.
"we needed a way to transfer data between the server and the client, and xml was terrible to parse, so we thought if we just format it as javascript, the browser will parse it for us".
the real issue w/ "understanding the math behind machine learning" is that, esp with the newer models, this will not really buy you anything. you can have a complete mechanistic understanding of everything in the process, it will not help you understand why/how these things work.
"LLM understands X"
is very different from:
"LLM can answer correctly some questions that involve X"
which is in turn very different from:
"with the right prompt we can get the LLM to answer correctly some questions that involve X"
my two cents on why NLP as a field is focusing on the ML-ish / algorithmic / leaderboard-ish aspects (incl., now, LLMs) and not on the underlying language phenomena: it is just so much easier, on so many levels.
startups: i want to combine this and that with some ui and give it away for free to achieve growth
VC: here is 100M cash, burn it quickly, please!
academia: i want to solve this extremely hard problem
EU: compete for our most lucrative grant, 2.5M over 5 years. and we demand exclusivity
"I am the brightest mind in AI, and also the most powerful and influential. I think my work can have disastrous consequences and end humanity. So I will keep working on it full speed, but also sign this plea urging you to be afraid and allocate some money to keep an eye on me."
frustrating aspect of natural language systems:
- system with 80% accuracy where all mistakes are obviously wrong and irrelevant --> very useful, but users lose trust.
- system with 80% accuracy where the mistakes are very hard to spot --> much less useful, but users love it.
We have a cool new algorithm for extracting automata from RNNs (LSTMs, GRUs..)
Turns out that for many simple languages, RNNs actually learn quite large and weird DFAs with many blind-spots, which our algo discovers.
(w/ Gail Weiss, @yahave)
another "hot take": the fact that gpt3 "can write complete react apps", and that this seems super-impressive to devs, just shows what a crappy state web-based software engineering is in: people keep writing the same boilerplate code over and over with ~0 reuse.
Just wanted to give you all a heads up, our lab found an amazing breakthrough in language understanding. but we also worry it may fall into the wrong hands. so we decided to scrap it and only publish the regular *ACL stuff instead. Big respect for the team for their great work.
OpenAI losses are at 540M? how could that be? shouldn't autoGPT with access to gpt4 be able to make it back for them in a week doing petty business scams?
oh hello there, second-year student who emails the entire university senior-staff mailing list to complain that the computer science department organizes too many events promoting women, thereby de facto excluding men. you've come at a good time.
the 175B LM release from Meta is a very welcome move, and I especially appreciate the release of the logbook: a fascinating read which really grounds what it means to be an AI Researcher / Research Engineer working at the forefront of AI these days.
Start-of-semester thoughts:
Teaching NLP is quite depressing, and I don't know how to do it well. I am torn between the two perspectives:
1) Teach the interesting problems. Why language is interesting. Why language is hard. How language is structured. What should we look at.
I think the discontent many linguists (and old-time NLPers, and myself) feel about current research trends in "current NLP" stems from the fact that, to a large extent, what is called NLP is not really about language at all. It just uses natural language as its I/O format.
These explanation slides by Mike Collins on the transformer / self-attention building blocks are maybe the best presentation of it I’ve seen so far. Definitely stealing them for my class next year.
i want to get into ML research, what topic would you recommend?
nothing. now is not the time to get into ML research. now it's time to either observe what others are doing, or to build innovative applications using established techniques, or both.
I expected the Transformer-based BERT models to be bad on syntax-sensitive dependencies, compared to LSTM-based models.
So I ran a few experiments. I was mistaken, they actually perform *very well*.
More details in this tech report:
the real power differential of openai vs academia (and also vs most other industry ai labs) is not the compute. it's a large dedicated team working together on a single project for a prolonged period of time.
"then (human) clients did not want to commit to something which is not a "standard", so we said ok let's make a webpage that describes it as a standard".
"we thought of JSML, but it was taken already, so we tried JSON, and the domain name was available, and that's it."
superb BERT survey by @annargrs, Kovaleva and @arumshisky. Terrific summary of the many analyses and modification papers, very useful ref and/or starting point.
@SanaSaeed @pookleblinky to get the facts straight: these are not areas that should be evacuated, bc this is impossible (it's all of the gaza strip). this is a division of gaza such that the IDF could be more specific in its warnings and not say "all of khan yunis, leave east" but "56 and 57 are not safe".
many say now "to do video gen well, the system must learn a world model and understand the physics"
but to me *the* big lesson from LLMs is how much impressive performance can be faked *without* an underlying model and semantic understanding, just by mimicking observed patterns.
"semantic embeddings" are becoming increasingly popular, but "semantics" is really ill-defined. sometimes you want to search for text given a description of its content. current embedders suck at this. in this work we introduce a new embedder.
(w/ @ravfogel, @valentina__py, @AvshalomM)
one thing bard is worse at than openai is instructions of the form "answer in the form of a json array without any additional content". it almost always adds at least some "friendly" prefix "sure! here is your array". should be easily fixable, but currently big edge for oai.
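Until it is fixed on the model side, the usual workaround is to pull the array out of the chatty response yourself. A minimal sketch (the function name and regex are my own, not any provider's API):

```python
import json
import re

def extract_json_array(response: str):
    """Extract the first JSON array from an LLM response that may be
    wrapped in 'friendly' text like 'Sure! here is your array:'."""
    match = re.search(r"\[.*\]", response, re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in response")
    return json.loads(match.group(0))

# works whether or not the model added a prefix
print(extract_json_array('Sure! here is your array: ["a", "b", "c"]'))
```

This is brittle (it assumes a single well-formed array in the output), but it covers exactly the "friendly prefix" failure mode described above.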
umm. GPT3's cost is 12M$ only if you consider just the compute cost of a single final run, and don't count the personnel costs of an 18-month project with 31 authors.
"in order to use a model for something useful, we need to understand how it works" --> no, we don't. we use humans (and birds, and dogs...) without knowing how they work.
understanding how models work is very interesting and very important. but we can certainly use them without.
Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models
Presents a comprehensive dataset of 4,550 questions and solutions from all MIT EECS courses required for obtaining a degree
This effective 8bit BERT from Intel shows that we can remove a lot from large transformer LMs while remaining accurate.
Eagerly awaiting the efficient inference code+hardware to go along with it!
#greenAI
The few-shot generalization abilities of GPT3 are truly uncanny, in particular if the system i'm playing with is indeed trained only as a large-scale LM. It picks up on rather elaborate language behaviors based on 2-3 examples. The mechanism that allows this to emerge is a mystery.
common misconception, but the statement "nothing will be gained" is not true. deep linear-only networks are indeed equivalent in expressive power to a single layer linear network, but theory shows that their optimization dynamics are different and better.
Neural networks with only linear layers would result in linear models. Because a linear combination of linear functions is again linear. Nothing would be gained by adding more layers.
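The expressivity half of the quoted claim is easy to verify directly: without a nonlinearity, two linear layers collapse into one (the dimensions below are arbitrary, chosen just for illustration). The reply's point is that *optimization dynamics* still differ, which this forward-pass identity says nothing about.

```python
import numpy as np

rng = np.random.default_rng(0)

# two "layers" with no nonlinearity in between
W1 = rng.standard_normal((5, 10))
W2 = rng.standard_normal((3, 5))
x = rng.standard_normal(10)

deep = W2 @ (W1 @ x)       # two-layer linear network
collapsed = (W2 @ W1) @ x  # equivalent single linear layer

assert np.allclose(deep, collapsed)
```

Add any nonlinearity (relu, tanh, ...) between the layers and the collapse no longer holds.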
i just wanted to comment that many mathematicians can already do math well beyond elementary school level and even discover more math never seen before, yet we do not consider them a threat to humanity.
annoying habit of academia which I am really trying to avoid (though it is oh so pervasive, it's really hard):
talking/thinking in terms of "what's your next paper" and not "what do you want to figure out / achieve". we really are turning into publication-producing machines. sad.
this "terminal inside chatGPT" is crazy. is that an easter egg? how? i tried the prompt and it... worked? i was inside a shell. then i tweaked the prompt to be inside a python interpreter. also behaves creepily ok (see next tweet for screenshots). wtf. what game are they playing.
ML system are biased when data is biased. sure.
BUT some other ML systems are biased regardless of data.
AND creating a 100% non-biased dataset is practically impossible.
AND it was shown many times that if the data has little bias, systems *amplify it* and become more biased.
ML systems are biased when data is biased.
This face upsampling system makes everyone look white because the network was pretrained on FFHQ (Flickr-Faces-HQ), which mainly contains pics of white people.
Train the *exact* same system on a dataset from Senegal, and everyone will look African.
i thought the point of "education" was to actually absorb some arguments and engage with them, but apparently she really is just a non-stochastic virtue signaling parrot
The award-winning "Octopus Paper" by @emilymbender and @alkoller argues, and shows via thought experiments, that LMs trained only on text cannot acquire semantics.
We take it a step further and show *formal conditions* under which semantics can or cannot be learned from form.
unpopular opinion: with several thousands of documents, just pay people to do it manually. (10k documents at $10 per doc, which is very high pay, is just $100k. now compare that to absurd data scientist salaries..)
Anyone with experience in OCR or AI/ML looking for a challenge to solve which would help climate science?
We have thousands of pages of historical weather observations which need numerical values extracting efficiently & accurately so we can better understand extreme weather.
i wonder what was the crime that deserves such a punishment. did she jaywalk? did she steal food to feed her sick brother?
(she stabbed a 26-year-old orthodox-jewish mother, pushing a stroller and walking with her children, in the back, with a 10-inch knife)
putting aside the fact that this will technically not work, this is the same person who funds OpenAI with their very much closed black-box algorithms, yes? also, i don't see Tesla putting its blackbox driving algorithm on github...
re this "IBM pulling out of facial recognition!!!" everyone is excited about, here's my critical (cynical?) read of what they *actually* wrote (esp the first paragraph)
let's get the terminology straight. hamas did not "take hostages". taking hostages is something that happens during a fight. no. they *raided* and *kidnapped* people from their daily lives.
huh? so when gpt4 was thought to be a really really big gpt3, people were like "WOW AMAZING", and now with the rumor of it being 8*220B mixture of experts with a small inference trick they are like "Oh, Mixture of Experts? that's what you do when you are low on ideas"?
this is very well done and worth a read, both if you believe in emergent abilities and if you don't.
(personally, i tend to believe there is just *one* emergent ability of scale, but it definitely exists, and it's magnificent: the emergence of in-context learning behavior)
Are Emergent Abilities in Large Language Models just In-Context Learning?
Spoiler: YES 🤯
Through a series of over 1,000 experiments, we provide compelling evidence:
Our results allay safety concerns regarding latent hazardous abilities.
A🧵👇
#NLProc
after watching some tiktok, i am now certain that AI-based misinformation is really not a big thing to worry about, given the human-based content we already have
a bit more on this: "oh the new large DL models in NLP are so soul-less, they only consider form and don't truly understand meaning, they are black-boxes, they expose and amplify societal biases in the data, etc etc etc":
Excited and delighted to announce I received an ERC starting grant in which I plan to revolutionize the way we do NLP!
[and more exciting news coming soon! Stay tuned]
question to google "we had complete academic freedom until three days ago" brain researchers: could you publish a paper on just how awesome pytorch is, and how incredibly painful tensorflow is?
related to previous tweet, my (controversial?) take: we (CS) put too much emphasis on creating novel algorithms. this is dumb. first of all, most of them aren't really novel. second, we have so many good existing ones already. it's stupid to be allowed to use an algo only once.
@alitlstrawberry @RattusFlattus i don't think she ever presented herself as a person of color, though? it was you who assumed her to be chinese based on her last name
my main takeaway from conversations at ACL: our datasets are crap.
I am not talking about MRC, which goes without saying (though it is also improving). I'm talking about real-world stuff: NER, IE, document classification.
What we have is toyish and unrealistic.
How do we get good ones?
can you explain LangChain to me? from the examples it seems that you have to learn a kinda-big api with many concepts, in order to replace straightforward, short and simple code. what am i missing here? what does it save?
Hello colleagues and fellows. Over the past few days I was shocked to learn that people in our community don't share what I consider to be basic human values. Please help me restore faith in our community by signing this.
i am sorry but identifying a single line diff between two text files "in only 22 seconds" is *not* a compelling use case for large context window LLMs.
that's a great question. and going to be a long-ish answer.
first, i had that thread mostly to provoke people to point me to cool work. i got some, but hey, do send some more.
second, i do really feel somewhat bored/frustrated with a lot of what's going on.
what is DALL-E2 trained on? i couldn't find this info in the paper or webpage beyond "images and their captions from the internet", not even the dataset size.. anyone knows and can share?
model outputs something extremely offensive: oh, it's just reflecting the patterns and biases in its training data. it's not the model's fault, it's society and the internet.
model does something reasonable and useful: it's a miracle! this thing is intelligent and thinks on its own!
day1: i have an idea!
day2: i implemented my idea and added it to the NN and it improved 10 points!
day3: oops i had a bug and "my idea" was turned off when i achieved this gain, it was just hyper-param
how many times did this happen to you?
how many times did you not reach day3?
with all the recent attention to "long context", Mosh and Alon went to find out "does it really work?" and in particular in the case where you want to not only *locate* separate pieces of information in the context, but to also *reason* over them. Turns out, not really.
Two needles in a haystack: Our latest study explores how LLMs perform on the same task with different lengths of context.
Accuracy dips when models must not only find, but reason over two text parts. Even on just 3000 tokens!
Results and analysis 👇(1/7)
The "controversial google ethics paper" is now officially out. This version is significantly better than the leaked one, imo. Kudos to the authors for this improvement. Read it.
However, two things in the paper rub me the wrong way. Here is my criticism:
Since @asaf_amr defended his masters yesterday, we decided it's a good time to arxiv this ICLR 2020 reject.
It presents a *simple* constructive proof of the benefit of depth in neural nets, which, unlike other similar works, can be grasped by undergrads.
It is very tempting to read into the "grandmaster chess without search" result way beyond what's actually there. I was also misled on first read, but further reading and reflection brought me down to earth. I wrote a bit about it.
Conjecture: Only AI problems that you can simulate and sample endlessly many training samples for can be solved with today's algorithms such as deep and reinforcement learning.
Prediction: Any AI problem that you can simulate and sample endlessly many training samples for can be solved with today's algorithms such as deep and reinforcement learning.
my first reaction was "ok, i will not post any more links" (i posted the two videos of our own works already, so...)
but then i thought, fuck this shit.
here are all the eacl 2021 papers + video links.
explore and enjoy, there are some great works!
Zuckerberg: here is our vision for building a consumption-centered dystopian metaverse future.
the internet: OMG "facebook" are changing their name to "meta"!!!1
so, two cents on the gzip classification thing: apparently the idea has been around in some form or another for a while, but was treated as a curiosity because of how inefficient it was compared to all other classification methods. enter BERT.
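For readers who missed it, the idea itself fits in a few lines: compute normalized compression distance (NCD) with gzip and classify by nearest neighbor. A minimal sketch (the toy training examples are made up for illustration):

```python
import gzip

def clen(s: str) -> int:
    """Length of the gzip-compressed string, a crude proxy for information content."""
    return len(gzip.compress(s.encode()))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small when a and b share structure."""
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(query: str, labeled: list[tuple[str, str]]) -> str:
    """1-nearest-neighbor under NCD over (text, label) pairs."""
    return min(labeled, key=lambda item: ncd(query, item[0]))[1]

train = [
    ("the striker scored a late goal in the match", "sports"),
    ("the team won the championship final", "sports"),
    ("the senate passed the new budget bill", "politics"),
    ("the president vetoed the proposed law", "politics"),
]
print(classify("a late goal won the championship match", train))
```

No training, no parameters, just a compressor and a distance, which is exactly why it was long seen as a cute curiosity rather than a method.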
turns out that if you carefully enough train a large enough neural net on huge amounts of data (any data), the resulting model will be able to do some really impressive stuff.
too bad everything about the process of doing so is so mind-numbingly boring.
no, they really CANNOT replace knowledge bases yet. they recover correctly only a fraction of the facts in the KB, are restricted to single tokens, and when they don't know the answer they just make something up.