
Khanh Nguyen
@khanhxuannguyen
Followers: 2K · Following: 847 · Media: 87 · Statuses: 1K
Postdoc at CHAI Berkeley with Prof. Stuart Russell, Prev. Postdoc at Princeton NLP, PhD @umdcs, Human-AI Communication, Interactive Learning, NLP.
Joined September 2014
I have finally graduated, thanks to tremendous support from my research mentors (@haldaume3 @brendan642 @debadeepta, Dipendra Misra, and others), my family, and friends. I will be a postdoc @princeton_nlp and later @CHAI_Berkeley. Looking for opportunities to give talks :P.
14 · 5 · 80
Our HANNA paper on Visual Navigation with Natural Multimodal Assistance has been accepted to #emnlp2019. New task/dataset/model/learning algorithm for leveraging vision-and-language human assistance in object-finding tasks in photo-realistic environments! (with @haldaume3)
2 · 14 · 62
This great work confirms my intuition: people have rediscovered problems of RLHF that were observed and documented many years ago, when the method was first tried on machine translation. The finding in this paper is similar to… People, especially…
"Less (tuning) is more for alignment" is an intriguing hypothesis. Is alignment tuning really that “superficial”⁉️ 🤔 If so, how so? 🤔 Can any straightforward analysis explain this? 🤔 What if I tell you “no tuning can also be great for alignment”? 🫢 😉 If you’re interested in…
2 · 8 · 53
Maybe it's time to move beyond rewards and start 𝘁𝗮𝗹𝗸𝗶𝗻𝗴 properly to our ML agents! Our ILIAD #ICML2021 paper formulates a learning framework where natural language is the only communication medium used by the teacher. Blog:
1 · 20 · 50
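To make the setup concrete, here is a minimal sketch of a teaching loop in which language is the only feedback channel, written as a hindsight-style relabeling loop. Everything here (the `execute`/`describe`/`fit` interfaces and the relabeling step) is an illustrative assumption, not the ILIAD algorithm itself.

```python
import random

def language_only_training(execute, describe, fit, requests, steps=100):
    """Sketch: the teacher never sends rewards, only descriptions."""
    dataset = []
    for _ in range(steps):
        request = random.choice(requests)
        trajectory = execute(request)        # agent attempts the request
        description = describe(trajectory)   # teacher answers in language only
        # The trajectory is, by construction, a valid execution of its own
        # description, so (description, trajectory) is a supervised pair.
        dataset.append((description, trajectory))
        fit(dataset)                         # imitation-style update

# Trivial stand-ins so the loop runs end to end.
language_only_training(
    execute=lambda req: f"trajectory-for({req})",
    describe=lambda traj: f"description-of({traj})",
    fit=lambda data: None,
    requests=["find the mug"],
)
```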
Happy to introduce 𝗚𝗹𝗼𝗯𝗮𝗹 𝗩𝗼𝗶𝗰𝗲𝘀, an evaluation dataset for multilingual and cross-lingual summarization in 15 languages (w. @haldaume3). New materials for studying translation quality in downstream tasks, zero-shot learning, etc. #NLProc #summarization #multilingual
1 · 9 · 47
Passing false-belief tests = model HAS theory of mind
Passing false-belief tests ≠ model USES theory of mind to perform tasks
Our #ACL2023 paper formulates 𝑻𝒂𝒔𝒌-𝑶𝒓𝒊𝒆𝒏𝒕𝒆𝒅 cognitive capabilities, which are used to perform tasks.
1 · 10 · 45
Very delighted to receive an Outstanding paper award at @tom_icml2023. It is a great honor to be acknowledged by experts in the domain you have just recently ventured into :).
Passing false-belief tests = model HAS theory of mind
Passing false-belief tests ≠ model USES theory of mind to perform tasks
Our #ACL2023 paper formulates 𝑻𝒂𝒔𝒌-𝑶𝒓𝒊𝒆𝒏𝒕𝒆𝒅 cognitive capabilities, which are used to perform tasks.
4 · 7 · 40
When a language model guides a human, giving false instructions can frustrate them or even put them in danger. We propose a cost-effective method for detecting hallucinations in navigation instructions. More about our #EMNLP2023 findings paper⬇️ (1/n)
2 · 5 · 27
Do language-to-world models like OpenAI SORA excite you? They excite us too! In this recent paper, we lay out a vision for this type of model. More than video-creation tools, they will enable humans to collaborate safely with AI and control it easily. The code has been released. Check it out!
📢 Excited to announce our new paper: Language-guided world models: A model-based approach to AI control.
• We develop LWMs: world models that can read text to capture new environment dynamics.
• These models enable humans to efficiently control agents by providing language…
0 · 2 · 20
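As a rough illustration of the core idea, and emphatically not the paper's architecture: a transition model whose next-state prediction is conditioned on a language description of the environment dynamics, so editing the description changes what the model predicts.

```python
import torch
import torch.nn as nn

class LanguageConditionedWorldModel(nn.Module):
    """Toy world model: transition dynamics conditioned on text."""
    def __init__(self, state_dim=16, action_dim=4, text_dim=32, vocab=1000):
        super().__init__()
        self.text_encoder = nn.EmbeddingBag(vocab, text_dim)  # stand-in text encoder
        self.transition = nn.Sequential(
            nn.Linear(state_dim + action_dim + text_dim, 64),
            nn.ReLU(),
            nn.Linear(64, state_dim),
        )

    def forward(self, state, action, text_tokens):
        text = self.text_encoder(text_tokens)      # "read" the dynamics description
        x = torch.cat([state, action, text], dim=-1)
        return self.transition(x)                  # predicted next state

model = LanguageConditionedWorldModel()
state, action = torch.randn(2, 16), torch.randn(2, 4)
text_tokens = torch.randint(0, 1000, (2, 12))      # tokenized description
next_state = model(state, action, text_tokens)     # shape (2, 16)
```

Conditioning the dynamics on text is what would let a human steer the agent's plans by editing a description instead of a reward function.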
First time co-organizing a workshop at a major conference. Great interactive audience, wonderful talks and discussions about #interactiveNLP. The simultaneous interpretation was still awkward, but everyone seemed to be happy. Thank you all for contributing to this experience :D
1 · 2 · 14
This is great! It might imply that we have been doing actor-critic the wrong way the whole time? Actor-critic seems like coordinate descent, but the problem is that the coordinates are correlated?
🔥Major Breakthrough in #RLHF! Traditional approaches fall short in characterizing policy-driven data dependency. Introducing PARL: a Unified Stochastic Bilevel Formulation. One of the FIRST provable solutions to #Alignment. 🚀 Essential for ethical AI! 📄
3 · 2 · 10
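To make the coordinate-descent analogy concrete, here is a stripped-down sketch (a toy setup, not taken from the quoted paper): the critic is fit with the actor frozen, then the actor is updated against the frozen critic; two "coordinates" optimized alternately even though each one's optimum depends on the other.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2
actor, critic = nn.Linear(obs_dim, act_dim), nn.Linear(obs_dim, 1)
opt_a = torch.optim.SGD(actor.parameters(), lr=1e-2)
opt_c = torch.optim.SGD(critic.parameters(), lr=1e-2)

def alternating_step(obs, returns):
    # Coordinate 1: regress the critic toward observed returns.
    critic_loss = ((critic(obs).squeeze(-1) - returns) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Coordinate 2: update the actor against the (now frozen) critic.
    dist = torch.distributions.Categorical(logits=actor(obs))
    actions = dist.sample()
    adv = (returns - critic(obs).squeeze(-1)).detach()  # critic held fixed
    actor_loss = -(dist.log_prob(actions) * adv).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

alternating_step(torch.randn(8, obs_dim), torch.randn(8))
```

The correlation the tweet worries about is visible here: changing the actor changes the return distribution the critic should be fitting, and vice versa.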
Model uncertainty must be correct to be useful. Posterior calibration for NLP models http://t.co/5NJUQXhxWE #nlproc #mlearning #datascience
2 · 4 · 9
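For context, calibration is typically measured by checking whether confidence matches accuracy. A minimal sketch of one standard estimate, expected calibration error (the function below is illustrative, not the paper's method):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; compare mean confidence to
    empirical accuracy in each bin, weighted by bin size."""
    confidences, correct = np.asarray(confidences), np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# Toy example: an overconfident model.
print(expected_calibration_error([0.9, 0.95, 0.85, 0.9, 0.8], [1, 0, 1, 0, 1]))
```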
@umdclip @ml_umd @umdcs students presenting their work at EMNLP'19 in Hong Kong. A memorable event: first EMNLP paper for @swetaagrawal20 and last for @yogarshi and Weiwei Yang as PhD candidates.
1 · 3 · 7
@aahmadian_ @chriscremer_ @mgalle @mziizm @KreutzerJulia @ahmetustun89 @sarahookr This is the comparison I have been looking for! In fact, all of the early work on RLHF for text generation employed simple algorithms like A2C and REINFORCE, and they worked fine.
1 · 0 · 8
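For readers who haven't seen those early recipes: a toy REINFORCE loop for sequence generation, with a simulated rating standing in for human feedback. The policy, reward, and baseline here are all illustrative.

```python
import torch
import torch.nn as nn

vocab_size, seq_len = 20, 5
logits = nn.Parameter(torch.zeros(seq_len, vocab_size))  # toy tabular "policy"
opt = torch.optim.Adam([logits], lr=0.1)

def rating(tokens):
    # Stand-in for a human/simulated rating: prefer even tokens.
    return (tokens % 2 == 0).float().mean().item()

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    tokens = dist.sample()                    # sample one sequence
    logp = dist.log_prob(tokens).sum()        # log p(sequence)
    loss = -(rating(tokens) - 0.5) * logp     # REINFORCE with a constant baseline
    opt.zero_grad(); loss.backward(); opt.step()
```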
By the way, @a1zhang is on the PhD market this year. He is smart, diligent, and productive, and is experienced with vision&language research. Grab him while you can 😃.
1 · 2 · 9
@fhuszar “Enough” does not mean “efficient”. A two-layer neural network with sufficient width can approximate any function, but the width could grow exponentially with the complexity of the function. Deep nets are more efficient function approximators.
0 · 1 · 8
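One classical way to make "more efficient" precise (a sawtooth separation in the style of Telgarsky, 2016, stated informally as an example rather than the strongest known result):

```latex
t(x) \;=\; 2x \;-\; 4\,\mathrm{ReLU}\!\left(x - \tfrac{1}{2}\right),
\qquad x \in [0,1],
\qquad
t^{(k)} \;=\; \underbrace{t \circ \cdots \circ t}_{k \text{ times}}
\ \text{has } 2^{k} \text{ linear pieces.}
```

A depth-k ReLU network of constant width computes t^(k) exactly (each layer is one tent map), while a network with a single hidden layer of w units is piecewise linear with at most w + 1 pieces on [0,1], so it needs w ≥ 2^k − 1.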
Come to our #emnlp2017 poster at 10:30am today (Sep 10, GMT+2) on Reinforcement Learning for Neural MT with Simulated Ratings. #nlproc
1 · 1 · 7
@QuanquanGu @iampanxu Indeed, all the early RLHF papers on text generation used REINFORCE and A2C.
0 · 0 · 8
@StephenLCasper Nice survey, but it's missing key citations. Please see this tweet for a deeper history of RLHF.
The RLHF page of HuggingFace (…) misses many important citations. Here are some classical RLHF papers that you should cite, and why.
2 · 0 · 6
Finally had time to write an introduction to my research on calibration. #NLP #calibration #machinelearning
0 · 2 · 4
@ShunyuYao12 @tedsumers @karthik_r_n @cocosci_lab @princeton_nlp @PrincetonCS Share many of the opinions <3 In …, I was also thinking of a two-system architecture, because inference with rigorous reasoning could be slow.
0 · 0 · 6
The discussion on VLN reminds me of our motivation for creating VLNA (…). The first thing we changed was to replace initial detailed instructions with high-level instructions, essentially removing the assumption that the requester knows the task solution.
The need for open data & benchmarks in modern ML research has led to an outpouring of #NLProc data creation. But @harm_devries, @DBahdanau & I suggest the low ecological validity of most of this data undermines the resulting research. Comments welcome!
1 · 2 · 6
@DrJimFan We did Sora+Genie, but at a much more humble scale :p Still, we realize that the problem of grounding language to dynamics is extremely difficult. With immense data, maybe you will generalize well in-distribution, but achieving true compositional…
0 · 1 · 5
@DrJimFan @yoavgo @johnschulman2 Yeah, the (learned) reward function may still be imperfect, but the (unconfirmed) hypothesis is that evaluation is easier than generation, so the reward function may still be of higher quality than a policy learned with the same amount of labeling effort.
2 · 0 · 6
The theoretical fact that RL = reverse KL optimization is pretty well known and has been rediscovered multiple times (e.g., …).
Why does RL-tuning hurt the calibration of LLMs? The RL objective can be written as a reverse KL divergence, which encourages mode-seeking behavior (i.e., a peaky distribution). The RL+translation literature studied this phenomenon a long time ago (…).
0 · 2 · 5
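For reference, the identity both tweets allude to, in standard notation: with β the KL coefficient, π_ref the pretrained model, and Z a normalizing constant,

```latex
\mathbb{E}_{x \sim \pi}\!\left[ r(x) \right]
\;-\; \beta\, \mathrm{KL}\!\left( \pi \,\middle\|\, \pi_{\mathrm{ref}} \right)
\;=\;
-\,\beta\, \mathrm{KL}\!\left( \pi \,\middle\|\, \pi^{*} \right) \;+\; \beta \log Z,
\qquad
\pi^{*}(x) \;=\; \frac{1}{Z}\,\pi_{\mathrm{ref}}(x)\, e^{\,r(x)/\beta}.
```

Since β log Z does not depend on π, maximizing the KL-regularized reward is exactly minimizing the reverse KL to π*, and reverse KL is mode-seeking, which is one way to explain the peaky, poorly calibrated outputs after RL-tuning.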
@yoavgo @johnschulman2 I think viewing the LLM as having a fixed knowledge graph is slightly misleading; by instruction-tuning you also add knowledge and modify the knowledge graph. The issue to me is overgeneralization: instead of learning just the taught knowledge, the LLM also learns hallucination behavior.
2 · 0 · 5