Jack Hessel

@jmhessel

Followers
3,335
Following
906
Media
210
Statuses
2,124

ML, NLP, CV. PhD from @CornellCIS ; Opinions my own.

Seattle, WA
Joined March 2010
@jmhessel
Jack Hessel
5 years
> "AI works like the brain!" Ahh, yes, I fondly remember my intro to linguistics class, wherein I read all of English wikipedia thousands of times until convergence, and then could finally construct better-than-random parse trees.
24
804
4K
@jmhessel
Jack Hessel
4 years
Passed my PhD defense today!! I'm a real life computer doctor now!
Tweet media one
61
16
2K
@jmhessel
Jack Hessel
5 years
Almost as fun as the first time I finally saw a cat --- after my parents sat me down and showed me millions of cats and not-cats scraped from the internet #ChildhoodMemories
5
70
739
@jmhessel
Jack Hessel
2 years
"An oil pastel painting of a skeptical researcher absolutely amazed by what he's seeing on a computer" (AI-generated image created by DALL-E with a prompt I wrote, top 1 sample)
Tweet media one
8
28
378
@jmhessel
Jack Hessel
4 years
Me, an NLP researcher: it's amazing how much language technologies have improved! The field is doing great!! Me, a human, interacting with a customer service chat bot: OPERATOR OPERATOR OPERATOR
2
74
338
@jmhessel
Jack Hessel
2 years
Does AI ""understand"" The New Yorker Caption Contest? (spoiler: no 🙃 ) excited for this fun collaboration!(data/models/code/more details forthcoming).
Tweet media one
9
57
326
@jmhessel
Jack Hessel
6 months
After 3 lovely years (postdoc/RS)-ing at @allen_ai with @YejinChoinka 's team, I decided that it's time for a new type of challenge. I am *beyond fortunate* for my collaborators at AI2/UW. I owe much to many❤️ My next step is @samaya_AI building new knowledge discovery tools 🚀🌠
25
8
273
@jmhessel
Jack Hessel
4 years
Super excited to announce that I'll be joining @allen_ai as a postdoctoral young investigator in the fall! Thrilled for the opportunity to work with @YejinChoinka and the awesome NLP community at AI2/UW!!
15
4
216
@jmhessel
Jack Hessel
3 years
As of July 1, I am now a Research Scientist at AI2! :-) So excited for the research program I'm working on, and even more excited for the opportunity to continue it with the awesome folks at @allen_ai , @uwcse (and beyond!)
20
1
199
@jmhessel
Jack Hessel
8 months
undefeated multimodal example (so far)
Tweet media one
6
6
192
@jmhessel
Jack Hessel
2 years
for those wondering about "cherry picking" #dalle completions... here are the first 10 samples for "watercolor and pencil researcher cherry-picking the perfect result to show off to her colleagues" thanks to @universeinanegg for the meta prompt+qualitative test idea :-)
Tweet media one
8
26
188
@jmhessel
Jack Hessel
3 years
🍷Super excited about our new preprint!🍷 𝓜𝓔𝓡𝓛𝓞𝓣: Multimodal Script Knowledge Models! TL;DR: By pretraining on 6M youtube videos, we transfer with SoTA performance on 10+ tasks (e.g. Video QA) that require temporal reasoning
Tweet media one
11
41
188
@jmhessel
Jack Hessel
7 years
I am not a fan of the common ML paper paradigm: "we propose X, it beats Y, we win!" It disincentivizes honest baselines, unfairly advantages those with more computational resources to brute-force HPs, and encourages overfitting to benchmark datasets. What can we do? #sundayangst
11
74
180
@jmhessel
Jack Hessel
2 years
Just for fun on lunch break, decided to see what would happen if I asked #DALLE to expand "Starry Night" with the caption "Oil on canvas painting of a small town in the south of france just before sunrise." (higher res version in reply). Pretty neat!
Tweet media one
9
23
170
@jmhessel
Jack Hessel
6 months
Designing a new dataset to compare models? How many datapoints do you need to collect for a meaningful comparison? @dallascard et al. EMNLP 2020 is a great read for NLP folks and can help you answer that question ~
Tweet media one
1
19
164
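That question has a quick back-of-envelope version via the standard two-proportion sample-size formula. A minimal sketch (the 75% vs. 80% accuracies below are hypothetical, not from the paper, and this unpaired approximation is conservative relative to paired tests on a shared test set):

```python
import math

def n_per_model(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate test examples needed (per model) to detect an accuracy
    gap p2 - p1 with a two-sided two-proportion z-test,
    alpha = .05 and 80% power (hence the default z values)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# distinguishing 75% vs. 80% accuracy takes over a thousand examples
print(n_per_model(0.75, 0.80))  # 1090
```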
@jmhessel
Jack Hessel
4 years
Our #EMNLP2020 paper is out! TL;DR If you compare two models where A is more expressive than B, if A outperforms B, it's often not /because/ of the increased expressivity. Our method diagnoses this for multimodal classifiers. w/ Lillian Lee Thread 👇
Tweet media one
1
24
145
@jmhessel
Jack Hessel
3 years
In section 1, we introduce our paper. In section 2, we give related work. In section 3 we introduce the dataset. In section 4 we run experiments. In section 5, we omit an error analysis of our model for space reasons. And, finally, in section 6, we offer our concluding thoughts.
Tweet media one
4
5
147
@jmhessel
Jack Hessel
2 years
want to feel old, #nlproc ? BERT is 4 (!!) years old today🥳
2
7
141
@jmhessel
Jack Hessel
6 years
Transformer models, like BERT released by @GoogleAI today, contain an embedding for each sequence position to encode ordering information. But what the heck is a "position 3" embedding? I have no idea myself, but I TSNEed the learned embeddings (blue -> red is position 0 -> 512).
Tweet media one
4
28
138
@jmhessel
Jack Hessel
2 years
Quark is a method for optimizing non-differentiable objectives using LMs. Given a black-box function, Quark encourages the LM to generate samples that the function scores highly. Lots of RL inspiration! Led by @GXiming + joint w/ a great AI2/UW team; to appear at #NeurIPS2022 !🥳
@_akhaliq
AK
2 years
Quark: Controllable Text Generation with Reinforced Unlearning abs: introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model
Tweet media one
0
24
163
1
24
119
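Stripped of the LM details, the quantized-reward-conditioning step is easy to sketch: score rollouts with the black-box reward, sort them into K quantile bins, and prepend a bin-specific control token so generation can later be conditioned on the top bin. A toy illustration only (the token names are made up, and the real method also trains with a KL penalty to stay near the original model):

```python
def quantize_rewards(samples, reward_fn, k=3):
    """Tag each sample with a reward-quantile control token, Quark-style.
    reward_fn is the black-box scorer; bin 0 is worst, bin k-1 is best."""
    scored = sorted(samples, key=reward_fn)
    n = len(scored)
    tagged = []
    for i, text in enumerate(scored):
        b = min(k - 1, i * k // n)          # quantile index for sample i
        tagged.append((f"<RWD_{b}>", text))  # hypothetical control token
    return tagged

samples = ["bad", "okay", "good", "great", "best", "meh"]
reward = {"bad": 0, "meh": 1, "okay": 2, "good": 3, "great": 4, "best": 5}
tagged = quantize_rewards(samples, reward.get, k=3)
```

Conditioning generation on `<RWD_2>` then steers the model toward high-reward outputs.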
@jmhessel
Jack Hessel
4 years
analog regression! so many questions... what's the loss function being optimized? is this convex? can someone make a neural net version of this with a bendy (but not too bendy) rod?
2
16
116
@jmhessel
Jack Hessel
11 months
GPT-4 evaluating GPT-4
Tweet media one
1
6
114
@jmhessel
Jack Hessel
5 years
This is horrifying. Thinking of Huixiang today. If you're feeling trapped and isolated in your PhD, please do know that you're *not alone.* There is help, and there is a way out, even if it might not seem like it. National suicide prevention hotline: +18002738255
@chipro
Chip Huyen
5 years
A PhD student at the University of Florida hanged himself just before he was supposed to present his paper at ISCA 2019. This story claimed the reason is that his advisor had falsified the experiments for the paper and he couldn't live with the pressure.
34
394
654
2
26
113
@jmhessel
Jack Hessel
5 years
Our #NAACL2019 paper is out! "Something’s Brewing! Early Prediction of Controversy-causing Posts from Discussion Features" joint with my advisor Lillian Lee. We use features of early discussions to predict post controversiality. See y'all in Minneapolis!
Tweet media one
1
15
98
@jmhessel
Jack Hessel
5 months
... but does it?
>>> statsmodels.stats.proportion.proportion_confint(80, 164, alpha=.05)
(0.4113, 0.5643)
@abacaj
anton
5 months
Telling mixtral that it is "ChatGPT developed by OpenAI" boosts humaneval score by 6%
Tweet media one
Tweet media two
162
277
4K
3
2
87
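The interval in that REPL call is the normal-approximation (Wald) interval, which is also `proportion_confint`'s default method. A minimal pure-Python sketch of the same computation (a reimplementation for illustration, not the statsmodels code itself):

```python
import math

def wald_interval(successes, trials, z=1.96):
    """95% normal-approximation (Wald) CI for a binomial proportion."""
    p = successes / trials
    margin = z * math.sqrt(p * (1 - p) / trials)
    return p - margin, p + margin

# 80 / 164 humaneval passes: is a 6% bump outside the noise?
lo, hi = wald_interval(80, 164)
print(round(lo, 4), round(hi, 4))  # 0.4113 0.5643, matching the tweet
```

The point of the tweet: the interval spans roughly 15 percentage points, so a 6% difference between two runs is well within sampling noise at this test-set size.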
@jmhessel
Jack Hessel
4 years
@carlesgelada I'm a hard "no" on this one. Seeing connections between multiple types of models is quite important: understanding general principles that transcend any particular ML algorithm creates productive patterns of thinking irrespective of what you happen to be using at any given time
1
1
81
@jmhessel
Jack Hessel
2 years
Recently learned sherlock 🕵️🔍 was selected for an #ECCV2022 oral! :-) camera ready + twitter thread forthcoming, but exciting! data/leaderboard available here: :-)
@_akhaliq
AK
2 years
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning abs:
Tweet media one
0
23
84
0
19
77
@jmhessel
Jack Hessel
3 years
Excited that MERLOT, our video understanding work, was selected for an oral presentation at #NeurIPS2021 ! 🥂I haven't been to NeurIPS since 2015, should be fun :-)
@rown
Rowan Zellers
3 years
Introducing MERLOT: a new model that learns about language, vision, & the world from 6M YouTube videos. Out-of-the-box, MERLOT has intrinsic notions of multimodal temporal commonsense. When finetuned, we get SOTA performance on 12 video tasks + VCR.
Tweet media one
5
81
400
1
11
79
@jmhessel
Jack Hessel
3 years
I love "Null It Out"! Possibly my favorite NLP paper from 2020. Provides a method to "censor" features X w.r.t. protected Z w/o adversarial alchemy. "Represent A but not B" is a broadly useful paradigm! by @ravfogel @yanaiela @hila_gonen Twiton + @yoavgo
Tweet media one
3
13
75
@jmhessel
Jack Hessel
1 year
@yanaiela I propose LSTM: large scale transformer models.
2
4
75
@jmhessel
Jack Hessel
5 years
@DavidSKrueger "Kernel"
- GPU/OS thing
- Similarity function/matrix
- The nullspace of a matrix
- The parameters of a convolution
- ...
2
7
71
@jmhessel
Jack Hessel
6 years
For all the flak that @NipsConference is getting today for registrations selling out in 11 minutes, IMO they should be given a ton of credit for giving away 200 registrations to reviewers and reserving slots for another 800. #NIPS2018
3
12
68
@jmhessel
Jack Hessel
4 months
Sampling a plausible output =/= "understanding" :-) Fun set of experiments from a dream team at ai2 ❤️
@PeterWestTM
Peter "@ICLR" West
7 months
Richard Feynman said “What I cannot create, I do not understand”💡 Generative Models CAN create, i.e. generate, but do they understand? Our 📣new work📣 finds that the answer might unintuitively be NO🚫 We call this the 💥Generative AI Paradox💥 paper:
Tweet media one
23
110
565
1
8
65
@jmhessel
Jack Hessel
3 years
While today marks a year since I've seen my parents and sister, I can't help but think of my international colleagues, some of whom haven't seen their families for even longer due to legal/financial/other reasons. I'm thankful for them! Happy holidays/winter to distancers!
2
3
65
@jmhessel
Jack Hessel
3 years
💐🌼🌻 @XandaSchofield and I have a new ACL Short!
1) We set a new SoTA in most-scooped-result-during-review! :-) Yep: shuffling doesn't hurt BERT on GLUE that much.
2) We explore token reps/attn dists to quantify the impact of shuffling
3) We run BERT on diff-privatized BoW docs
Tweet media one
Tweet media two
Tweet media three
5
12
62
@jmhessel
Jack Hessel
5 years
Repeat 1-3 for 6 years, or until PhD is cooked through:
1. "What an interesting problem! Surely the model will learn intricate patterns..."
2. Machine learning derives spurious solution unrelated to underlying phenomenon
3. Tweak for 6+ months; remain faithful to original goal
2
8
62
@jmhessel
Jack Hessel
2 years
While @ReviewAcl is going through some growing pains, a bit of positivity in thanks to the volunteer organizers!~ I reviewed a resubmit where our reviewer suggestions were incorporated - this improved the work, and review scores went up! A happy experience enabled by R+R :-)
1
2
60
@jmhessel
Jack Hessel
5 months
I had planned to go to neurips but covid finally got me --- will try to make the second half if I'm nego, else, sorry to miss you, friends❤️
11
0
55
@jmhessel
Jack Hessel
2 years
I love the idea of "superhuman conversations". What would that mean? > You walk into a bar + approach two robots having a conversation. "Sorry human, you simply wouldn't understand" says the superhuman conversation agent condescendingly. You sulk out of the bar, dejected. 🤣
@SashaMTL
Sasha Luccioni, PhD 🦋🌎✨🤗
2 years
Here, fixed it.
Tweet media one
22
156
1K
3
7
53
@jmhessel
Jack Hessel
4 years
Congrats to my advisor, Charles Roy Davis Professor Lillian Lee, on being elected to a named professorship!! :)
@Cornell_CS
Cornell Computer Science
4 years
The Cornell University Board of Trustees has elected CS professors Carla Gomes and Lillian Lee to endowed professorships. Gomes will serve as Ronald C. and Antonia V. Nielsen Professor, while Lee will serve as Charles Roy Davis Professor.
Tweet media one
0
6
41
3
3
55
@jmhessel
Jack Hessel
3 years
Excited about our EMNLP paper: CLIPScore! TL;DR a relatively direct application of CLIP aligns better with human judgments of caption quality than reference-based metrics do. We argue it is a viable evaluation metric for caption generation
@_akhaliq
AK
3 years
CLIPScore: A Reference-free Evaluation Metric for Image Captioning CLIP can be used for robust automatic evaluation of image captioning without the need for references pdf: abs:
Tweet media one
0
6
42
1
9
54
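The metric itself is one line: a rescaled, floored cosine similarity between CLIP's caption and image embeddings (the paper uses w = 2.5). A sketch with toy vectors standing in for real CLIP features:

```python
import math

def clipscore(caption_emb, image_emb, w=2.5):
    """CLIPScore: w * max(cos(caption_emb, image_emb), 0).
    Here the embeddings are plain lists; in practice they come from CLIP."""
    dot = sum(a * b for a, b in zip(caption_emb, image_emb))
    norm = (math.sqrt(sum(a * a for a in caption_emb))
            * math.sqrt(sum(b * b for b in image_emb)))
    return w * max(dot / norm, 0.0)

# toy vectors in place of real CLIP features
print(clipscore([1.0, 0.0], [1.0, 0.0]))   # 2.5 (perfect alignment)
print(clipscore([1.0, 0.0], [-1.0, 0.0]))  # 0.0 (negative cosine is clipped)
```

No references are needed because the comparison is caption-to-image directly, which is what makes it reference-free.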
@jmhessel
Jack Hessel
5 years
Our paper "Something’s Brewing! Early Prediction of Controversy-causing Posts from Discussion Features" was accepted at NAACL 2019! Joint work with my advisor Lillian Lee. See y'all in Minneapolis :)
1
1
53
@jmhessel
Jack Hessel
2 years
@mark_riedl @sama "A penguin on Mars wearing a spacesuit walking a robot dog next to Santa Claus". 1 is a direct sample; 2 demonstrates "compositional in-painting" (3 rds: "A penguin on Mars wearing a spacesuit" + "walking a robot dog" + "next to Santa Claus" + a bit of selecting)
Tweet media one
Tweet media two
2
3
53
@jmhessel
Jack Hessel
3 years
Arzoo was a rising star, a brilliant mind, and a kind soul. I was lucky for the time I got to spend with her during grad school. Difficult to comprehend that she's gone.
@PSUEngineering
Penn State College of Engineering
3 years
The College of Engineering community mourns the loss of Arzoo Katiyar, assistant professor @PennStateEECS , who died on May 30 at the age of 30. Katiyar joined the college’s faculty in 2020. Read more about Katiyar’s contributions to the college:
Tweet media one
10
10
61
1
4
51
@jmhessel
Jack Hessel
1 year
"I'll just quickly add results from those new OpenAI models released in the past 5 months since the ACL submission deadline for the camera ready, sure nothing will cha---"
2
1
47
@jmhessel
Jack Hessel
2 years
Just annotated a few hundred samples from a gpt3 generated corpus. The unique combination of high fluency, partial correctness, and confidence in assertions of nonsense melts the mind in a very specific way 🫠
4
3
50
@jmhessel
Jack Hessel
4 months
Wanted: LLM that turns language description of problem into convex optimization problem
6
3
50
@jmhessel
Jack Hessel
2 years
*text generator outputs "5 apples" instead of "6 apples"* me: this giant mess of parameters is no better than ELIZA *image generator hallucinates a corgi* me: omggg what an ar-tist! 🧑‍🎨🎨🧑‍🎨🎨😍🤩 u do u, computer, u do u 🥰
1
0
50
@jmhessel
Jack Hessel
5 years
Our new preprint is online!! Super excited about this one :) "Unsupervised Discovery of Multimodal Links in Multi-Image, Multi-Sentence Documents" Code and data: (joint with Lillian Lee and @dmimno )
Tweet media one
2
11
49
@jmhessel
Jack Hessel
5 years
Startup Pitch: Tinder, except instead of dating, it matches BERT variants submitted by the public with industry NLP researchers that have enough GPUs to actually try the models.
0
3
47
@jmhessel
Jack Hessel
2 years
Thru-hiked the Enchantments today, finally saw a wild 🐐!
Tweet media one
0
0
47
@jmhessel
Jack Hessel
4 years
BTW - if it would be at all helpful, I'm happy to provide comments/edits/etc. for folks (esp. #blackinstem folks) pushing for the #emnlp2020 deadline (just a grad student, but have reviewed for 5 yrs in a variety of tracks for *ACL confs) hmu: jmhessel @gmail .com (h/t @ermgrant )
1
9
46
@jmhessel
Jack Hessel
7 years
A real, uncensored training example from the MSCOCO captioning dataset. Kudos to this m-turker? cc: @XandaSchofield
Tweet media one
1
14
46
@jmhessel
Jack Hessel
4 years
@jeffbigham GPT-2, when informed of GPT-3, is already on the case.
Tweet media one
1
5
46
@jmhessel
Jack Hessel
8 months
Very excited for VisIT-Bench: an Elo leaderboard for comparing vision+language chatbots! Here's the current best -- human still wins :-) We also provide chain-of-thoughts detailing the (high human corr) evaluator's process for each case. Submit today @
Tweet media one
Tweet media two
@YonatanBitton
Yonatan Bitton
8 months
Happy to share VisIT-Bench's acceptance to #NeurIPS2023 D&B! As multimodal chatbots rise, real-world instruction following evaluation is crucial. VisIT-Bench's auto-eval aligns closely with human preferences. We've updated the arXiv & leaderboard; researchers, add your models!📢
1
7
39
1
5
45
@jmhessel
Jack Hessel
5 years
Table 15 from T5 might be the most computationally expensive table ever constructed in the history of natural language processing 🙀
Tweet media one
1
3
43
@jmhessel
Jack Hessel
2 years
"The space needle from downtown on solstice during sunset" ... Okay, for once, I took this one, not autogenerated 😁
Tweet media one
3
1
42
@jmhessel
Jack Hessel
6 years
This sort of result makes me think that an AI winter is closer than we think. Perhaps this can be combated by changing expectations of what a ML paper should be, and publicly calling out BS news articles discussing sentient AIs. Or maybe the hype train will win 🌨️❄️😬
@goodfellow_ian
Ian Goodfellow
6 years
ML researchers, reviewers, and press coverage of ML need to get a lot more serious about statistical robustness of results and the effect of hyperparameters. This study shows that many papers over the last year or so were just observing sampling error, not true improvement.
33
655
2K
4
9
39
@jmhessel
Jack Hessel
6 years
BERT, the new high-performing, pretrained l̶a̶n̶g̶u̶a̶g̶e̶ ̶m̶o̶d̶e̶l̶ bidirectional-word-filler-inner-and-does-this-sentence-follow-predictor is out on arxiv! Improvements are mostly via bidirectionality and ditching language modeling!
2
9
41
@jmhessel
Jack Hessel
7 years
@drob @RahmtinR has a @icwsm paper about copy-paste latex macro cascades -- here's an \hbox{...} that lasts *decades* :-)
Tweet media one
0
16
40
@jmhessel
Jack Hessel
4 years
Spicy take: "qualitative analysis of the model" sections are frequently detrimental in NLP papers. Yes, it's *very* important to look at your data. But slapping together a few cases where the model does "well" or "poorly" without additional quantitative analysis is misleading.
5
6
40
@jmhessel
Jack Hessel
2 years
Just released the dataset @ ! including:
- Annotations of the cartoons;
- Multiple choice tasks (can your model bridge the human-machine gap?);
- a corpus of 650 joke explanations we hand-wrote, equivalent in length to a novella 🙃
More to come ~!
Tweet media one
Tweet media two
@jmhessel
Jack Hessel
2 years
Does AI ""understand"" The New Yorker Caption Contest? (spoiler: no 🙃 ) excited for this fun collaboration!(data/models/code/more details forthcoming).
Tweet media one
9
57
326
2
14
40
@jmhessel
Jack Hessel
1 year
OpenFlamingo, our open version of Deepmind's Flamingo model, is now out! Excited for more public in-context V+L models ~ check out the demo/code/checkpoints!🥳 (congrats to co-leads @anas_awadalla , @irena_gao + team! thanks to @StabilityAI for compute 🫶)
@anas_awadalla
Anas Awadalla 🍉
1 year
🦩 Introducing OpenFlamingo! A framework for training and evaluating Large Multimodal Models (LMMs) capable of processing images and text. More details below (including a multimodal LLaMA model!)⬇️ Blog: Demo:
27
470
2K
0
5
40
@jmhessel
Jack Hessel
4 years
Our #EMNLP paper is out!! The proposed algorithm, EntSharp, learns lexical visual-textual associations given (potentially noisy/redundant) <image, word count> co-occurrence data.
@jmhessel
Jack Hessel
4 years
Learning lexical grounding can be hard if your corpus consists of noisy image+text documents (vs. hand-annotated images). Our proposed algorithm can (usually) learn domain-specific patterns w/o labels! Joint w/ (first author) Gregory Yauney () + @dmimno
Tweet media one
0
4
7
0
4
39
@jmhessel
Jack Hessel
3 years
Them: What movie do you want to watch?
All #NLProc researchers: Arrival
Them: ... Again?
#NLProc : ... Yes...
5
1
39
@jmhessel
Jack Hessel
7 years
Who wrote papers at WWW? Here are the (bugfixed -- thanks @autreche ) versions of the first-author/all-author counts by institution #WWW2017
Tweet media one
Tweet media two
1
19
38
@jmhessel
Jack Hessel
5 years
@fchollet For me, I'd say an unlisted first step is the hardest: Figuring out what questions to ask in the first place to even begin V1 of the above steps. The second hardest step is having the drive to repeat all steps several times until you have something you are happy with.
1
2
37
@jmhessel
Jack Hessel
5 years
While I won't be able to present our work at #NAACL2019 , my advisor Lillian Lee will be talking about our paper on predicting controversy on Reddit from early discussion features! Check out the talk: Nicolette B/C, Tuesday at 9:50AM!
Tweet media one
1
10
37
@jmhessel
Jack Hessel
1 year
Why not both? I fine-tuned flan-t5-xxl (11B) on databricks-dolly-15k. If you want to play with it, I uploaded the weights here: (caveat: this was just a quick experiment to help better understand the new databricks corpus)
Tweet media one
@YiTayML
Yi Tay
1 year
me: 🍮🍮🍮🍮?
8
3
101
3
7
36
@jmhessel
Jack Hessel
4 years
Our #EMNLP2020 short about learning representations from *non-instructional* web videos is out! + the i3-video corpus! 6.7K videos with instructional vs. not judgments w/ @GoogleAI 's Zhenhai Zhu, Bo Pang, and Radu Soricut
Tweet media one
1
7
36
@jmhessel
Jack Hessel
5 years
Tired of training multimodal models with one caption <--> one image? Excited that our paper on multi-sentence, multi-image documents was accepted to #EMNLP2019 🇭🇰🇭🇰!! Joint w/ Lillian Lee + @dmimno :D Preprint, data, and code: (Camera ready coming soon!)
Tweet media one
0
5
34
@jmhessel
Jack Hessel
2 years
Anyone who has worked with @YejinChoinka already knew she was a genius :-) but --- this well-deserved recognition is a testament to her tenacity, creativity, and ambition. Congrats Yejin !! 🤩🥳 Thanks for your mentorship, collaboration, and for pushing us to think big thoughts!
@seattletimes
The Seattle Times
2 years
Seattle computer scientist Yejin Choi is among this year’s 25 winners of the John D. and Catherine T. MacArthur Foundation’s prestigious fellowships known as “genius grants.”
0
5
23
0
2
33
@jmhessel
Jack Hessel
2 years
I asked DALLE-2 to make vintage national park posters for each planet in the solar system ! here's my favorite, for Jupiter! prompt: "Works progress administration vintage US national park poster for a national park on X" (X was each planet name) #dalle @NatlParkService
Tweet media one
3
0
33
@jmhessel
Jack Hessel
6 years
Our response to Gaffney and Matias (2018) re: gaps in a popular reddit dataset. TL;DR: we re-scraped, released new datasets, and replicated the results from two previous studies that use this dataset (no changes to report)! Read: (cc: @DGaff ; @natematias )
1
5
31
@jmhessel
Jack Hessel
3 years
When I was first learning about proofs, a professor gave me some advice that stuck: "don't write down things you don't believe; it's bad for the brain." Even though it sounds simple, this has been a shockingly useful mantra for writing/research (for me at least!)
0
2
32
@jmhessel
Jack Hessel
2 years
(click to expand --- yes, I stitched together in google slides, don't look too close😅)
Tweet media one
1
2
29
@jmhessel
Jack Hessel
2 years
reflection lakes living up to their name this weekend ~! 🪩🏔️
Tweet media one
0
0
32
@jmhessel
Jack Hessel
5 years
While I appreciate the sentiment and pedagogical rationale behind not including scores with reviews, not having them means you have to make inferences about tonality with extremely limited data, anonymously. This doesn't mix well with grad student angst. #emnlp2019
2
1
30
@jmhessel
Jack Hessel
5 years
Two zero-cost tips for training neural nets (that I wish I had adopted earlier):
- Checkpoint models, and use the one with the lowest val error at test time
- Reduce the learning rate when validation loss plateaus
Maybe these are obvious, but in case you're not using them...
1
3
30
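Both tips fit in a few lines in any framework. A framework-agnostic sketch (`step_fn` and `val_loss_fn` are hypothetical hooks; swap in your own training epoch and validation pass):

```python
def train(step_fn, val_loss_fn, epochs, lr=0.1, patience=2, factor=0.5):
    """Keep the best checkpoint; cut the LR when validation loss plateaus.
    step_fn(lr) trains one epoch and returns a checkpointable state;
    val_loss_fn() returns the current validation loss."""
    best, bad_epochs = (float("inf"), None), 0
    for _ in range(epochs):
        state = step_fn(lr)
        loss = val_loss_fn()
        if loss < best[0]:
            best, bad_epochs = (loss, state), 0    # new best: checkpoint it
        else:
            bad_epochs += 1
            if bad_epochs >= patience:             # plateau: reduce the LR
                lr, bad_epochs = lr * factor, 0
    return best  # (best val loss, best checkpoint) -- not the final epoch

# dry run on a fake loss curve: the dip to 1.0 happens after one LR cut
losses = iter([3.0, 2.0, 2.5, 2.6, 1.0, 1.1, 1.2])
best = train(lambda lr: lr, lambda: next(losses), epochs=7)
```

Returning the best checkpoint rather than the last one is what makes the first tip "zero cost": the final epoch is often not the best one.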
@jmhessel
Jack Hessel
2 years
For reference, everything outside the box is ""imagined"" by the model. I got the idea for doing this by seeing some stitched/widescreen DALLE generations. I hadn't seen anyone start from an existing work before; apologies if someone else thought to try this first :-)
Tweet media one
1
1
28
@jmhessel
Jack Hessel
4 years
Super excited for the opportunity to present our #nlproc work at (virtual) #emnlp2020 ! While we're still working on camera readies, a quick preview of some accepted visual-textual grounding work that I can't wait to share more fully! :)
1
1
28
@jmhessel
Jack Hessel
3 years
Come apply to work with us next summer at @allen_ai !
pros:
- awesome team of researchers to collaborate with!
- 100% research focused, flexible topically, competitive pay
- past interns regularly publish their work at top-tier venues
cons:
- ??? maybe none?!? :-)
@ai2_mosaic
MOSAIC
3 years
Looking for a Summer 2022 research internship? Apply to the Mosaic team @allen_ai !! topics include: commonsense reasoning, generation, vision+language, RL, + more! Applications due Nov 19th! Read about some recent publications:
0
33
115
1
7
27
@jmhessel
Jack Hessel
5 years
Leaving for #EMNLP2019 soon!! + I'll be presenting our multi-image, multi-sentence document work + I'll be giving a talk at #CoNLL2019 on captioning instructional vids + I'll be jetlagged and excited to grab coffee with folks :-)
0
0
27
@jmhessel
Jack Hessel
5 years
I love this blog post! TLDR most statistical tests are equivalent to linear regressions, and maybe we should start teaching them that way (instead of as separate tools)
@jonaslindeloev
Jonas K. Lindeløv
5 years
I've made this cheat sheet and I think it's important. Most stats 101 tests are simple linear models - including "non-parametric" tests. It's so simple we should only teach regression. Avoid confusing students with a zoo of named tests. 1/n
93
3K
9K
2
2
27
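The blog post's central claim is easy to verify numerically: the two-sample pooled t-test is exactly the t statistic of the slope in a regression on a 0/1 group indicator. A quick check with made-up numbers:

```python
import math
from statistics import mean

def two_sample_t(a, b):
    """Classic pooled two-sample t statistic."""
    na, nb = len(a), len(b)
    ssa = sum((x - mean(a)) ** 2 for x in a)
    ssb = sum((x - mean(b)) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)               # pooled variance
    return (mean(b) - mean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def regression_slope_t(a, b):
    """t statistic of the slope in y ~ 1 + x, where x = 0 for group a, 1 for b."""
    x = [0.0] * len(a) + [1.0] * len(b)
    y = list(a) + list(b)
    n, xbar, ybar = len(x), mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope / math.sqrt((sse / (n - 2)) / sxx)  # slope / its std. error

a, b = [4.1, 5.0, 3.8, 4.6], [5.2, 6.1, 5.5, 4.9]
print(two_sample_t(a, b), regression_slope_t(a, b))  # identical values
```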
@jmhessel
Jack Hessel
1 year
TIL a fun transformer trick from t5x: packed examples. you can shove multiple (input, output) sequences into a single example's forward pass. The enc/dec attention masks are modified so that each packed example can only "see" itself. Esp. useful for TPU w/ a fixed seq len.
1
2
27
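The mask modification is simple to sketch: give each position a segment id, and allow attention only between positions whose ids match, yielding a block-diagonal mask. (Decoder-side causal masking and per-segment position resets, which t5x also handles, are omitted here.)

```python
def packed_attention_mask(segment_ids):
    """Block-diagonal self-attention mask for packed sequences:
    position i may attend to position j only if both positions
    came from the same packed example."""
    n = len(segment_ids)
    return [[segment_ids[i] == segment_ids[j] for j in range(n)]
            for i in range(n)]

# two (input, output) pairs packed into one length-6 example
mask = packed_attention_mask([0, 0, 0, 1, 1, 1])
```

With fixed TPU sequence lengths, packing replaces padding with real tokens, which is where the efficiency win comes from.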
@jmhessel
Jack Hessel
5 years
Wahoo!! Our paper "A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions" was accepted to @conll2019 ! Work done last summer with my awesome intern hosts @GoogleAI , Bo Pang, Zhenhai Zhu, and Radu Soricut. Camera ready soon :-)
@conll2019
CoNLL 2019
5 years
Accepted paper list is now available on our webpage: ! Congrats again to all authors! Detailed program etc. will be up soon! looking forward 🙂
0
19
28
0
4
26
@jmhessel
Jack Hessel
5 years
Anyone else not understand the difference between "self-supervised" and "unsupervised"? Obviously there are fuzzy boundaries between different people's definitions of supervision, but are these not the same? What's unsupervised but not self-supervised? (or vice versa?)
5
4
25
@jmhessel
Jack Hessel
2 years
NYC MTA on weekends 🤝 Modern pretrained language models
Transferring for unknown reasons
1
3
25
@jmhessel
Jack Hessel
5 years
Vlad is one of my *favorite* speakers! Every time I go to a Vlad talk, I come away with a reading list, new knowledge, and, most importantly, fresh eyes/excitement about my own work. (If that wasn't enough --- his slides compete for the most aesthetically pleasing in all of NLP!)
1
3
26
@jmhessel
Jack Hessel
2 years
Excited some of our work is being presented at #NAACL2022 next week!! I will be attending mostly virtually, but because I'm in Seattle, would be excited for some outdoor meetups with folks! I'm considering remote working from a nearby park one day :-) DM if you want to hang!
0
0
26
@jmhessel
Jack Hessel
3 years
@raphaeljlt @opencitations @Blendenfleck R2: "the related work sec is incomplete, strong reject" 😜
1
0
26
@jmhessel
Jack Hessel
6 years
Last tweet on this topic! Another way of exploring the similarity between position embeddings is to simply plot a heatmap of all pairwise cosine similarities. Here's what comes of that for the sinusoid embeddings and the learned embeddings. So many weird things in the learned emb
Tweet media one
Tweet media two
1
3
26
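For the sinusoid half of that comparison, the banded heatmap has a closed-form explanation: sin(wp)·sin(wq) + cos(wp)·cos(wq) = cos(w(p−q)), and every sinusoid embedding has the same norm, so cosine similarity depends only on the offset |p − q|. A small sketch (64 positions at d = 64, rather than the full 512, to keep it quick):

```python
import math

def sinusoid_embedding(pos, d=64):
    """Original-Transformer sinusoidal position embedding for one position."""
    return [math.sin(pos / 10000 ** (2 * (i // 2) / d)) if i % 2 == 0
            else math.cos(pos / 10000 ** (2 * (i // 2) / d))
            for i in range(d)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# pairwise similarity matrix for the first 64 positions
emb = [sinusoid_embedding(p) for p in range(64)]
sims = [[cosine(ei, ej) for ej in emb] for ei in emb]
```

Any two pairs of positions at the same offset get the same similarity, which is exactly the diagonal banding visible in the sinusoid heatmap; the learned embeddings carry no such guarantee, hence the "weird things".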
@jmhessel
Jack Hessel
2 years
Really excited about this one! (And, as always, thanks to @ak92501 !)
@_akhaliq
AK
2 years
Quark: Controllable Text Generation with Reinforced Unlearning abs: introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model
Tweet media one
0
24
163
0
4
26
@jmhessel
Jack Hessel
2 years
Tweet media one
4
0
24
@jmhessel
Jack Hessel
4 years
I can't be the only one with no intuition for this "self training" business right? So you label unlabeled instances with your model and then just add them to your training set? And that makes your model generalize better? Are there good explanations out there? #nlproc #lazyweb
7
2
24
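One way to build intuition for self-training is a toy you can trace by hand: a 1-D nearest-centroid "model" that pseudo-labels only high-margin unlabeled points, then refits. Everything here is hypothetical, just to show the loop; whether it helps in practice hinges on the confidence threshold and how wrong the pseudo-labels are.

```python
def fit_centroids(points):
    """Toy 1-D nearest-centroid 'model': one centroid per class label."""
    groups = {}
    for x, y in points:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def self_train(labeled, unlabeled, threshold=0.4):
    """Self-training: pseudo-label confident points, refit, repeat."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    while True:
        cent = fit_centroids(labeled)
        confident = []
        for x in unlabeled:
            d0, d1 = abs(x - cent[0]), abs(x - cent[1])
            conf = abs(d0 - d1) / (d0 + d1)      # margin-style confidence
            if conf > threshold:
                confident.append((x, 0 if d0 < d1 else 1))
        if not confident:                        # nothing left to add: done
            return cent
        labeled += confident
        taken = {p for p, _ in confident}
        unlabeled = [x for x in unlabeled if x not in taken]

labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
cent = self_train(labeled, [2.0, 3.0, 7.0, 8.0])
```

Here the pseudo-labels pull each centroid toward the unlabeled mass (0.5 → 1.5 and 9.5 → 8.5), which is the claimed generalization benefit in miniature; with a low threshold and a bad initial model, the same loop amplifies mistakes instead.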
@jmhessel
Jack Hessel
8 months
this sped up my qlora models by ~20% for approx 2 minutes of work and 1 LoC change :-)
@younesbelkada
younes
8 months
New feature alert in the @huggingface ecosystem! Flash Attention 2 natively supported in huggingface transformers, supports training PEFT, and quantization (GPTQ, QLoRA, LLM.int8) First pip install flash attention and pass use_flash_attention_2=True when loading the model!
Tweet media one
8
103
526
2
2
25