
Kyle Lo
@kylelostat
Followers
3K
Following
3K
Media
51
Statuses
647
#nlproc #hci research scientist @allen_ai, co-lead of data for OLMo w/ @soldni, he/him, find me on 👉🏻https://t.co/5Hm9cx3Urz🧋
Seattle, WA
Joined January 2019
"Out of 13,048 reviewers, only 69 were deemed highly irresponsible, and enforcement was applied solely in those cases. These reviewers were contacted multiple times, including personally by the area chairs and senior area chairs, but still failed to fulfill them."
This year, EMNLP desk rejected approximately 100 papers. For more insight into the process, and potential future changes, please see this blog post from the PCs: @c_christodoulop @Tanmoy_Chak @VioletNPeng.
my favorite figure from work by @heinemandavidj. if you're frustrated by LM evals, not knowing if results are real or noise, it's useful to decompose sources of variance:
🐠 is there enough spread between compared models (signal)
🐟 do scores vary among intermediate ckpts (noise)
(2/6) Consider these training curves: 150M, 300M and 1B param models on 25 pretraining corpora. Many benchmarks can separate models but are too noisy, and vice versa! 😧 We want ⭐ low noise and high signal ⭐: *both* low variance during training and a high spread of scores.
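The decomposition above can be sketched numerically. A minimal sketch with made-up scores (all numbers hypothetical): signal is the spread of final scores across the compared models, noise is the score wobble across one model's intermediate checkpoints, and a usable benchmark needs the first to dwarf the second.

```python
from statistics import pstdev

# Hypothetical final benchmark scores for three model sizes
# (signal: how far apart are the models we want to compare?)
final_scores = {"150M": 0.42, "300M": 0.48, "1B": 0.61}

# Hypothetical scores for the 1B model over its last few intermediate
# checkpoints (noise: how much does the score jitter during training?)
checkpoint_scores = [0.59, 0.62, 0.60, 0.61, 0.63]

signal = max(final_scores.values()) - min(final_scores.values())
noise = pstdev(checkpoint_scores)
snr = signal / noise  # higher = benchmark separates models beyond its own jitter

print(f"signal={signal:.3f} noise={noise:.3f} snr={snr:.1f}")
```

Note that `pstdev` treats the checkpoint list as a full population; with only a handful of checkpoints, the sample standard deviation (`stdev`) would give a slightly more conservative (larger) noise estimate.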
thx for all the feedback from the OSS community! our olmOCR lead @jakepoznanski shipped a new model fixing a lotta issues + some more optimization for better throughput. have fun converting PDFs!
📝 olmOCR v0.2.1 has arrived with new models! Our open-source OCR engine now reads tougher docs with greater precision, and it’s still 100% open. 👇
RT @cmalaviya11: People at #ACL2025, come drop by our poster today & chat with me about how context matters for reliable language model eva….
RT @tongshuangwu: We all agree that AI models/agents should augment humans instead of replace us in many cases. But how do we pick when to….
issues w preference LM benchmarks:
🐡 data contains cases where the "bad" response is just as good as the chosen one
🐟 model rankings can feel off (claude ranks lower than expected)
led by @cmalaviya11 (TACL 2025), we study underspecified queries & their detrimental effect on model evals.
In our new paper, “Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries,” we find that adding just a bit of missing context can reorder model leaderboards—and surface hidden biases. 🧵👇
presenting olmOCR at the poster session (2:15pm, 211 West) for the #codeml workshop at #icml2025!
🐟 fully open source OCR, comparable or better than frontier VLMs
🐠 all weights, data, code free & public
🐡 new benchmark of OCR "unit tests" on diverse PDFs & challenging OCR cases
New updates for olmOCR, our fully open toolkit for transforming documents (PDFs & images) into clean markdown. We released:
1️⃣ New benchmark for fair comparison of OCR engines and APIs
2️⃣ Improved inference that is faster and cheaper to run
3️⃣ Docker image for easy deployment
RT @_awettig: Presenting two posters at ICML over the next two days:.- Both at 11am - 1:30pm.- Both about how to improve pre-training with….
will be at #icml2025, lemme know if you wanna chat about OLMo pretraining data curation, evaluation, data mixing, etc! 👋
find us at the poster session on 📅 Wed 7/16 @ 11am ⏲️ to learn about WebOrganizer, distilling web data taxonomies into small models & using them for LM data mixing!
🤔 Ever wondered how prevalent some type of web content is during LM pre-training? In our new paper, we propose WebOrganizer, which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐. Key takeaway: domains help us curate better pre-training data! 🧵/N
we developed the benchmark independently, so no dev/test leakage, and even so, results show olmOCR often produces higher quality output than even proprietary OCR tools & is way cheaper + local as well! our team will be at #ICML2025, come find me, @jakepoznanski and @soldni there.
excited to release our new benchmark for OCR addressing 3 eval challenges:
🐟 coverage of many types of docs (born digital vs old scans, pages w tiny fonts, etc)
🐡 coverage of many different OCR targets (e.g. equations, tables, etc)
🐠 apples-to-apples comparison across systems
excited to win 🏆 this award for our work on molmo & pixmo, showing the value of high-quality data curation for VLMs! recalling when we released at the same time as Llama 3.2 😆 huge kudos to @mattdeitke, chris clark & @anikembhavi for their leadership on this project!
Molmo won the Best Paper Honorable Mention award @CVPR! This work was a long journey over 1.5 years, from failing to get strong performance with massive-scale, low-quality data, to focusing on modest-scale, extremely high-quality data! Proud to see what it became. #CVPR2025
RT @tyleraromero: Thrilled to announce I've joined the incredible team at @allen_ai! I'll be working on language modeling!
RT @finbarrtimbers: excited to announce that I’ve joined the Allen institute, where I’ll be working on RL for LLMs.
great work from philippe as always ☺️ agree w the view that reliability is absolutely key.
🆕 paper: LLMs Get Lost in Multi-Turn Conversation. In real life, people don’t speak in perfect prompts. So we simulate multi-turn conversations: less lab-like, more like real use. We find that LLMs get lost in conversation. 👀 What does that mean? 🧵 1/N 📄
we released OLMo 2 1B, showing again how well our OLMo 2 pretrain & post-train recipe works! Our small 1B model is comparable or better than other top open-weights-only alternatives while maintaining fully open data, code & intermediate checkpoints!
We're excited to round out the OLMo 2 family with its smallest member, OLMo 2 1B, surpassing peer models like Gemma 3 1B or Llama 3.2 1B. The 1B model should enable rapid iteration for researchers, more local development, and a more complete picture of how our recipe scales.
outstanding paper award for our AI in Education work!
🐟 dataset of natural images of student solutions to K-12 math problems from an online teaching platform
🐠 annotations (dense captions, VQA pairs) by teachers to eval VLMs
chat w leads @samibaral144 @lucy3_li at #NAACL2025 🤩
🟢 Announcing the #NAACL2025 Award Winners! The Best Paper and Best Theme Paper winners will present at our closing session.