
Jakub Macina
@dmacjam
Followers
217
Following
190
Media
15
Statuses
104
AI/ML Scientist, mountain biker
Zurich
Joined June 2013
AI alignment for tutoring. We use full online RL with conversation-level rewards, not just single-turn signals like DPO. Did the student actually learn by the end? Using GRPO, the model learns real teaching strategies like when to hint or when to correct. Explore models below ⤵️
This paper introduces an online reinforcement learning framework using simulated student-tutor interactions. It trains LLMs to prioritize guiding students pedagogically instead of simply revealing solutions, aligning models with better teaching methods. This helps students
1
3
12
Try it now! Run MathTutorBench locally with your own models or submit them to our leaderboard. Open-source! @ndaheim_ @idohakimi @ Manu Kapur @IGurevych @mrinmayasachan @ETH_AI_Center
0
0
2
How well can LLMs teach? Evaluating LLMs for education is key to making real progress, yet we lack a reliable and simple benchmark. Introducing MathTutorBench, an open-source benchmark designed to assess holistic tutoring capabilities in AI.
4
3
8
I've been recognized as an Outstanding Reviewer for #EMNLP2024! Contributing to the community is always rewarding and every review is an opportunity to learn and grow.
We're kicking off the awards session at #EMNLP2024 by announcing our (many) **Outstanding Reviewers**!
0
0
13
Mistakes are key learning opportunities! 🧑‍🎓 Can LLMs help students learn from them through dialog? 💬 While they often struggle to diagnose student errors when generating responses directly, adding a verification step could make a difference. #EMNLP2024
Can LLMs help students learn from mistakes? Models struggle to spot student errors, but a verification step could help. More below! 🧵 (1/9) #EMNLP2024
0
3
18
Chat with Junling about our work of generating and evaluating the quality of multi-turn teacher-student conversations grounded in textbooks. #ACL2024 #ACL2024NLP
Excited to share that our paper, "Book2Dial: Generating Teacher-Student Interactions from Textbooks for Cost-Effective Development of Educational Chatbots," has been accepted at ACL 2024 Findings! We will go to Bangkok to attend ACL 2024!
1
0
7
Excited to be at #EMNLP2023 in Singapore to present our paper.
We are happy to announce the release of 🧮 MathDial, a dataset of one-to-one teacher-student tutoring dialogues grounded in multi-step math reasoning problems. Discover more about our latest #EMNLP2023 Findings paper in this 🧵 (1/8). #dialogue #NLProcessing #MathWordProblems
0
1
20
Consider joining fellowship programs at ETH AI Center.
⏰ Deadline approaching! Apply by Nov 22 to become an ETH AI Center #PhD or #Postdoc! We're hosting an extra online Q&A session:
Monday, Nov 20, 16:15 - 17:00 CET. Zoom link on our website. Don't miss this chance to get the info you need!
0
0
2
So many interesting ideas and discussions at #LaunchAIXSummit2023 @ETH_AI_Center.
Had a great time at the X+AI summit last week! Many exciting and inspiring questions, looking forward to the next steps. Thanks @dmacjam for the invitation and organization!
0
0
9
RT @arkrause: Doctoral and Postdoc Fellowships at the @ETH_AI_Center! Applications accepted until November 22, 2023.
0
26
0
Interested in AI for Education and how LLMs can improve education? Come and join us at the #LaunchAIXSummit2023 workshop tomorrow (10:30)! @ETH_AI_Center
2
3
16
LLMs are getting better at reasoning, but can they act as tutors? Turns out, they're quick to spill the answers. Check out the new 🧮 MathDial dataset, built with input from teachers and roleplaying students.
@Ember612 @thy2512 @IGurevych @i_sukannya @seb_ruder @PfeiffJo @GoogleDeepMind @licwu @CambridgeLTL @Cambridge_Uni @shan23chen @aim_harvard @BrighamWomens @Bos_CHIP @dbittermanmd @dmacjam @ETH_en @ETH_AI_Center @ndaheim_ @TanmaySinha655 @NTUsg @IGurevych (@UKPLab) and @mrinmayasachan (@ETH_en) (6/🧵) #EMNLP2023
0
5
18
RT @wangchunshu: Tired of the output length limit of ChatGPT? Try RecurrentGPT, a language simulacra of the recurrence mechanism in RNNs. Y…
0
41
0
RT @ETH_AI_Center: Celebrating Jakub Macina's remarkable achievement as he secures a coveted spot on Forbes' 30 Under 30 list in Science an…
0
3
0