Alexander Koller
@alkoller
Followers: 1K · Following: 336 · Media: 4 · Statuses: 191
Professor, musician, speaker of @neuroexplicit. This account is inactive - please find me at https://t.co/hizq2axlKP
Joined March 2009
I am following my university in leaving Twitter. I would be very pleased if you chose to reconnect with me at https://t.co/A6x29Ulsxs See you there! https://t.co/fojL1RL8uT
researchprofessionalnews.com
Changes to the social media platform make further use “untenable”, group says
0 replies · 0 reposts · 1 like
AutoPlanBench 2.0 now evaluates LLMs as planners on more than 50 domains. ReAct (with GPT-4o) is often worse than symbolic planners, but sometimes better. https://t.co/gvOSkG5yIX
#nlproc
1 reply · 0 reposts · 3 likes
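The ReAct setup mentioned above interleaves free-text "thoughts" with actions in a planning domain. Here is a minimal sketch of that control loop, with a trivial rule-based `stub_policy` standing in for the LLM and a toy counter domain replacing a real PDDL benchmark; all names here are illustrative, not AutoPlanBench's actual API.

```python
def react_plan(policy, state, goal_test, actions, max_steps=20):
    """Minimal ReAct-style loop: at each step the policy looks at the
    current state and the action history, produces a 'thought' plus an
    action name, and the environment applies that action."""
    history = []
    for _ in range(max_steps):
        if goal_test(state):
            return state, history
        thought, action = policy(state, history)
        history.append((thought, action))
        state = actions[action](state)
    return state, history

def stub_policy(state, history):
    # Stand-in for the LLM: trivial rule-based 'reasoning'.
    if state < 3:
        return ("below goal, increment", "inc")
    return ("above goal, decrement", "dec")
```

In the real benchmark, the policy call is an LLM prompt containing the domain description and the thought/action trace so far.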
Come do research with me, my fantastic colleagues, and some of the coolest PhD students I've ever met! #NLProc #nesy #neurosymbolic #AI #ML
1 reply · 2 reposts · 10 likes
Spending a great week in Saarland for the @neuroexplicit retreat of @SIC_Saar. Great talks, posters, and interactions with #PhD students on #neurosymbolic #nesy #AI #ML (and board games!) Thanks @alkoller @slusallek @IValeraM for having me!
0 replies · 3 reposts · 19 likes
So proud of my postdoc Mareike Hartmann for this excellent work.
🏆 ACL Best Resource Paper Award: AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents by Trivedi et al. #NLProc #ACL2024NLP
1 reply · 1 repost · 17 likes
It was fun to apply #NLProc methods to software engineering with my brilliant colleague @AndreasZeller and his student @TurikMammadov. The coolest part, to me, is that you can backtranslate program outputs into program inputs. Let's see where this will go!
Learning models from programs! Given a program P, our MODELIZER learns a model M that mocks P's behavior, producing P's output for a given input. But M is also reversible, predicting inputs for which P produces a given output, with up to 95.4% accuracy: https://t.co/5YKbUkccpD 🧵
0 replies · 1 repost · 6 likes
Can you use LLMs to replace crowdworkers in NLP evaluations? My amazing collaborators and I analyzed this broadly. Answer: Sometimes LLMs correlate very well with human judgments, but you can't rely on it.
1/5 📣 Excited to share “LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks”! https://t.co/BIqZCmToz1 🚀 We introduce JUDGE-BENCH, a benchmark to investigate to what extent LLM-generated judgements align with human evaluations. #NLProc
1 reply · 5 reposts · 30 likes
Ellie is fantastic, and I am so delighted to have her here.
Ellie Pavlick @Brown_NLP is visiting us for three months as a @dfg_public Mercator Fellow. We are thrilled to have her with us and look forward to many fruitful research collaborations.
0 replies · 0 reposts · 3 likes
This is a decent summary of the octopus thought experiment from Bender & @alkoller 2020, with two glaring exceptions, right at the start: https://t.co/zvj3w26jVr >>
techcrunch.com
What is AI? We've put together this non-technical guide to give anyone a fighting chance to understand how and why today's AI works.
2 replies · 21 reposts · 85 likes
ChatGPT getting information from Bild - what could possibly go wrong?
We have formed a new global partnership with @AxelSpringer and its news products. Real-time information from @politico, @BusinessInsider, European properties @BILD and @welt, and other publications will soon be available to ChatGPT users. ChatGPT’s answers to user queries will …
0 replies · 0 reposts · 5 likes
Can LLMs do planning? My PhD student @Stein1Katharina built AutoPlanBench, which can automatically convert any PDDL benchmark domain into a benchmark for LLM planners, and they are not doing so hot. https://t.co/gvOSkG50Tp
#NLProc
1 reply · 1 repost · 7 likes
My student @yuekun_yao did something really cool: predicting the accuracy of a seq2seq model on test data from the inputs alone. The core is a discriminator that learns to check whether the model's prediction is correct. Excellent accuracy across datasets. https://t.co/MgiEvqaNWr
#NLProc
1 reply · 2 reposts · 11 likes
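Given such a discriminator, the dataset-level accuracy estimate falls out naturally: score each prediction and average. A sketch under an assumed interface (`discriminator(x, y)` returning a correctness probability; this is not the paper's actual API):

```python
def estimate_accuracy(discriminator, inputs, predictions):
    """Score each (input, prediction) pair with the discriminator's
    probability of correctness and average; the mean estimates the
    model's accuracy on this test set without any gold labels."""
    scores = [discriminator(x, y) for x, y in zip(inputs, predictions)]
    return sum(scores) / len(scores)
```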
A very fun piece of work that I got to collaborate on at @allen_ai: Hierarchical plans improve LLM planning on domains that have hierarchical structure.
How can LLM-agents dynamically adapt to task complexity & LLM capabilities "as-needed"? 🚨Introducing ADaPT to recursively intervene/decompose if task is too complex -> yields substantial gains on interactive tasks https://t.co/7f3zag1z29 w/ @ai2_aristo @allen_ai @uncnlp 🧵⬇️
1 reply · 1 repost · 8 likes
Nucleus and top-k sampling are ubiquitous, but why do they work? @johnhewtt, @alkoller, @swabhz, @Ashish_S_AI and I explain the theory and give a new method to address model errors at their source (the softmax bottleneck)! 📄 https://t.co/0zRu3x9mVg 🧑‍💻 https://t.co/A57bEb4aqb
3 replies · 26 reposts · 162 likes
Come work with me: Three-year postdoc position, very suitable for developing your own research agenda and collaborations. Let's figure out reliable reasoning with LLMs and personalization of text and dialogue. Neurosymbolic models welcome.
0 replies · 18 reposts · 44 likes
Come work with us on neurosymbolic models of #NLProc! The first three students are amazing - be part of a wonderful team that investigates the design principles of combining neural and symbolic models. #ACL2023 #ACL2023NLP
0 replies · 4 reposts · 11 likes