
Ehud Reiter
@EhudReiter
I am a computer scientist who works on natural language generation and evaluation, often in healthcare contexts. I teach at Aberdeen University.
Aberdeen, Scotland
Joined May 2014
💊 Not very good news for medical LLMs. A new Mass General Brigham study shows that leading LLMs often try to please the user in medical chats, and to do so can output wrong advice. The paper shows that default models will confidently echo bad medical assumptions, and that a small
One of my main goals for 2025-26 is to help my 6 senior PhD students submit their PhDs before I retire. Glad to say that Nicolay Babakov has now done so, with viva scheduled for Dec. Other five students seem to be on track, which is encouraging.
More best paper reviewing, and once again I am disappointed to see that authors do *not* keep the promises they make (in response to reviewers) to add XX to the camera-ready version. I guess promises are cheap...
Somewhat frustrated yesterday to once again read an ACL paper that did all sorts of complex things (including the usual results tables showing the best approach) on garbage data, with minimal acknowledgement of this in the limitations section. The most fundamental rule of CS is Garbage In, Garbage Out.
Interesting article in @TheEconomist about real-world usage of robotaxis in SF. Final sentence "If you have the time, by all means sit in a robotaxi. If you need to get somewhere fast, there is nothing better than a [human-driven] yellow cab."
New blog: Good diagrams for research papers. I've seen a number of diagrams recently which are too complicated and difficult to understand. I explain some of the problems I see and give advice. https://t.co/4Lp5UWU06g
Really interesting paper on real-world evaluation in IR. I should learn more about evaluation in IR; it's not something I've ever properly looked at. https://t.co/e36KrBKeeD
Several people have asked me recently whether I will still be able to contribute to research projects after I retire in summer 2026. Absolutely! I will have emeritus status, and am very happy to remain involved in research projects at Aberdeen and elsewhere.
Aberdeen CS is hiring! We are especially interested in hiring new faculty in NLP. Closing date is 8 Oct. For more info, see below https://t.co/YP7ckDGG3k
New blog: Reflections on blogging. I am often asked about my experience blogging, sometimes by people who are considering writing their own blog. In this “meta” blog, I summarise my thoughts and experiences about my blog. https://t.co/SyEp10IcWh
Just turned 65 today, feels like a milestone. Definitely feel old now...
Aberdeen CS will probably be looking for a new lecturer in NLP. Formal advert is not out yet, but feel free to contact me informally if interested.
The registration page for #INLG2025 is now live! Join us in Vietnam from Oct 29 to Nov 2 for the best conference on #NaturalLanguageGeneration
https://t.co/0Q4XkUW3WN Curious to see what will be presented? Check out this list of accepted papers! https://t.co/FQkfRf8frZ
We spent the last year evaluating agents for HAL. My biggest learning: We live in the Windows 95 era of agent evaluation.
Really interesting paper showing that typos (etc) degrade LLM accuracy in medical contexts. I suspect most LLM benchmarks do not include typos... https://t.co/0mPhg0B6H7
Chat last week about commercialising one of our AI/health projects. I realised that I have been involved with 3 health IT startups, and all have failed. Maybe this is typical? I don't know the stats, but I'm sure failure is pretty common.
Interesting chat with @JatinGanhotra about the SWEBench-Verified benchmark, which is OpenAI's version of SWEBench. Verified includes many improvements, but it's also much easier than the original, which many people don't realise. Hum, maybe we need to be wary of benchmarks created by LLM vendors...
New blog: Defining hallucination is not straightforward. Many researchers assume that hallucination is a binary feature; either something is a hallucination or it is not. This is too simplistic. I describe some of the issues I have seen below. https://t.co/DtdwuMgo4E
Interesting chat with a visitor about how the UG CS curriculum should change because of AI coding assistants. Didn't agree on everything, but did agree that there should be more focus on non-coding tasks (requirements, architecture/design, testing), and more emphasis on data quality issues when teaching ML.
At ACL, I engaged with 50 papers (went to the oral presentation or talked to the poster presenter). I decided (sometimes after looking at the paper) that 3 of these were robust, interesting, and relevant to me; 2 of these 3 won awards. Hum, maybe in future I should focus on the 40 award papers and ignore the other 3000?