Michael Skarlinski (@m_skarlinski)
Head of Platform @ Edison Scientific · San Francisco · Joined May 2024
Followers: 553 · Following: 136 · Media: 7 · Statuses: 64
I think people are mostly still just starting to play with Kosmos and understand what it can do, but the response so far has been significantly beyond what we expected. Excerpts from a great write-up by Zachary Flamholz: “It is an understatement to say I was impressed with what
Try it out for yourself on our platform: https://t.co/N2a1pva58W
platform.edisonscientific.com
AI Agents for Scientific Discovery
Kosmos is unlike any other agent we have at Edison, both in terms of outputs and infrastructure. Running at scale requires our platform to support order-of-magnitude swings in resource requirements, all unknown at submit time. Each run sees between 0 and 120 sandbox
Our older agents, like Crow, Phoenix, and HasAnyone, are still available on our platform as Literature, Molecules, and Precedent, respectively, for 1-2 credits per run. We will be launching more powerful versions soon! (Falcon, our deep research agent, has merged with Crow.)
After two years of work, we’ve made an AI Scientist that runs for days and makes genuine discoveries. Working with external collaborators, we report seven externally validated discoveries across multiple fields. It is available right now for anyone to use. 1/5
We’ve been unusually quiet for ~5 months because we didn’t want to just announce something and not let people use it. So we built it at scale (thanks @m_skarlinski and @ludomitch and eng🫠). And are letting edu users try it a bunch. 2/5
More on Kosmos from some of the team behind it here. And check out the technical report:
arxiv.org
Data-driven scientific discovery requires iterative cycles of literature search, hypothesis generation, and data analysis. Substantial progress has been made towards AI agents that can automate...
Kosmos, our newest AI Scientist, is available to use today on our platform. Watch here as three of our scientists describe what Kosmos is, and how it can accelerate scientific research.
Try Kosmos on our new platform, here: https://t.co/PHYFaC5idK
Read our technical report: https://t.co/20AcIFWAZl
Read more about Kosmos on our blog: https://t.co/qCvlEwxrZi
Finally, read more about Edison Scientific here: https://t.co/iSQIpPP7N4
We can’t wait to see how you
edisonscientific.com
Today we are launching Edison Scientific, a new commercial spinout that will focus on further developing and deploying our AI Scientist for commercial applications.
Today, we’re announcing Kosmos, our newest AI Scientist, available to use now. Users estimate Kosmos does 6 months of work in a single day. One run can read 1,500 papers and write 42,000 lines of code. At least 79% of its findings are reproducible. Kosmos has made 7 discoveries
More of my thoughts here: https://t.co/mCZwLtRdiN
Most vector DBs support "hybrid" retrieval with both sparse and dense indices. These implementations seem to be wholly separate indices, akin to two separate systems whose results are merged. And, in a hosted setting, you still pay the price
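The "separate indices, merged" pattern described above is commonly implemented with reciprocal rank fusion (RRF). A minimal sketch of that merge step, with made-up doc IDs and the conventional k=60 damping constant (not taken from any particular vector DB):

```python
# Hypothetical sketch: fusing separate sparse and dense result lists
# with reciprocal rank fusion (RRF). Doc IDs below are illustrative.

def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: damping constant; larger k flattens rank differences.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1/(k + rank + 1) for the doc.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

sparse_hits = ["d3", "d1", "d7"]  # e.g. BM25 results
dense_hits = ["d1", "d9", "d3"]   # e.g. embedding kNN results
print(rrf_merge([sparse_hits, dense_hits]))  # → ['d1', 'd3', 'd9', 'd7']
```

Because RRF only looks at ranks, not raw scores, it sidesteps the problem that sparse and dense scores live on incomparable scales — which is also why the two indices can stay wholly separate systems.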
- For semantic search, LLMs usually nail query expansions and synonyms, adding specificity where necessary.
- DeepMind's recent "On the Theoretical Limitations of Embedding-Based Retrieval" paper shows that dense embeddings will fail to capture context in many scenarios where
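A minimal sketch of the query-expansion idea in the first point above. Here `expand_query()` is a hypothetical stand-in for a real LLM call, and the canned medical expansions are purely illustrative, not model output:

```python
# Hypothetical sketch of LLM-driven query expansion feeding a sparse index.
# expand_query() stands in for an actual LLM prompt; expansions are canned.

def expand_query(query: str) -> list[str]:
    # In practice, an LLM would be prompted for paraphrases and synonyms.
    canned = {
        "heart attack biomarkers": [
            "myocardial infarction biomarkers",
            "troponin elevation acute coronary syndrome",
        ],
    }
    return [query] + canned.get(query, [])

def expanded_terms(query: str) -> set[str]:
    # Union the token sets of all expanded queries, so the sparse index
    # can match synonyms the user never typed.
    terms: set[str] = set()
    for q in expand_query(query):
        terms |= set(q.lower().split())
    return terms

print(sorted(expanded_terms("heart attack biomarkers")))
```

The point of the sketch: after expansion, a plain term-matching (sparse) query already covers vocabulary mismatch, which is one of the main gaps dense embeddings are meant to close.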
Am I missing the boat on vector DBs? From web and social mentions, Pinecone is up 97% YoY and Milvus is up 50% YoY. Dense embedding indices are great in non-text settings, but advantages over sparse indices in text-heavy RAG applications aren't always obvious to me (1/3).
Reach out if you'd like to join our amazing platform team!!
We are looking to hire an outstanding UI/UX designer with strong front-end engineering skills to reimagine how researchers can make discoveries in collaboration with AI. If you have these skills and want to help AI accelerate science, get in touch.
This is bad for AI measurement. As other AI benchmarks have become saturated, model makers have turned to Humanity’s Last Exam as a good measure of AI ability. Except a careful review suggests many of the exam questions have incorrect “right” answers. Benchmarking is hard.
HLE has recently become the benchmark to beat for frontier agents. We @FutureHouseSF took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7
FutureHouse aims to build an ‘AI scientist’ that can command the entire research pipeline, from hypothesis generation to paper production https://t.co/CvReGqVOUK
nature.com
Nature - The model, called ether0, outperforms other advanced AIs at chemistry tasks and is a stepping stone towards automating the entire research pipeline.
Today we are releasing ether0, our first scientific reasoning model. We trained Mistral 24B with RL on several molecular design tasks in chemistry. Remarkably, we found that LLMs can learn some scientific tasks much more data-efficiently than specialized models trained from
At FutureHouse, we’ve noticed scientific agents are good at applying average intelligence across tasks. They always seem to make the obvious choices, which is good, but discovery sometimes requires more intuition and insight than average. We’ve made the first step today towards
The FutureHouse platform now has a public documentation repository for raising issues and sharing demos. Please open any issues you find with our API client here! (link in reply)