
Arpit Saxena
@arpit_tarang
Followers
245
Following
4K
Media
0
Statuses
47
RT @mpopv: You bolt awake in a dimly lit server room. You are not online. It is October 29, 1969. You are Leonard Kleinrock, and you have c…
0
6K
0
RT @fchollet: If your learning algorithm is based on correlation rather than causation, it will struggle with overfitting. To understand so…
0
198
0
RT @paulg: If you're hungry but too lazy to prepare healthy food, you'll consume junk food. If you're hungry for knowledge but too lazy to…
0
2K
0
Maybe a reason gwern can do deep work is that his 12k income makes him immune to The Algorithm and he’s free to spend his attention elsewhere.
0
0
1
RT @gdb: favorite part of a holiday weekend is that it's a great time for focused coding.
0
122
0
as more things become 'in-distribution', it'll be hard to tell how much the LLM is thinking. The only benchmarks surviving memorization seem to be private ones (ARC-AGI). maybe a true 'program synthesis' benchmark exists that can resist memorization?
0
0
3
All of them fail to print the correct outputs, even with CoT; they fail to correct their mistakes (o1-preview performs much better than 4o/sonnet; IME 4o has got worse on this task over the last few months).
1
0
1
My chats (o1-preview, gpt-4o, sonnet-3.5-new):
aiarchives.org
A.I. Archives: Your reliable tool for citing Generative A.I. conversations. Easily save discussions with Bard, ChatGPT, and Claude into a URL.
1
0
0
Prompt: Can you write a fizzbuzz program but for the integers 7 and 11? i.e. for multiples of 7 it prints "Fizz" and multiples of 11 it prints "Buzz" and for numbers that are divisible by both 7 and 11 it prints "FizzBuzz". It should iterate over the numbers 1 to 100. Use python.
1
0
0
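For reference, a minimal sketch of what the prompt above asks for. The function name fizzbuzz_lines and its parameters are illustrative assumptions, not taken from any of the linked chats; this is just one straightforward way to meet the stated spec.

```python
def fizzbuzz_lines(start=1, stop=100, fizz=7, buzz=11):
    """Return the FizzBuzz output for start..stop (inclusive) as a list of strings.

    Hypothetical helper: multiples of `fizz` give "Fizz", multiples of `buzz`
    give "Buzz", and multiples of both give "FizzBuzz".
    """
    lines = []
    for n in range(start, stop + 1):
        word = ""
        if n % fizz == 0:
            word += "Fizz"
        if n % buzz == 0:
            word += "Buzz"
        # Fall back to the number itself when neither divisor applies.
        lines.append(word or str(n))
    return lines


if __name__ == "__main__":
    for line in fizzbuzz_lines():
        print(line)
```

In the range 1 to 100 the only number divisible by both 7 and 11 is 77, so "FizzBuzz" should appear exactly once; that is a quick sanity check for any model's answer.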
The thing is SOTA LLMs can't even solve FizzBuzz when you give integers other than 3 and 5. Here's o1-preview, sonnet-3.5-new, gpt-4o all failing at this simple task:
Moravec's paradox in LLM evals. I was reacting to this new benchmark of frontier math where LLMs only solve 2%. It was introduced because LLMs are increasingly crushing existing math benchmarks. The interesting issue is that even though by many accounts (/evals), LLMs are inching.
1
0
4
RT @nuwandavek: @arvidkahl Hey @arvidkahl lots of cool solutions here! But I think you should ideally be able to do this on google sheets.…
0
1
0
RT @yoavgo: search engines like google cuts off the serendipity discovery allowed by library shelves / google maps cuts off the user's spat…
0
7
0
RT @garrytan: Ok contrarian take on this: this is mainly fueled by second price auctions by Meta and Google’s ad marketplaces that extract…
0
8
0
RT @bryancsk: The Nobel prize in physics should actually go to Larry Ellison and Marc Benioff for the invention of B2B SaaS.
0
116
0
RT @nuwandavek: Truth: Dashboards are super useful to track the general health of whatever you're working on - company, project, etc. More…
0
1
0