search founder

@n0riskn0r3ward

Followers
2K
Following
10K
Media
276
Statuses
4K

Solo entrepreneur passionate about AI and search tech. Building a niche search product and sharing what I learn along the way.

Joined June 2022
@n0riskn0r3ward
search founder
2 days
Weird time to be alive - for some ideas it's literally easier for me to build a React app MVP and deploy it live with Vercel than it is for me to build a decent ppt deck pitching the idea.
1
0
5
@n0riskn0r3ward
search founder
3 days
So what happens when these prompt injection attacks become part of the pre-training data? Does the model start prompt injecting itself in the middle of your agentic workflow? Does the prompt injection attack stop working?
@scaling01
Lisan al Gaib
4 days
this is scientific seppuku
Tweet media one
4
0
6
@n0riskn0r3ward
search founder
6 days
This is o3's attempt to plot the graph from the paper, so I don't think the numbers are exact, but it's the same key takeaway - i.e. o3 is crushing again.
0
0
2
@n0riskn0r3ward
search founder
6 days
Fun new benchmark - instruction following is a key thing to measure for enterprise use cases - but you gotta post the money shot up front! Also want to see all the usual players: Opus 4, R1-0528, o4-mini, Gemini Flash, etc., with cost shown too, like aider!
Tweet media one
@allen_ai
Ai2
6 days
Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions. Top models like Gemini 2.5 Pro or Claude 4 Sonnet are only able to score up to 50%, presenting an open frontier for post-training. 🧵
Tweet media one
1
0
5
@n0riskn0r3ward
search founder
6 days
When I haven't used Claude at all over the course of a week, I'm paying $20 a month, and I go issue a new query and get - sorry we can't help you bc Claude is overwhelmed at the moment. It makes @AnthropicAI feel like an unserious company.
1
0
3
@n0riskn0r3ward
search founder
7 days
Soham could make a killing on an interview course right now: How to make a fake resume, pass any tech interview, and cheat on everything - sponsored by Cluely. Guaranteed 💰🤑💸
0
0
1
@n0riskn0r3ward
search founder
8 days
Pretty sure if I had a parrot it would randomly shout "databricks SUCKS!!" at this point bc that's mostly what I'm shouting at my computer trying to do my job lately and 10/10 of the engineers on my team agree.
1
0
0
@n0riskn0r3ward
search founder
8 days
If you just read the available features, databricks seems cool/useful. If you actually use databricks you realize none of it actually works and that makes it much less cool but maybe that's just me. We have reached steeply negative NPS score territory at this point.
4
0
6
@n0riskn0r3ward
search founder
11 days
RT @SinclairWang1: What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “myste…
0
89
0
@n0riskn0r3ward
search founder
12 days
Yes, yes, they'll be bought for a stupid amount of money in a few weeks by some big tech org, I know how this works. But I stand by my armchair critic take: Perplexity is great except for the part where they don't have a business model.
0
0
0
@n0riskn0r3ward
search founder
12 days
"Specialty metal items" sounds like a better business idea than whatever perplexity is going to pivot to next lol.
@AnthropicAI
Anthropic
12 days
Anthropic staff realized they could ask Claude to buy things that weren’t just food & drink. After someone randomly decided to ask it to order a tungsten cube, Claude ended up with an inventory full of (as it put it) “specialty metal items” that it ended up selling at a loss.
Tweet media one
1
0
4
@n0riskn0r3ward
search founder
16 days
If you're doing graph RAG and have never even heard of (much less tried) pseudo relevance feedback you're ngmi.
0
0
3
@n0riskn0r3ward
search founder
16 days
Ok end of rant.
0
0
0
@n0riskn0r3ward
search founder
16 days
This is even better but too big:
Tweet media one
Tweet media two
1
0
1
@n0riskn0r3ward
search founder
16 days
They ruined it:
Tweet media one
1
0
0
@n0riskn0r3ward
search founder
16 days
This was perfect:
Tweet media one
Tweet media two
1
0
0
@n0riskn0r3ward
search founder
16 days
Random, but why are new cars with features I like 4x more hideous than their normie versions? 2026 RAV4 Plug-in Hybrid: great specs, very mid/bad looks. Latest Tesla Model Y: finally decent build quality but 🤮 looks. Also everything as a subscription is gross.
2
0
1
@n0riskn0r3ward
search founder
17 days
Feels like one of the missing rewards for coding agents is - did the agent remove all the unused code. Even when I specifically instruct the agent to remove a specific function, it typically forgets to remove the imports for that function.
1
0
3
@n0riskn0r3ward
search founder
18 days
Given that many of the newer embedding models are approximately 0.1% better than prior versions, it feels like an LLM that can do the finding for you is the next logical thing to put effort into training?
0
0
0
@n0riskn0r3ward
search founder
18 days
Looking for papers exploring the extent to which RL training an LLM to call the tools YOU/YOUR biz wants it to call is necessary? An LLM RL trained to perform BM25 search with a given tool description/tool call format will perform worse if you keep the same description/format
Tweet media one
3
0
8