
Diptanu Choudhury
@diptanu
Followers
3K
Following
3K
Media
158
Statuses
8K
Founder @tensorlake. Past - AI and Distributed Systems at @meta, @hashicorp, @linkedin and @netflix
San Francisco, CA
Joined December 2007
Excited to announce @tensorlake Cloud! 🧵 Tensorlake converts real-world documents into clean, structured data for business workflow automation and for building Agents that operate on mission-critical documents. It's powered by a state-of-the-art document layout understanding model trained
Announcing Tensorlake Cloud: Up-leveling Document Ingestion and Workflows for building agentic applications and complex business workflows.
15
19
73
Huge W for humanity if this works out at scale. https://t.co/xwWNr0dLeN
gatesnotes.com
A company called Fervo Energy is hoping to supercharge geothermal power with an innovative new approach to turning the earth’s heat into power.
0
0
1
You can now log in to Tensorlake using Microsoft and Azure SSO credentials! This is the beginning of better integration between Microsoft Azure and Tensorlake. If you are using Azure and need better Document Ingestion and ETL for unstructured data, reach out to us!
0
2
2
A little secret that everyone building serverless systems knows but I don't see being shared often - you have to move away from Docker's layer-based image format, which relies on stacking tarball layers using overlayfs, to a file/block-based indexing approach. This makes it
6
17
175
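A minimal sketch of the contrast in the tweet above, under stated assumptions: the class and method names here (FileIndex, fetch_file, the in-memory blob dict) are illustrative, not any real runtime's API. The point is that overlayfs-style images must be downloaded and unpacked layer by layer before a container can start, while a file/block index lets a lazy-loading runtime fetch only what the workload actually touches.

import hashlib

# Hypothetical sketch: an image manifest maps paths to content hashes,
# so a lazy-loading runtime can fetch individual files on demand
# instead of unpacking entire tarball layers up front.

class FileIndex:
    def __init__(self):
        self.entries = {}   # path -> content hash
        self.blobs = {}     # content hash -> bytes (stand-in for a blob store)

    def add_file(self, path: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.entries[path] = digest
        self.blobs[digest] = data

    def fetch_file(self, path: str) -> bytes:
        # Only the requested file is pulled; nothing else is downloaded.
        return self.blobs[self.entries[path]]

index = FileIndex()
index.add_file("/app/server.py", b"print('hello')")
index.add_file("/usr/lib/libfoo.so", b"\x7fELF...")

# Cold start only touches what it needs:
print(index.fetch_file("/app/server.py"))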
if you feel like you're not running enough experiments, the bottleneck is almost always infrastructure, not ideas. focus on improving your infrastructure: write parallelized code: many teams are still doing all their tests using for loops. spending 1-2 hours learning to write
3
3
24
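A small illustration of the "for loop" point in the quoted tweet, using only the Python standard library; run_experiment is a stand-in for whatever a team's evaluation actually does.

from concurrent.futures import ProcessPoolExecutor

def run_experiment(config: dict) -> float:
    # Stand-in for a real experiment; imagine training/evaluating here.
    return config["lr"] * config["batch_size"]

configs = [{"lr": lr, "batch_size": bs}
           for lr in (1e-4, 3e-4, 1e-3)
           for bs in (16, 32)]

if __name__ == "__main__":
    # Instead of a sequential `for config in configs:` loop,
    # fan the runs out across worker processes.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_experiment, configs))
    print(results)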
Hard disagree. A/B tests are cheap and often give a good read on whether a new feature improves a top-line business metric. Evals are great, and products that use AI shouldn't be shipped before running evals, but there is a place for A/B tests to continuously run in
This is the exact opposite conclusion to draw from the acquisition. AI enables you to build much more dynamic products that evolve faster (and even automatically). The foundation underlying that is good evals. A/B tests are officially the way of the past.
1
0
4
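For context on why continuously running A/B tests are cheap: assignment is usually just a deterministic hash of the user ID into buckets, as in this sketch. The experiment name, split ratio, and variant labels below are illustrative.

import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    # Deterministic bucketing: the same user always lands in the same arm,
    # so the test can keep running in production with no extra state.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "treatment" if bucket < treatment_share else "control"

for uid in ("alice", "bob", "carol"):
    print(uid, assign_variant(uid, "new-ai-summary"))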
Really impressed with the simplicity of @trychroma's write-ahead log implementation on S3. I hope future cluster schedulers build on top of this to store their state! Most schedulers (Mesos, Nomad, K8s) have been built on top of replicated state machines with state on SSDs and
1
1
15
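A hedged sketch of the general idea, not Chroma's actual implementation: write-ahead-log entries are appended as immutable, sequence-numbered objects in an object store, and recovery is just listing and replaying them in order. The in-memory dict stands in for S3; with boto3 the put/get lines below would map to put_object/get_object.

import json

class ObjectStoreWAL:
    """Toy write-ahead log over an object store (a dict stands in for S3)."""

    def __init__(self):
        self.store = {}      # key -> bytes, e.g. "wal/00000001"
        self.next_seq = 1

    def append(self, record: dict) -> str:
        key = f"wal/{self.next_seq:08d}"
        self.store[key] = json.dumps(record).encode()  # put_object in real S3
        self.next_seq += 1
        return key

    def replay(self):
        # Recovery: list keys in order and replay every record.
        for key in sorted(self.store):
            yield json.loads(self.store[key])          # get_object in real S3

wal = ObjectStoreWAL()
wal.append({"op": "schedule", "task": "t1", "node": "n3"})
wal.append({"op": "complete", "task": "t1"})
print(list(wal.replay()))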
@diptanu @tensorlake Love this direction. Structured extraction is powerful but without traceability it’s hard to trust. Bounding-box citations close that gap and this will be well received.
1
1
2
@diptanu @tensorlake Much-needed feature, Diptanu. Just tested this on a few complex docs, and it works perfectly. Thanks for releasing this. Really helpful.
1
1
4
"How do we trust data from AI Workflows?" We have heard this time and again from developers working in finance and healthcare. Today, we're releasing citations for Structured Output in @tensorlake. Now you can build verifiable AI workflows. Every extracted field has a page number
12
10
57
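A sketch of what a "field plus citation" payload can look like; the field names (value, page, bbox) are illustrative, not Tensorlake's actual response schema.

from dataclasses import dataclass

@dataclass
class CitedField:
    name: str
    value: str
    page: int                                # page the value was extracted from
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) region on that page

# An extracted invoice total, traceable back to the exact region of the source PDF.
total = CitedField(name="invoice_total", value="$12,400.00",
                   page=3, bbox=(0.62, 0.81, 0.78, 0.84))
print(total)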
This is a really excellent team (and has some SlateDB contributors on it!).
We are hiring a backend engineer at @tensorlake. If you love Rust, and want to work in the intersection of data and distributed systems, this role is for you! DM or email me. More information here -
0
1
6
We need a distributed systems pod with builders. There is a platform shift happening in infrastructure: early-stage companies are building on serverless more than ever, databases are going from SSD to S3, and pretty much every Neo/AI cloud is building a container runtime for speeding up
10
6
109
We are hiring a backend engineer at @tensorlake. If you love Rust, and want to work in the intersection of data and distributed systems, this role is for you! DM or email me. More information here -
linkedin.com
Tensorlake is building a distributed data processing platform for developers building Generative AI…
3
1
10
Coding LLMs seem to write simpler code the more you chat with them about requirements. The first version is always either more complex than it needs to be or too simple to solve the problem. Experienced developers should be able to tell when a solution is too complex and guide the LLM to
0
0
2
We are bumping into a problem where @tensorlake's OCR models perform better than the ground truth of the OCR Bench v2 benchmark. On some documents we are getting penalized for producing accurate answers 😅 Ex - Ground truth is missing the subscripts next to "VASP"
0
1
4
This is really cool. Research like this is going to enable parametric memory some day. In the short term external memory retrieval is the way to go. Worth keeping an eye on this line of research.
🔍 How do we teach an LLM to 𝘮𝘢𝘴𝘵𝘦𝘳 a body of knowledge? In new work with @AIatMeta, we propose Active Reading 📙: a way for models to teach themselves new things by self-studying their training data. Results: * 𝟔𝟔% on SimpleQA w/ an 8B model by studying the wikipedia
0
0
3
I wish they had good documentation and tools to go with the hardware.
My Google colleagues Norm Jouppi & Sridhar Lakshmanamurthy gave a talk today at Hot Chips on TPUv7 ("Ironwood"). The TPUv7 system offers 9216 chips / pod (42.5 exaflops of fp8), but we can scale across many of these pods to provide multiple zettaflops.
0
0
0
How does SEA have a Qdoba? Is this the only place on the West Coast where they have a store?
0
0
1
Like this idea. Once you ship an MVP Agentic application: watch how users actually use it → refactor into deterministic workflows that just work. Most teams are focusing on pre-production benchmarks on synthetic data. We will see more discussions around in-production tracking
the ai monitoring trap: llm judges look right but deliver garbage insights. when your ai product fails, there's no exception thrown. users just silently leave. traditional monitoring tools like sentry work because they track actual errors. but for ai products? your users say
0
0
2
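One lightweight way to do the in-production tracking mentioned above: log each agent step with its outcome so that recurring, reliable tool-call sequences can later be mined and frozen into deterministic workflows. Everything here (event names, fields, the sample sessions) is illustrative.

import json, time
from collections import Counter

events = []  # in practice this would be an append-only log or events table

def log_step(session_id: str, tool: str, ok: bool) -> None:
    events.append({"ts": time.time(), "session": session_id,
                   "tool": tool, "ok": ok})

# Simulated production traffic from two sessions.
for s in ("s1", "s2"):
    log_step(s, "parse_document", True)
    log_step(s, "extract_fields", True)
    log_step(s, "write_to_crm", s == "s1")

# Count tool sequences per session; frequent, reliable paths are
# candidates to refactor into a deterministic workflow.
paths = Counter()
for sid in {e["session"] for e in events}:
    seq = tuple(e["tool"] for e in events if e["session"] == sid)
    paths[seq] += 1
print(json.dumps([{"path": list(p), "count": c} for p, c in paths.items()], indent=2))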
Pretty much how I feel when I try to make it write a non-trivial feature in a state machine.
@mitchellh LLMs also write a s***-ton of code for no reason, often repeating large portions of code, and rewriting entire functions that you already have, except with minor and subtle differences.
0
0
1