NeuML
@neumll
Followers 709 · Following 792 · Media 376 · Statuses 959
NeuML is the company behind txtai, one of the most popular open-source AI frameworks in the world. 🗓️ https://t.co/g4o2yL30qa
Washington DC Metro 🇺🇸
Joined April 2020
Did you know that TxtAI applications can be spun up from a YAML file? Check out this example that builds a RAG Pipeline with Docling + GPT OSS for any document. https://t.co/jdrYzpo2VN
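For context, a minimal sketch of what a YAML-defined txtai application can look like, loaded with the Application class. The textractor backend and LLM path below are assumptions, not the config from the linked example.

```python
# Hedged sketch: define a txtai application in YAML, then load it with Application.
# The textractor backend and llm path are placeholders, not the linked example.
from txtai.app import Application

CONFIG = """
writable: true

# Vector index with stored content
embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2
  content: true

# Document-to-Markdown extraction (backend value is an assumption)
textractor:
  backend: docling

# Generation model (placeholder path)
llm:
  path: openai/gpt-oss-20b
"""

# Write the config to disk and spin up the application from it
with open("rag.yml", "w", encoding="utf-8") as f:
    f.write(CONFIG)

app = Application("rag.yml")
```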
Language detection is an important task, especially for routing requests to language-specific models. This is easy with the staticvectors library. https://t.co/4d1ZwIo1MG
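A hedged sketch of what that can look like; the model name and the predict call are assumptions based on staticvectors examples, not verified against the linked notebook.

```python
# Assumed usage of the staticvectors library for language identification.
# The model id and predict() signature are assumptions, not verified API.
from staticvectors import StaticVectors

model = StaticVectors("neuml/language-id-quantized")  # placeholder model id

texts = ["What language is this written in?", "¿En qué idioma está escrito esto?"]
print(model.predict(texts))  # expected: one language label per input text
```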
⚡ The Textractor pipeline is one of the most powerful pipelines in the TxtAI toolbox! This example converts documents to Markdown then splits by Markdown sections. A simple yet effective chunking strategy! https://t.co/3h1hw5mIrN
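Roughly, the pattern looks like this; the input path is a placeholder.

```python
# Sketch of Markdown-section chunking with the Textractor pipeline.
from txtai.pipeline import Textractor

# sections=True returns one chunk per detected section instead of a single text blob
textractor = Textractor(sections=True)

for section in textractor("document.pdf"):  # placeholder path
    print(section[:80])
```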
LLMs for text classification = 🤮 Encoder-only models are a much better tool for the job! For resource-constrained devices, you should check out the BERT Hash series of models. You might even be able to get away with sub-1M params. Training code: https://t.co/jeakQa3Sm0
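As a rough illustration, fine-tuning a small encoder classifier with txtai's HFTrainer; the tiny base model below is a placeholder, not one of the BERT Hash checkpoints, and the data is toy data.

```python
# Sketch of fine-tuning a small encoder-only model for classification with HFTrainer.
from txtai.pipeline import HFTrainer

train = [
    {"text": "I love this product", "label": 1},
    {"text": "This is terrible", "label": 0},
]

trainer = HFTrainer()

# Base model is a placeholder tiny encoder, not a BERT Hash checkpoint
model, tokenizer = trainer("prajjwal1/bert-tiny", train)
```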
With TxtAI, in less than 10 lines of code you can extract, semantically chunk and vector index a webpage. This example shows how the data can be stored as a llama.cpp GGUF file. Pay attention to this; it's bigger than it appears... https://t.co/d9cBxazYxy
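In sketch form (the URL is a placeholder; the GGUF storage option from the post is not shown here):

```python
# Sketch: extract a webpage, chunk it into sections and build a vector index.
from txtai import Embeddings
from txtai.pipeline import Textractor

textractor = Textractor(sections=True)
embeddings = Embeddings(content=True)

# Index each extracted section of the page (URL is a placeholder)
embeddings.index(textractor("https://example.com"))

print(embeddings.search("what is this page about?", 1))
```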
RAG is one of the most popular TxtAI use cases. Click to learn more. https://t.co/PPkP1puyZD
medium.com · Get up and running fast with this easy-to-use application
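A deliberately simplified sketch of the pattern; the model path and documents are placeholders, and the linked article builds a fuller application.

```python
# Simplified RAG sketch: retrieve context with Embeddings, then prompt an LLM with it.
from txtai import Embeddings
from txtai.pipeline import LLM

embeddings = Embeddings(content=True)
embeddings.index([
    "txtai is an all-in-one open-source AI framework",
    "RAG combines retrieval with text generation",
])

llm = LLM("Qwen/Qwen2.5-0.5B-Instruct")  # placeholder model path

question = "What is txtai?"
context = "\n".join(x["text"] for x in embeddings.search(question, 3))

print(llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}"))
```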
🎉 We're excited to release txtai 9.1! 9.1 introduces vector "un-databases" - store vectors with NumPy, Torch and even GGUF from llama.cpp! Let's keep it simple when we can. Release Notes: https://t.co/ZKHLwqzJrJ
https://t.co/t6KZHx45Ye
github.com · 💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows - neuml/txtai
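A hedged sketch of switching the vector storage backend. The "numpy" and "torch" backends are existing options; the GGUF support is new in 9.1 and its exact setting is not shown here.

```python
# Hedged sketch of a "plain arrays" vector index using the numpy backend.
from txtai import Embeddings

embeddings = Embeddings(backend="numpy", content=True)  # vectors stored as NumPy arrays
embeddings.index(["first document", "second document"])

# Persist the index to disk like any other txtai embeddings database
embeddings.save("vector-index")
print(embeddings.search("first", 1))
```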
🚀 Did you know that TxtAI RAG and Agent apps can be hosted as a standard OpenAI API service? Submit a prompt and this becomes much smarter than a vanilla LLM inference call! https://t.co/KCj42wp1sv
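Roughly what the client side can look like. The base URL, route and model name below are assumptions; the service itself is typically launched with something like CONFIG=app.yml uvicorn "txtai.api:app".

```python
# Hedged sketch of calling a hosted txtai app through an OpenAI-compatible client.
# base_url, route and model name are assumptions, not verified endpoints.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="rag",  # placeholder name for the hosted RAG/Agent application
    messages=[{"role": "user", "content": "Summarize the indexed documents"}],
)
print(response.choices[0].message.content)
```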
💥 Ever think about storing your vector database as a GGUF file? With support for all the fancy quantization methods, device backends and other great things that only LLMs get to enjoy right now? Coming soon with TxtAI 9.1! https://t.co/GNxJNY0fOD
Want more control over your vector database? Then check out this article on using txtai's low-level APIs. https://t.co/lkFog22cl0
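The article may cover different APIs, but one well-documented way to get finer-grained control is txtai's SQL query support over a content-enabled index.

```python
# Combine semantic similarity with SQL filtering (requires content storage).
from txtai import Embeddings

embeddings = Embeddings(content=True)
embeddings.index(["txtai supports SQL queries", "scores can be filtered in the WHERE clause"])

print(embeddings.search(
    "SELECT id, text, score FROM txtai WHERE similar('sql') AND score >= 0.1"
))
```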
Cool to see that our PubMedBERT Embeddings model has been cited in over 45 medical/scientific/academic articles! Model: https://t.co/4xD39BPRSE Search: https://t.co/zymfaQmOJh
🤔 LLMs think in tensors and tokens, not text. RAG requires prompts and text. REFRAG proposed reducing RAG tokenization. What if we add frozen knowledge vector layers to our LLMs for RAG? Interesting idea. TxtAI now supports directly building Torch knowledge vectors.
🚀 Why let LLMs have all the fun? It's time to run our vector databases like an LLM! An exciting change is coming in TxtAI 9.1. Vector databases fully on the GPU! FP4/NF4/INT8 quantization support and efficient on-GPU matrix multiplication. Link: https://t.co/3agsoJi1ns
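Until 9.1 lands, a hedged sketch of the existing torch backend, which keeps the index as a Torch tensor; GPU placement and the FP4/NF4/INT8 quantization settings mentioned above are not shown.

```python
# Hedged sketch: vector index stored as a Torch tensor via the torch ANN backend.
from txtai import Embeddings

embeddings = Embeddings(backend="torch")
embeddings.index(["vectors as tensors", "matrix multiplication powers the search"])

print(embeddings.search("tensors", 1))
```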
💡 AI Workflows? Did you know that TxtAI had workflows years before many of the popular projects even existed? https://t.co/2fAVht0uk8
medium.com · A guide on when to use small and large language models
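A minimal workflow sketch: each Task wraps a callable and tasks run in sequence over batches of input elements.

```python
# Minimal txtai workflow: two tasks applied in sequence to a stream of elements.
from txtai.workflow import Task, Workflow

workflow = Workflow([
    Task(lambda x: [y.strip() for y in x]),   # remove surrounding whitespace
    Task(lambda x: [y.lower() for y in x]),   # normalize case
])

print(list(workflow(["  Hello World  ", "  TXTAI Workflows  "])))
```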
🗎 Want the background on the BERT Hash Nano models? Then check out this article for more! https://t.co/ttOuV0xd1B
medium.com · Learn how a simple tweak can drastically reduce model sizes
✨ We're proud to release the ColBERT Nano series of models. All 3 of these models come in at less than 1 million parameters (250K, 450K, 950K)! Late interaction performs shockingly well even with small models. Collection: https://t.co/gSVLMUrWcf Model: https://t.co/wUDXXDFRv7
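For intuition, a conceptual late interaction (MaxSim) scoring sketch in plain PyTorch; this illustrates the scoring step only and is not the API for the ColBERT Nano models.

```python
# Conceptual MaxSim scoring: per-token similarities, best match per query token, summed.
# Shapes are arbitrary toy values; this is not the ColBERT Nano API.
import torch
import torch.nn.functional as F

query = F.normalize(torch.randn(4, 32), dim=-1)      # 4 query tokens, dim 32
document = F.normalize(torch.randn(6, 32), dim=-1)   # 6 document tokens, dim 32

score = (query @ document.T).max(dim=-1).values.sum()
print(score.item())
```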
🔥 240K parameters is all you need? Not quite, but don't sleep on micromodels! This is a BERT model trained just like the original. The only difference is that it's 240K parameters vs 110M. https://t.co/Ec8r73WfMr
🚀 Excited to release a new set of models: The BERT Hash Nano series! Forget millions and billions of parameters, how about thousands? Think a 250K parameter model is useless? Think again. https://t.co/x6gqoi68TP