
Evidently AI
@EvidentlyAI
Followers
2K
Following
1K
Media
510
Statuses
2K
Open source ML and LLM evaluation, testing, and monitoring. GitHub: https://t.co/37H9bfnYj6 Discord: https://t.co/ElZ9RlroUa
Joined February 2020
3️⃣ 2️⃣ 1️⃣ Our free course on LLM evaluations for AI product teams starts today! - 7 days of byte-sized videos into your inbox - Certificate upon completion - No coding skills required - 500+ students have signed up. You can still join the course: https://t.co/Go2bNYJXCR
1
1
6
For more patterns in Gen AI applications, read the blog: https://t.co/WfL237WONA Or explore our database of 650 ML and LLM case studies from over 100 companies: https://t.co/Ipe0L1OfB8 5/5 🧵
0
0
0
3️⃣ RAG is one of the most popular newcomer use cases. We highlighted RAG as a separate category, with customer support being the most common application. For example, DoorDash created a RAG-based delivery support chatbot: https://t.co/5QnujPusXq 4/5
1
0
0
2️⃣ RecSys and search are reimagined with GenAI. Search and RecSys are still a core theme, with LLMs adding even better semantic understanding and quality of results. For example, Netflix created a foundation model for personalized recommendations: https://t.co/Uo2gfmKPBw 3/5
1
0
0
1️⃣ Automation is still king. As with ML, companies pay great attention to optimizing and automating high-volume workflows. Gen AI helps achieve that for more complex flows. For example, Intuit uses GenAI to improve knowledge discovery: https://t.co/gSEO6dnS5X 2/5
1
0
0
Gen AI use cases in 2025: learnings from 650 examples. We highlighted some new patterns of how top companies apply Gen AI based on a database of AI and ML use cases we've been curating: https://t.co/WfL237WONA Here are 3 of them: 1/5 🧵
1
1
1
How to control character traits in LLMs? Anthropic's research identifies patterns that control AI character, making it possible to: - Monitor its personality changes - Mitigate undesirable personality shifts - Identify training data leading to these shifts https://t.co/VtBdGLpyTQ
anthropic.com
A paper from Anthropic describing persona vectors and their applications to monitoring and controlling model behavior
0
0
2
In case you missed it: Synthetic data generator in Evidently open-source! A tool that helps you create custom test datasets: Specify what you want to generate ➡️ Define user profiles & roles ➡️ Pick LLMs ➡️ Run the generator. Try it out: https://t.co/b2AnvH1MDz
0
1
4
A Friday ML use case. From the database of 500 ML & LLM systems: https://t.co/jJoUj6MfFZ How Nextdoor, the neighborhood network app, uses LLMs to generate engaging email subject lines to boost email opens, clicks, and subsequent platform sessions. https://t.co/54oExGEsE6
engblog.nextdoor.com
Generative AI (Gen AI) has demonstrated proficiency in content generation but does not consistently guarantee user engagement, mainly for…
0
0
3
Practical tips on LLM evaluation. Booking shares its learnings from building LLM judges: - Clearly define evaluation metrics - Choose a strong LLM - Write a good evaluation prompt - Evaluate the judge and update the prompt https://t.co/WhAYPIu5XJ
booking.ai
Lessons learned from 1 year of Judge-LLM Development
0
1
1
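The last tip above, evaluating the judge, can be sketched as a comparison of the judge's verdicts against human labels on the same responses. This is a minimal illustration, not Booking's method: the function name `judge_agreement`, the "good"/"bad" label values, and the sample data are all hypothetical.

```python
from collections import Counter


def judge_agreement(human, judge, positive="bad"):
    """Compare LLM-judge verdicts with human labels for the same responses.

    `positive` is the label we most want the judge to catch (assumed "bad").
    """
    if not human or len(human) != len(judge):
        raise ValueError("need two non-empty label lists of equal length")

    # Count each (human label, judge label) combination.
    pairs = Counter(zip(human, judge))
    tp = pairs[(positive, positive)]
    fp = sum(n for (h, j), n in pairs.items() if j == positive and h != positive)
    fn = sum(n for (h, j), n in pairs.items() if h == positive and j != positive)
    agree = sum(n for (h, j), n in pairs.items() if h == j)

    return {
        "accuracy": agree / len(human),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels for six responses.
human_labels = ["good", "bad", "good", "bad", "good", "bad"]
judge_labels = ["good", "bad", "bad", "bad", "good", "good"]

report = judge_agreement(human_labels, judge_labels)
print(report)
```

If recall on the "bad" label is low, the judge is missing failures that humans flagged, which is usually the signal to revise the evaluation prompt and rerun the comparison.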
Just finished the module on agents and MCP in my new course. I cover: - Function calling - Deep research on complete @DataTalksClub podcast transcript history - MCP server for @cursor_ai with @EvidentlyAI docs - Ton of examples with OpenAI Agents SDK and PydanticAI
0
12
46
In case you missed it: 250 LLM benchmarks! We updated the database of LLM benchmarks and datasets used to measure LLM capabilities in reasoning, math, coding, info retrieval, tool use, and safety. Save the list: https://t.co/nZjQF9ljF2
0
0
2
A Friday ML use case. From the database of 500 ML & LLM systems: https://t.co/jJoUj6MfFZ How Yelp, an online reviews platform, uses LLMs to detect threats, harassment, lewdness, personal attacks, and hate speech. https://t.co/XkLbPGUedV
evidentlyai.com
How do top companies apply AI? A database of 650 case studies from 100+ companies with practical ML use cases, LLM applications, and learnings from designing ML and LLM systems.
0
0
2
Production is where machine learning meets business value, and Evidently AI has put together a comprehensive compendium of 650 real production ML/LLM case studies from 100+ companies (e.g., Netflix, Airbnb, DoorDash): https://t.co/TgNQOYTgmK
#ML #MachineLearning #AI
0
1
3
In case you missed it: RAG evaluation: an in-depth guide! You need a way to test how well your RAG system works, and catch what doesn't. Learn how to evaluate RAG retrieval and generation quality, build test sets, run experiments, and monitor: https://t.co/Jh2adbYeXo
evidentlyai.com
This guide breaks down how to evaluate and test RAG systems. You'll learn how to evaluate retrieval and generation quality, build test sets with synthetic data, run experiments, and monitor in...
0
0
0
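To make "retrieval quality" concrete, here is a small sketch of two commonly used retrieval metrics, hit rate at k and mean reciprocal rank (MRR). Choosing these two metrics is an assumption; the linked guide may cover others. The function names, document IDs, and test set below are hypothetical.

```python
def hit_rate_at_k(retrieved, relevant, k):
    """Share of queries with at least one relevant document in the top k."""
    hits = sum(
        1
        for docs, rel in zip(retrieved, relevant)
        if any(doc in rel for doc in docs[:k])
    )
    return hits / len(retrieved)


def mean_reciprocal_rank(retrieved, relevant):
    """Average of 1/rank of the first relevant document (0 if none is found)."""
    total = 0.0
    for docs, rel in zip(retrieved, relevant):
        for rank, doc in enumerate(docs, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(retrieved)


# Hypothetical test set: ranked retrieval results and known relevant IDs per query.
retrieved = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
relevant = [{"d1"}, {"d2", "d4"}, {"d0"}]

hit_at_1 = hit_rate_at_k(retrieved, relevant, k=1)
hit_at_3 = hit_rate_at_k(retrieved, relevant, k=3)
mrr = mean_reciprocal_rank(retrieved, relevant)
print(hit_at_1, hit_at_3, mrr)
```

Both metrics need a test set that maps each query to its known relevant documents. They only cover the retrieval half, so generation quality still needs a separate check.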
Evidently + Grafana for LLM evals! You can now visualize your Evidently LLM evaluation metrics on a Grafana dashboard. All in open source! Check out the code example: https://t.co/lGKSja0wnd
0
1
2
The code for the first module of my AI Bootcamp is ready! There I cover - LLMs and structured output - RAG with FAQ questions - RAG YouTube video + Summarizer - RAG on @EvidentlyAI docs - Search libraries: minsearch, elasticsearch, @qdrant_engine
0
1
9
See how top companies design their AI systems. We updated our database of 650 practical ML use cases, including real-world LLM and Gen AI applications, from 100+ companies. Enjoy the reading: https://t.co/Ipe0L1OfB8
0
0
0
In case you missed it: How to evaluate an LLM app? An intro from the LLM evals course: https://t.co/Go2bNYJXCR - Prepare a dataset with test inputs - Manually label responses as Good or Bad - Design an LLM evals system for automation. Watch the video:
0
1
1
A Friday ML use case. From the database of 500 ML & LLM systems: https://t.co/jJoUj6MfFZ How LinkedIn uses Skills Graph to extract skill data from texts and map the relationships between skills, people, and companies for relevant job matches. https://t.co/3FrKBbhYYj
linkedin.com
0
2
5
Want to see more examples of AI agents in production? We put together a database of 650 practical ML and LLM case studies from over 100 companies. Enjoy the reading! https://t.co/Ipe0L1OfB8 5/5 🧵
0
0
2