Explore tweets tagged as #Datasets
@spatialthoughts
Ujaval Gandhi
10 days
Exciting news for #QGIS users! The "Google Earth Engine Plugin for QGIS" is now updated with new no-code tools that allow you to download and use #EarthEngine datasets in QGIS easily. Check out my newly contributed tutorials for the latest plugin (1/n) 👇
15
232
1K
@eliebakouch
elie
7 days
This is how you create the best open dataset for VLM
4
57
583
@RayanChikhi
Rayan Chikhi
8 days
🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of 🧬DNA sequencing data🧬 from the far reaches of our planet. 🦠🍄🌵 Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open. https://t.co/dDBtAjfdYL
5
146
368
@DatawithPooja
Pooja Pawar, PhD
5 days
Data cleaning is where true #DataScience begins. From handling missing values to filtering, aggregating & merging datasets, these Python commands are your go-to for making raw data analysis-ready! #Python #Pandas #DataAnalytics #MachineLearning #BigData #EDA
6
105
511
@CausalHuber
Martin Huber
2 days
📚 In summer 2023, my book Causal Analysis was published with @mitpress. Just two years later😉 I’m very happy to share that the lecture slides are now freely available in both PDF and LaTeX (as zip files), along with the datasets and R/Python code: 👉 https://t.co/VfahR3aqVR
11
338
2K
@AdetanChelsea
The Data Magician📈🪄
6 days
I just completed the Data Engineer in Python track on @DataCamp and built my first ETL pipeline for a retail dataset alongside!🥳 You can check out the project using this link: https://t.co/iuER47CGke If you're also transitioning into DE, let's connectttt☺️
18
12
162
@yonathandinata
Yonathan Dinata
9 days
TOP 20 Indonesian Crypto Influencers (2025) Based on the last 30 days, since @grok has a hard time processing too many datasets.
79
84
496
@Fish_Fetisher
Emily Troyer 🐠🐟🐡
10 days
Excited to announce that the 1st paper from my postdoc is now out in @CurrentBiology. Using a large dataset of 3D-preserved fossils, we explore the diversification of jaws in early bony fishes. 1/15 https://t.co/qUteZPnrRr
3
32
160
@jdhruv14
Dhruv
4 days
Guysssss, I've finished making the Srimad Bhagavad Gita Dataset. Do use it to make something beneficial for mankind, suggest any ideas that could be implemented, and tell me if you find any mistakes I might have made. Om Namo Bhagavate Vasudevaya 🙏🙏
91
273
3K
@lusxvr
Luis
7 days
Today, we are releasing FineVision, a huge open-source dataset for training state-of-the-art Vision-Language Models:
> 17.3M images
> 24.3M samples
> 88.9M turns
> 9.5B answer tokens
Here are my favourite findings:
19
214
1K
@MaziyarPanahi
Maziyar PANAHI
3 days
Introducing MultiCaRe, open-source, multimodal clinical case datasets on @HuggingFace by @OpenMed_AI Community. Public and ready for load_dataset.
Images: 160K+ figures/subimages
Cases: 85K de-identified narratives + demographics
Articles: 85K metadata + abstracts
🧵 (1/7)
19
162
745
@victor_explore
Victor
1 day
Free playlist of 23 hands-on Python Pandas project tutorials including e-commerce analysis, movie datasets, health data, and building web apps with Streamlit. Perfect for building a strong data analysis portfolio with real-world case studies.
2
52
341
@andimarafioti
Andi Marafioti
7 days
Fuck it. Today, we open source FineVision: the finest curation of datasets for VLMs, over 200 sources!
> 20% improvement across 10 benchmarks
> 17M unique images
> 10B answer tokens
> New capabilities: GUI navigation, pointing, counting
FineVision 10x’s open-source VLMs.
22
111
920
@yohaniddawela
Yohan
4 days
Geospatial traffic data is incredibly tough to find. To make things easier for you, I've compiled a comprehensive list of traffic and mobility datasets:
5
77
617
@tom_doerr
Tom Dörr
4 days
curated list of datasets and tools for LLM post-training
1
6
29
@DarkWebInformer
Dark Web Informer
10 days
🚨🇲🇽 Alleged Sale of 23,000 Mexican Credit Card Records
A known threat actor has allegedly listed a dataset of 23,000 credit card records from Mexico advertised with ~70% validity.
📌 Key Details
• Threat Actor: Mexicnon
• Network: Dark Web
• Format: Fullz (CC, Exp, CVV,
2
6
18
@LuizaJarovsky
Luiza Jarovsky, PhD
4 days
🚨 Real footage showing AI companies trying to remove personal data from the AI training dataset to avoid GDPR compliance. Watch:
19
57
532
@IT_unhinged
IT Memes
10 days
Awarded to whoever thought Excel could handle a 2GB dataset.
2
2
39
@ChinaScience
China Science
7 days
A Chinese-led international team has developed EyeFM, an AI system trained on 14.5 million ocular images and paired clinical texts from global, multi-ethnic datasets. As the world's first multimodal vision–language eye imaging foundation model, it has demonstrated how AI can soon
1
34
100