Open Semantic Search
@OpenSemSearch
Followers: 2K
Following: 261
Media: 136
Statuses: 1K
#Semantic #search engine (#opensource) to search & analyze document sets, archives & news (exploratory search, #textmining, #nlp, #annotation, #ocr, #ddj, #dh)
Karlsruhe, Germany
Joined January 2013
Free #OpenSource research tools & tutorials for search, analysis, annotation, structure & textmining of large document collections, archives & leaks on your own laptop or server: https://t.co/EqV2Hx3eJs
#NICAR #NICAR22 #NICAR2022 #ddj #datajournalism #investigative #journalism
Some scientists & librarians had documents (e.g. a "Sammelband", i.e. a collected volume) or digitized books for which named entity recognition failed because of the default limit (one million characters) of the NER library. Thanks to https://t.co/Kdg5fyM7Tv from Tilburg University for the new config option to extend the limit!
Upgraded text extraction to the new Apache Tika release 2.5.0.
Want rendering in addition to extracted text and metadata? Please contribute to the design!
Imagine having 1000 PDFs and needing to find those with specific keywords. Here’s a tool many, many journalists need that someone could easily write and share. 🔍🗞
- Take PDFs as input
- Convert to text
- Search for keywords (UTF-8)
- Output result as CSV
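The search and CSV steps of such a tool can be sketched in a few lines of Python. This is only a minimal illustration, not how Open Semantic Search works: it assumes the PDFs have already been converted to text by some extractor (for example Apache Tika), and the function names `search_keywords` and `rows_to_csv` are hypothetical.

```python
import csv
import io


def search_keywords(documents, keywords):
    """Count case-insensitive keyword matches per document.

    `documents` maps a file name to its already extracted text
    (assumption: PDF-to-text conversion happened in an earlier step).
    """
    rows = []
    for name, text in documents.items():
        folded = text.casefold()
        for keyword in keywords:
            matches = folded.count(keyword.casefold())
            if matches:
                rows.append({"document": name, "keyword": keyword, "matches": matches})
    return rows


def rows_to_csv(rows):
    """Serialize the result rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["document", "keyword", "matches"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `rows_to_csv(search_keywords(texts, ["offshore"]))` on a dictionary of extracted texts yields one CSV row per document and keyword that matched. Plain substring counting is the simplest possible matcher; a real search engine would tokenize and index the text instead.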
@runasand Open Semantic Search does exactly this and more. A lot more. It's an amazing tool. https://t.co/YBpWUTP0j5
github.com
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL f...
Today @acka47 and @fsteeg gave a presentation, "Integrating external authority data sources for reconciling & enriching local data in #OpenRefine", at a colloquium of the Chair of Economic and Social History at @UniHalle. https://t.co/Wz6L4JXs8P
#reconciliation
A short-notice event announcement for those interested in #wikibase and #linkedData: "NFDI-InfraTalk: Wikibase and the challenges and possibilities of knowledge graphs for RDM in NFDI4Culture". Today, 7 March 2022, 4 pm, via live stream at https://t.co/ub241O7CBC
#nfdi
Graphs in the architecture documentation are now in #Mermaid format (https://t.co/fgVNntMv8L) inside the documentation (Markdown) in the Git(hub) repo, so everyone can edit them like other parts of the docs at https://t.co/yxlJ1riQGH. Thanks to the #opensource #MkDocs plugin https://t.co/oMSuJ3YqIW
Working on more prioritization features for the task queue, to improve the chance that more relevant documents are processed earlier. If you want to contribute by setting priorities for filename extensions or for relevant keywords that often occur in filenames: https://t.co/6sXxbvNhYH
#ddj
github.com
Like the already implemented user interface for prioritizing certain files in the ETL task queue that are not yet extracted, and the planned prioritization of whole file directories and subdirectories, we w...
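The idea of ranking queued files by filename extension and filename keywords can be illustrated with a small Python sketch. It is not the project's implementation (which runs on its own ETL task queue); the weight tables `EXTENSION_PRIORITY` and `KEYWORD_PRIORITY` and both function names are hypothetical values chosen for the example.

```python
import heapq
from pathlib import PurePath

# Hypothetical weights: a higher score means "process earlier".
EXTENSION_PRIORITY = {".pdf": 10, ".docx": 8, ".csv": 5}
KEYWORD_PRIORITY = {"contract": 5, "invoice": 3}


def priority(filename):
    """Score a file by its extension plus any keywords found in its name."""
    path = PurePath(filename)
    score = EXTENSION_PRIORITY.get(path.suffix.lower(), 0)
    name = path.name.casefold()
    score += sum(bonus for keyword, bonus in KEYWORD_PRIORITY.items() if keyword in name)
    return score


def processing_order(filenames):
    """Return filenames ordered from highest to lowest priority.

    heapq is a min-heap, so scores are negated; the index keeps the
    original order stable for files with equal scores.
    """
    heap = [(-priority(name), index, name) for index, name in enumerate(filenames)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

With these example weights, a file such as `Contract_2021.PDF` is processed before `data.csv`, and files that match neither table keep their original relative order at the end of the queue.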
New #OpenSource release (beta!) of #Open #Semantic #Search Server for teams with many upgrades (Tika 2, Solr 8, spaCy NLP 3 & Flower task monitoring out of the box) available for download (package for Debian 11): https://t.co/EqV2Hx3eJs
#ddj #datajournalism #dh #digitalhumanities
New #OpenSource release (beta!) of #Open Semantic Desktop #Search VM with many upgrades (Debian 11 Bullseye, Apache Tika 2, Apache Solr 8, spaCy NLP 3 ...) available for download: https://t.co/gKP0QHISoz
#ddj #datajournalism #dh #digitalhumanities
Migrated the build of the Open Semantic Desktop Search VM (VirtualBox appliance) to Ansible.
Most #opensource contributors to #Open Semantic Search are not listed as "Contributors" in the GitHub user interface because our repo is structured with Git submodules (additional Git repos). Added a "Contributors" section to https://t.co/kTZtzhkdh8 (feel free to extend it). Thanks to all!
The next open source release of Open Semantic Search Server comes with automatic setup of Celery Flower (web user interface) for monitoring the document processing task queue (ETL) out of the box.
Working on a #spaCy NER plugin to run named-entity recognition (NER) with multiple different #MachineLearning models for the same document language (currently only one #ml model per language is configurable) to fill faceted search/interactive filters. #opensource #textmining #nlp #nlproc
Thanks for what are by now 600 followers on @github and 500 stars for the #Open #Semantic #Search Git repository https://t.co/ykgB3FrnPn
#opensource #ddj #dh #OpenScience