🏳️🌈👨🏻💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
@notesJOR
Followers: 993
Following: 2K
Media: 521
Statuses: 3K
#NLProc #TextMining #Linguistics #DataScience #DigitalHumanities #PostDoc - mother&father of: https://t.co/2gJOCo0vR6 - 🏭: @IDS_Mannheim
Düsseldorf & @IDS_Mannheim
Joined March 2014
My blog is available in the Fediverse/Mastodon at: https://t.co/0OVSdbwgkD If you want to write to me from the Fediverse/Mastodon, please use the account: @notesjor@fediscience.org via Bluesky: https://t.co/1GxmC7hF0o other social networks: https://t.co/aANnEByL4G
0
0
0
📢 Join us for another webinar in our series on MA & PG Cert programmes in Corpus Linguistics @LancasterUni We'll explore #CorpusLinguistics #DigitalHumanities 🔍📊 2 April, 2-3pm UK time. Register: https://t.co/mnhVV9Ka1O
0
11
17
Do you still remember when you registered on X? I still know - actually, never! #MeinXJubiläum
0
0
1
This might arise from costs for production/transmission – paper, storage, bandwidth. Keep in mind that we're studying complexity from the perspective of information theory. It would be interesting to see how these results align with other ways of measuring lang complexity. 6/6
0
0
0
So, is there any relation to lang.-external variables? We found that larger communities tend to use more complex/efficient langs. We speculate that this might have to do with the importance of written communication in larger societies, which favors shorter messages. 5/6
1
0
0
A: it's a trade-off. Langs with higher complexity tend to produce shorter texts, i.e. are more efficient in communication. That means that a complex lang might offer more options to convey the same idea using fewer symbols. We also show that this is not a trivial relationship. 4/6
1
0
0
If one lang is harder to process for an LM than another, this relationship holds across other LMs, text types, and even across symbolic levels (chars, words, BPE). But why would some langs evolve to be more complex, given the increased processing effort? 3/6
1
0
0
We trained 4 types of LMs (PPM, PAQ, LSTM, Transformers) on a corpus of 3 bn words across >6500 docs in >2k langs. Entropy rate distributions over LMs were surprisingly consistent. So, in this context, the choice of LM does not have a big impact on cross-linguistic studies. 2/6
1
0
0
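A minimal sketch of the idea in 2/6, under loud assumptions: the thread used PPM, PAQ, LSTM and Transformer models on a real multilingual corpus, whereas this example swaps in three standard-library compressors (zlib, bz2, lzma) and synthetic toy texts. It only illustrates how compressed bits per character can serve as a rough upper-bound estimate of an entropy rate, and how one might check whether different models rank texts the same way. All function names here are hypothetical.

```python
import bz2
import lzma
import random
import zlib

# Stand-in "language models" (assumption: not the PPM/PAQ/LSTM/Transformer
# models from the thread, just general-purpose compressors).
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def bits_per_char(text, compress):
    """Rough upper-bound entropy-rate estimate: compressed bits per character."""
    data = text.encode("utf-8")
    return 8 * len(compress(data)) / len(text)


def rank_texts(texts, compress):
    """Return text names ordered from lowest to highest estimated entropy rate."""
    return sorted(texts, key=lambda name: bits_per_char(texts[name], compress))


def make_samples(seed=0, length=20000):
    """Synthetic toy 'languages' with clearly different symbol entropy."""
    rng = random.Random(seed)
    return {
        "low": "ab" * (length // 2),  # fully predictable
        "mid": "".join(rng.choice("abcd") for _ in range(length)),
        "high": "".join(
            rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length)
        ),
    }


samples = make_samples()
rankings = {name: rank_texts(samples, fn) for name, fn in COMPRESSORS.items()}

for name, order in rankings.items():
    print(name, order)
```

If the three compressors order the toy texts identically, that mirrors (in a very small way) the thread's observation that the choice of model matters little for cross-linguistic comparison; it says nothing about real languages.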
Published: "Human #languages trade off #complexity against #efficiency" in PLOS Complex Systems https://t.co/icN7Z3UqGQ Langs that are (information-theoretically) more complex are more efficient. Larger speaker communities tend to use more efficient langs. 1/6 #linguistics
1
0
2
Just published: Handbuch Daten und KI im Journalismus, edited by Christina Elmer @ChElm and Lorenz Matzat @lorz. Series: Praktischer Journalismus #datenjournalismus #KI #neuerscheinung #journalismus
https://t.co/rQ4KPJu6aq
0
3
3
If you are based in Leipzig, feel free to join us for our "Digital Methods" workshop next week. The event is part of the MECANO EU project ("mechanics of canon formation"):
mecano-dn.eu
MECANO: Mechanics of Canon Formation and the Transmission of Knowledge from Greco-Roman Antiquity
0
3
11
Recently published: Deutsche Sprache. Zeitschrift für Theorie, Praxis & Dokumentation, issue 3/24. Special issue „Deutsch im Kontakt: Europäische & außereurop. Konstellationen im Vergleich“. Eds.: Barbara Hans-Bianchi & Doris Stolberg. Berlin: @ESVmedien 🔗 https://t.co/RYr6kjxrbM
2
2
9
DoReCo version 2.0 has been released! The corpus contains annotated language data from 53 low-resource and endangered languages. It is particularly suitable for cross-linguistic research on phonetics, phonology and morphology. More info: https://t.co/lp7caKqhpd
0
9
22
Exciting job offer from BAdW Munich to work on the cutting edge of digital lexicography, with a potential long-term perspective. Submit applications by 19 Jan 2025!
0
3
9
Out now: a great #OpenAccess special issue on #Quantifizierung by Katja Politt @lingucat, with a contribution by Jakob Neels, @ungerer_tobias and me, as well as an "Ausleitung" (sic) by @AlexWillich!
0
2
9
Several IDS researchers are contributing to the forthcoming edited volume "Genderbezogene Personenreferenzen": How do personal references develop in New Year's and Christmas addresses? Which grammatical gender is used to refer to works by female artists? Make a note➡️ https://t.co/6xU4fiR7vS
1
2
7
Our new working paper on #TikTok and #AfD is out. We present a data collection approach (algorithm audit) and apply it to German regional elections. 1/3 on the main findings.... https://t.co/gKB4IqBaaU
4
30
102
The wonderful #Trafilatura ❤️💕 has reached version 2.0.0 - a great and irreplaceable tool for #WebScraping. Congratulations and thanks to @adbarbaresi
https://t.co/xGXp59lh9O
github.com
Breaking changes: Python 3.6 and 3.7 deprecated (#709) bare_extraction(): now returns an instance of the Document class by default as_dict deprecation warning → use .as_dict() method on return va...
0
3
6