🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️

@notesJOR

Followers
993
Following
2K
Media
521
Statuses
3K

#NLProc #TextMining #Linguistics #DataScience #DigitalHumanities #PostDoc - mother&father of: https://t.co/2gJOCo0vR6 - 🏭: @IDS_Mannheim

Düsseldorf & @IDS_Mannheim
Joined March 2014
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
3 months
My blog is available in the Fediverse/Mastodon at: https://t.co/0OVSdbwgkD. If you want to write to me from the Fediverse/Mastodon, please use the account: @notesjor@fediscience.org; via Bluesky: https://t.co/1GxmC7hF0o; other social networks: https://t.co/aANnEByL4G
0
0
0
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
9 months
#Düsseldorf #wochenende
0
0
1
@vaclavbrezina
Vaclav Brezina
9 months
📢 Join us for another webinar in our series on MA & PG Cert programmes in Corpus Linguistics @LancasterUni We'll explore #CorpusLinguistics #DigitalHumanities🔍📊 2 April 2-3pm UK time. Register: https://t.co/mnhVV9Ka1O
0
11
17
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
10 months
Do you still remember when you signed up for X? I still do - actually, never! #MeinXJubiläum
0
0
1
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
11 months
This might arise from costs for production/transmission – paper, storage, bandwidth. Keep in mind that we're studying complexity from the perspective of information theory. It would be interesting to see how these results align with other ways of measuring lang complexity. 6/6
0
0
0
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
11 months
So, is there any relation to lang.-external variables? We found that larger communities tend to use more complex/efficient langs. We speculate that this might have to do with the importance of written communication in larger societies, which favors shorter messages. 5/6
1
0
0
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
11 months
A: it's a trade-off. Langs with higher complexity tend to produce shorter texts, i.e. are more efficient in communication. That means that a complex lang might offer more options to convey the same idea using fewer symbols. We also show that this is not a trivial relationship. 4/6
1
0
0
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
11 months
If one lang is harder to process for an LM than another, this relationship holds across other LMs, text types, and even across symbolic levels (chars, words, BPE). But why would some langs evolve to be more complex, given the increased processing effort? 3/6
1
0
0
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
11 months
We trained 4 types of LMs (PPM, PAQ, LSTM, Transformers) on a corpus of 3 bn words across >6500 docs in >2k langs. Entropy rate distributions over LMs were surprisingly consistent. So, in this context, the choice of LM does not have a big impact on cross-linguistic studies. 2/6
1
0
0
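The entropy-rate idea in the post above can be illustrated with a very rough sketch. This is not the authors' pipeline (they trained PPM, PAQ, LSTM and Transformer models); it only assumes that a general-purpose compressor such as zlib from the Python standard library gives a crude upper-bound estimate of bits per character. The function name `bits_per_char` and both toy strings are hypothetical.

```python
import random
import zlib


def bits_per_char(text: str) -> float:
    """Compressed size in bits divided by the number of characters.

    A crude stand-in for an entropy-rate estimate; zlib is an assumption
    made for illustration, not one of the models named in the thread.
    """
    if not text:
        raise ValueError("text must not be empty")
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return 8 * len(compressed) / len(text)


# Hypothetical toy inputs: a highly predictable string and a less predictable one.
rng = random.Random(0)
repetitive = "ab" * 1000
varied = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz ") for _ in range(2000))

print(f"repetitive: {bits_per_char(repetitive):.2f} bits/char")
print(f"varied:     {bits_per_char(varied):.2f} bits/char")
```

The more predictable string should need fewer bits per character, which is the sense in which a text is "easier to process" for a model in this framing.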
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
11 months
Published: "Human #languages trade off #complexity against #efficiency" in PLOS Complex Systems https://t.co/icN7Z3UqGQ Langs that are (information-theoretically) more complex are more efficient. Larger speaker communities tend to use more efficient langs. 1/6 #linguistics
1
0
2
@halemverlag
Herbert von Halem Verlag
1 year
Just published: Handbuch Daten und KI im Journalismus, edited by Christina Elmer @ChElm and Lorenz Matzat @lorz. Series: Praktischer Journalismus #datenjournalismus #KI #neuerscheinung #journalismus https://t.co/rQ4KPJu6aq
0
3
3
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
1 year
😂🤣😂 What a summary
0
0
2
@8urghardt
Manuel Burghardt
1 year
If you are based in Leipzig, feel free to join us for our "Digital Methods" workshop next week. The event is part of the MECANO EU project ("mechanics of canon formation"):
mecano-dn.eu
MECANO: Mechanics of Canon Formation and the Transmission of Knowledge from Greco-Roman Antiquity
0
3
11
@IDS_Mannheim
IDS Mannheim
1 year
Recently published: Deutsche Sprache. Zeitschrift für Theorie, Praxis & Dokumentation, issue 3/24. Special issue "Deutsch im Kontakt: Europäische & außereurop. Konstellationen im Vergleich". Eds.: Barbara Hans-Bianchi & Doris Stolberg. Berlin: @ESVmedien 🔗 https://t.co/RYr6kjxrbM
2
2
9
@ZASBerlin
ZAS Berlin
1 year
DoReCo version 2.0 has been released! The corpus contains annotated language data from 53 low-resource and endangered languages. It is particularly suitable for cross-linguistic research on phonetics, phonology and morphology. More infos: https://t.co/lp7caKqhpd
0
9
22
@schtepf
Stephanie Evert
1 year
Exciting job offer from BAdW Munich to work on the cutting edge of digital lexicography, with potential long-term perspective. Submit applications until 19 Jan 2025!
0
3
9
@hartmast
Stefan Hartmann
1 year
Out now: a great #OpenAccess special issue on #Quantifizierung by Katja Politt @lingucat, with a contribution by Jakob Neels, @ungerer_tobias and me, plus an "Ausleitung" (sic) by @AlexWillich!
0
2
9
@IDS_Mannheim
IDS Mannheim
1 year
Several IDS researchers are contributing to the forthcoming edited volume "Genderbezogene Personenreferenzen": How do person references develop in New Year's and Christmas addresses? Which grammatical gender is used to refer to works by female artists? Bookmark it➡️ https://t.co/6xU4fiR7vS
1
2
7
@JasperTjaden
Jasper Tjaden
1 year
Our new working paper on #TikTok and #AfD is out. We present a data collection approach (algorithm audit) and apply it to German regional elections 1/3 on the main findings.... https://t.co/gKB4IqBaaU
4
30
102
@notesJOR
🏳️‍🌈👨🏻‍💻 Jan Oliver Rüdiger 🦄🌼🌸🏵️
1 year
The wonderful #Trafilatura ❤️💕 reaches version 2.0.0 - a great and irreplaceable tool for #WebScraping. Congratulations and thanks to @adbarbaresi https://t.co/xGXp59lh9O
github.com
Breaking changes: Python 3.6 and 3.7 deprecated (#709) bare_extraction(): now returns an instance of the Document class by default as_dict deprecation warning → use .as_dict() method on return va...
0
3
6