Scott Enderle @scottenderle X Profile

Scott Enderle

@scottenderle

Followers: 617

Following: 7K

Media: 158

Statuses: 6K

DH at Penn Libraries. Increasingly stealthy. He/him, opinions mine, everything's a bookmark.

https://t.co/Y69iIxOuPx

Philadelphia, PA

Joined July 2009


Scott Enderle

@scottenderle

5 years

You have a dimension reduction problem and two solutions. One is simpler mathematically, but harder to explain. The other is more complex mathematically, but easier to explain. They work equally well. Which do you go with?

1

0

Maria Antoniak

@maria_antoniak

5 years

I've updated little-mallet-wrapper to output the MALLET diagnostics file (includes coherence) and the full word weight distributions for each topic. You can load the word weights and also compare pairs of topics using Jensen-Shannon divergence. https://t.co/oMNgIbhjm8

github.com

A Python wrapper around the topic modeling functions of MALLET. - maria-antoniak/little-mallet-wrapper

0

12

58
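
For readers unfamiliar with the comparison mentioned in the post above, here is a minimal sketch of Jensen-Shannon divergence between two topics' word-weight distributions. It does not use little-mallet-wrapper or MALLET; the function names and the toy distributions (`topic_a`, `topic_b`) are assumptions made up for illustration, and the weights are assumed to be already normalized over the same vocabulary.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_divergence(p, q):
    """Symmetric divergence: average KL of p and q against their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical word-weight distributions over a shared four-word vocabulary.
topic_a = [0.5, 0.3, 0.2, 0.0]
topic_b = [0.0, 0.2, 0.3, 0.5]

identical = jensen_shannon_divergence(topic_a, topic_a)
different = jensen_shannon_divergence(topic_a, topic_b)
disjoint = jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0])
```

With base-2 logarithms the value is bounded between 0 (identical distributions) and 1 (no shared vocabulary), which makes pairs of topics easy to rank by similarity.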

Scott Enderle

@scottenderle

5 years

"As the black hole expanded along Spruce street, swallowing streetcars and Amazon delivery trucks whole, the Administrators realized the depth of their folly."

0

6

Scott Enderle

@scottenderle

5 years

Huh, more Fourier transforms. Overlaps in interesting ways with our HathiTrust ACS project. https://t.co/UGEON1oMrm https://t.co/jiqDoGX3Iv

syncedreview.com

Transformer architectures have come to dominate the natural language processing (NLP) field since their 2017 introduction. One of the only limitations to transformer application is the huge computa...

0

3

Scott Enderle

@scottenderle

5 years

If you have not already discovered Gutenberg, dammit, have a look, it's great! Really excellent for students and anybody who wants to play around with Gutenberg texts in a low-bar-to-entry way.

github.com

I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this - aparrish/gutenberg-dammit

1

6

37

sarah jeong

@sarahjeong

5 years

something I didn't know until I went to law school(!!!!!!!) was that universal daycare was a popular — sometimes mainstream — feminist demand in the 1960s and 1970s. for all we talk about women empowerment, the arc of history, and so on, there was a giant leap back in the culture

69

1K

8K

Scott Enderle

@scottenderle

5 years

This thread is a good reminder that stopword lists are a form of feature selection. But "stopword list creation" sounds way less important and serious and frowny than "feature selection," doesn't it?

Melanie Walsh

@mellymeldubs

5 years

Here are some words that scikit-learn, the popular Python machine learning library, gets rid of by default (stopwords): - fire - cry - system - serious - empty - thick - thin - whole - describe - detail What the heck these are good words!

0

5
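
The point that a stopword list is a form of feature selection can be sketched in a few lines. This is an illustration only: `STOPLIST` holds just the ten words quoted in the thread, not scikit-learn's full default list, and `bag_of_words` and the sample sentence are hypothetical.

```python
from collections import Counter

# Illustrative stoplist: only the ten words quoted in the thread above.
STOPLIST = frozenset({
    "fire", "cry", "system", "serious", "empty",
    "thick", "thin", "whole", "describe", "detail",
})

def bag_of_words(text, stoplist=frozenset()):
    """Count whitespace-separated lowercase tokens, skipping stoplisted ones."""
    tokens = text.lower().split()
    return Counter(t for t in tokens if t not in stoplist)

doc = "the whole system caught fire and the thick smoke was serious"

full = bag_of_words(doc)                # every distinct token is a feature
filtered = bag_of_words(doc, STOPLIST)  # stoplisted features are removed

# Features that a downstream model would never see after filtering.
dropped = sorted(set(full) - set(filtered))
```

Here the filter removes half of the distinct features, including most of the words that carry the sentence's meaning, which is the sense in which choosing a stoplist is a modeling decision and not a neutral cleanup step.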

David McClure

@clured

5 years

Playing with the C4 corpus from @ai2_allennlp. Here are 1M occurrences of the words "red" and "blue" (500k each), embedded via DistilBERT, where the words are [MASK]'ed in the input sequences, and then the mask embedding is sliced out of the top layer. Then UMAP to 2d.

6

14

74

Scott Enderle

@scottenderle

5 years

Sympathetic with people's feeling that "bias" is too flawed, or too polysemous, or too loaded a term to be useful. But do we actually have any better terms for discussing the issue of—should I call it fairness?—in algorithms?

1

0

Deb Raji

@rajiinio

5 years

These are the four most popular misconceptions people have about race & gender bias in algorithms. I'm wary of wading into this conversation again, but it's important to acknowledge the research that refutes each point, despite it feeling counter-intuitive. Let me clarify.👇🏾

Dr Kareem Carr

@kareem_carr

5 years

FOUR things to know about race and gender bias in algorithms: 1. The bias starts in the data 2. The algorithms don't create the bias but they do transmit it 3. There are a huge number of other biases. Race and gender bias are just the most obvious 4. It's fixable! 🧵👇

26

1K

3K

Scott Enderle

@scottenderle

5 years

This is quite good.

Dr Kareem Carr

@kareem_carr

5 years

how it really works

0

Scott Enderle

@scottenderle

5 years

Wow, UMAP does metric learning now. Seems like it could be a really powerful tool for developing interpretable predictive models.

0

4

Ming Jiang

@SeleenaJiang

5 years

We've developed a Gutenberg-HathiTrust parallel corpus of 19,049 pairs uncorrected OCR + human-proofread books in 6 domains, publ. 1780-1993. Description: https://t.co/9efe199fh2 @hathitresearch @Ted_Underwood @profdownie @gworthey @miehumie

2

33

103

Scott Enderle

@scottenderle

5 years

When you throw vectors of LDA topics haphazardly at UMAP and get these triangle looking things — is it somehow recovering the shape of the Dirichlet prior?

1

0

4
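
One piece of background for the question above: document-topic vectors drawn from a Dirichlet distribution sum to 1, so with three topics they lie on a triangle (the 2-simplex). The sketch below only illustrates that geometry. It does not run LDA or UMAP and does not answer whether UMAP recovers the prior; `sample_dirichlet` and the `alpha` values are assumptions chosen for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
alpha = [0.5, 0.5, 0.5]  # hypothetical sparse, symmetric prior over 3 topics

# Each sample stands in for one document's topic proportions.
docs = [sample_dirichlet(alpha, rng) for _ in range(1000)]

# Documents dominated by a single topic sit near a corner of the triangle.
near_corner = sum(1 for d in docs if max(d) > 0.9)
```

When the concentration parameters are below 1, samples tend to pile up near the corners and edges of the simplex, so a projection that preserves neighborhoods could plausibly show triangle-like shapes.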

Wenyi Shang

@ShangWenyi

5 years

Going to present the work "Improving Measures of Text Reuse in English Poetry: A TF–IDF Based Method" co-authored with @Ted_Underwood at #iconference2021 on Wednesday. We validated the method through the example of text reuse between Yeats and the English Romantic poets.

1

10

45

Ryan Heuser / @heuser.bsky

@quadrismegistus

5 years

It's not a bug or typo either. I don't know the text (a short story collection) but it's a bizarre, fascinating passage. Immediately after the 79 repetitions of "butter": "Eugenie Grandet decides to kill her father."

3

2

13

Guy Shrubsole

@guyshrubsole

5 years

Few people realise that this country has fragments of a globally rare habitat: temperate rainforest. Can you help me map the lost rainforests of England? 👇A brief thread about my new side-project: https://t.co/qKwhJscV8U

103

607

2K