George Ho
@_eigenfoo
Followers
1K
Following
10K
Media
125
Statuses
1K
Natural language processing, Bayesian modeling, open source, crosswords, donuts and coffee. Currently ML at @flatironhealth (he/him/his)
NYC
Joined May 2017
0
0
1
It was hard to find quality OCR data... until today! Super excited to announce the release of the 2 largest public OCR datasets ever. OCR is critical for document AI: here, 26M+ pages, 18b text tokens, 6TB! Thanks to @ucsf_library, @industrydocs and @PDFAssociation 🧶
7
101
603
Also from the group chat today Wordle 934 3/6* ⬛⬛🟨🟩⬛ 🟩🟩⬛🟩🟩 🟩🟩🟩🟩🟩
0
0
0
My NYT word game group chat has just come up with a new idea: play Wordle, get your score, and then prompt an image generation AI to draw a picture of what you see in your score. I'll go first. Wordle 934 6/6 ⬛🟦⬛⬛⬛ ⬛🟦⬛⬛🟦 🟦🟦🟦🟦⬛ 🟧🟧⬛🟦🟦 🟧🟧⬛🟧🟧 🟧🟧🟧🟧🟧
1
1
3
Very excited to introduce DocLLM, a multimodal LLM developed by my colleagues @jpmorgan. DocLLM-7B outperforms other SotA LLMs on 12/16 benchmarks within four core Document AI tasks! Incredibly proud of the team for their hard work. Check it out at https://t.co/BNHo1ia8d5
JPMorgan announces DocLLM A layout-aware generative language model for multimodal document understanding paper page: https://t.co/xbEslNa82b Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the
7
28
112
JPMorgan announces DocLLM A layout-aware generative language model for multimodal document understanding paper page: https://t.co/xbEslNa82b Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the
24
346
2K
I sawed my copy of The Power Broker in half so that it's easier to carry around. When a book's size becomes an impediment to reading it, I feel like something's gone seriously wrong
0
0
7
Hi yes hello good morning, I was on a podcast, talking about crossword archivism and milk cartons. You can listen to it here:
0
0
0
Gerty and Carl Cori won the Nobel Prize together in 1947. Then 6 of their students won Nobel Prizes, all in physiology/medicine and chemistry. (Five separate prizes in total; one was shared.)
amazon.com
"Crucible of Science" is the story of a unique laboratory at Washington University in St. Louis, and of Carl and Gerty Cori, the biochemists who established it. Carl and Gerty met and married at...
14
60
695
Beyond ecstatic for our Cooper Brue team from @cooperunion for winning both best beer label and 3rd place overall in the annual beer brewing competition at AIChE. Go team and thanks Ana for helping us compete! And yes, the poster is hand drawn!
2
3
20
i would retire too if i had to rewrite the entire HuggingFace Trainer to work with HuggingFace Accelerate, jesus that must have been a nightmare
Yesterday was my last day at Hugging Face. The past three years have been exhilarating and I am very proud of what the team has accomplished during that time! Taking a bit of a break with opensource full time (though I will still contribute to Transformers and Accelerate)
4
4
116
Hello, long time no #crossword! A new #cryptic is up, and I'm pretty happy with it! My favorite clue: I'm about to stuff fruit with trace of radium - it might bring death (4,6) https://t.co/PawHGrWrFY
georgeho.org
I'm about to stuff fruit with trace of radium - it might bring death (4,6)
1
2
8
#ML can extract clinically relevant information from EHRs at scale, but evaluating its quality has focused on single variables. This @flatironhealth study aims to evaluate ML's usefulness for research & RWE generation at scale: https://t.co/iTpSf2JhbX
@Cancers_MDPI
resources.flatiron.com
This study aims to evaluate the quality and performance of ML-extracted RWD and its use for research and RWE generation at scale.
1
2
5
@flatironhealth @zachweinberg Big reveal of Flatiron Health #machinelearning with #language and documents in EHR. The full text explainer from our team is here: https://t.co/BJzzxIatPX
0
1
2
Extracting meaningful clinical detail from EHRs for millions of patients with cancer is challenging. @FlatironHealth uses #NLP & #ML to extract key information from unstructured documents in the curation of high quality #RWD. Read more on our approach:
resources.flatiron.com
NLP and ML enable the extraction of retrospective clinical data in EHR with speed and scalability to help researchers learn from the experience of every person with cancer.
1
5
16
Today I capitulated and finally learnt how to save places on Google Maps and I think this is about to change my life. Maybe my hyperfixated techie friends know a thing or two about using technology to improve lives after all
0
0
2