Jan P. Harries

@jphme

Followers
1,185
Following
304
Media
119
Statuses
893

Co-Founder & CEO @ ellamind / #DiscoResearch / Retweets & favs are stuff I find interesting, not endorsements

Düsseldorf, Germany
Joined March 2009
Pinned Tweet
@jphme
Jan P. Harries
2 months
Live tweeting the most interesting insights from @Meta 's new Llama3 paper 1. How did they arrive at a 405b model trained with ~15T tokens? "Extrapolation of the resulting scaling law to 3.8 × 10^25 FLOPs suggests training a 402B parameter model on 16.55T tokens." 👇🧵
Tweet media one
3
88
859
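A quick back-of-the-envelope check on the quoted extrapolation: the common C ≈ 6·N·D rule of thumb for dense-transformer training compute lands in the same ballpark as the 3.8 × 10^25 FLOPs budget. A minimal sketch (the 6ND shortcut is an assumption here; the paper fits its own scaling law rather than this rule):

```python
# Rule-of-thumb training compute for a dense transformer: C ≈ 6 * N * D
# (an illustrative approximation; Meta fits an explicit scaling law).
N = 402e9     # parameters from the quoted extrapolation
D = 16.55e12  # training tokens from the quoted extrapolation

C = 6 * N * D
print(f"C ≈ {C:.2e} FLOPs")  # ≈ 3.99e+25, same order as the quoted 3.8e25
```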
@jphme
Jan P. Harries
10 months
First benchmark results for #Mixtral 7bx8 are in and look awesome (by @bjoern_pl ): MMLU 0.717, AVG 0.688 for the untuned base model. 🤯 And @Tim_Dettmers himself dropped some code and suggested we can get it down to 4GB. Let's see if it runs on my phone before NeurIPS has
Tweet media one
Tweet media two
22
95
794
@jphme
Jan P. Harries
10 months
About 24h after the link was dropped, we proudly present DiscoLM Mixtral 8x7b alpha. Created in an awesome 24h-speedrun by @bjoern_pl . Many thanks to @MistralAI for the model, @Hessian_AI for providing compute and @laion_ai for invaluable support! 🧵1/x
Tweet media one
12
43
307
@jphme
Jan P. Harries
10 months
Happy to announce the formation of #DiscoResearch with the release of DiscoLM 120b and DiscoLM 70b, two experimental models pushing the boundaries of OS LLMs 🪩. These models would currently reach #2 and #3 of the HF LLM Leaderboard (see benchmark results below and in the model
Tweet media one
Tweet media two
6
11
74
@jphme
Jan P. Harries
10 months
@bjoern_pl @Tim_Dettmers Link to experimental HF implementation: Link to bitsandbytes WiP implementation: Link to Disco(rd):
0
5
49
@jphme
Jan P. Harries
10 months
Get the model here: Many thanks also in particular to @dzhulgakov , @vikhyatk for key contributions, @AlpinDale , @winglian , @Tim_Dettmers , Casper, @TheBlokeAI and many others I surely forgot for dropping by and supporting the effort! 2/x
1
6
40
@jphme
Jan P. Harries
2 years
@fragdenstaat @HumboldtUni In this case you're off the mark. "Own infrastructure" of comparable quality isn't remotely achievable for that money, and Zoom was/is(?) far better for teaching than the alternatives. Data protection is important, but it shouldn't be wielded as a knockout argument at the expense of teaching.
5
0
32
@jphme
Jan P. Harries
1 year
Great respect for Robert #Habeck . Regardless of whether the heating law is good or not and how you feel about the Greens: politicians who face the public and stand up for the country and for change without regard for their personal popularity are far too rare. #Lanz
1
1
34
@jphme
Jan P. Harries
3 years
@tagesthemen What an uninformed commentary; Ms. Kyrieleis falls completely for the real-estate lobby. New builds were exempt (and help at best in the long term); the only way to keep rents affordable is to make real estate unattractive as an investment. Keyword: rent-seeking
5
0
25
@jphme
Jan P. Harries
1 year
@CarloMasala1 After Nord Stream I'd have expected you to take a more nuanced view here. Russia/Putin is not stupid. Russia will never give up Crimea voluntarily. It follows that "scorched earth" cannot possibly be an (intended) strategy (water). If it really was Russia, then an accident?
18
0
28
@jphme
Jan P. Harries
1 year
The guy down there can only be @Teknium1 (but I wonder why Reddit, not Discord) :p. Congrats, well deserved (also @erhartford @jon_durbin @TheBlokeAI @jeremyphoward & all other recipients, really an all-star team). Great initiative by @a16z - see for details
Tweet media one
2
2
26
@jphme
Jan P. Harries
1 year
Delighted to release EM German, a state-of-the-art, open & free German-speaking LLM, finetuned on top of Llama2, @MistralAI and LeoLM. Find all information, examples & downloads on GitHub: Many thanks @TheBlokeAI @winglian @jon_durbin @laion_ai @hessianai
Tweet media one
4
5
26
@jphme
Jan P. Harries
10 months
Attached some very early example prompts. Evals that have finished so far are ARC (67.32), TruthfulQA (54.17), Winogrande (80.72); will update as soon as everything has finished. Thanks to everyone who contributed or just dropped by, it was a lot of fun for all of us 🪩🤩. 3/x
Tweet media one
Tweet media two
2
3
19
@jphme
Jan P. Harries
1 year
. @MistralAI just released their first 7b model and it's huge. They claim to beat Llama2 13b on all relevant benchmarks. On my (German) micro benchmark, the 7bn model reaches Llama2 70b quality 🤩. Will release a first finetuned German model soon. Links to the announcement below!
Tweet media one
Tweet media two
1
3
19
@jphme
Jan P. Harries
10 months
@abacaj Apparently there is still a bug in the forward pass - (applying softmax before routing to experts). See e.g. Nous discord or our DiscoResearch discord, people are working on a fix (and the HF repo will be updated soon)
3
0
18
@jphme
Jan P. Harries
8 months
@vietdle @dominik_scherm @AITinkerers @basti_vkl @charlykobiella @cdtm_munich @lafamigliaVC @con5di AI founders: Talk to this guy, best AI VC in Europe and he is really going the last mile for founders&tinkerers 😉! (Huge opportunity for AI robotics startups to get funded?) Many thanks @vietdle for your invaluable advice once again!
Tweet media one
3
1
14
@jphme
Jan P. Harries
8 months
Great crowd + long queues at @ai_tinkerers in Munich yday! I had the honor to launch DiscoLM German 7b by @Discoresearch live on stage. Many thanks to @dominik_scherm @basti_vkl @vietlede and sponsors @lafamigliaVC @cdtm_munich for bringing the European AI community together!
Tweet media one
Tweet media two
3
2
13
@jphme
Jan P. Harries
9 months
Agree. And I think it's very irresponsible to pillory @laion_ai or @huggingface for publicity instead of disclosing responsibly and working with them to mitigate the issues - it will lead to a worse outcome for all parties and even more secrecy and less open data 😔 ..
@art_zucker
Arthur Zucker
9 months
It’s nice and all to bash @laion_ai while we don’t even know what @OpenAI , @Google and closed-source models are trained on… 🙄
6
9
96
1
1
14
@jphme
Jan P. Harries
2 years
A very smart article by @PeterRNeumann in @derspiegel - you have to think the war through from its end. Unfortunately that happens far too rarely in the media, where often only maximal positions are traded back and forth.
3
4
13
@jphme
Jan P. Harries
10 months
Great intro to the one and only finetuning lib #Axolotl (and LLM finetuning in general) - if you're technical and into AI, you can't spend an hour much better than listening to @winglian and @swyx ; highly recommended. (and even a small cameo for #DiscoResearch ☺️)
@latentspacepod
Latent.Space
10 months
🆕 The Busy Person's Intro to Finetuning & Open Source AI with @winglian of Axolotl Covering the SF AI meetup with @NousResearch , @Teknium1 , and all the required knowledge to get started navigating Open Source AI and finetuning them. (special cohost
2
24
107
0
1
13
@jphme
Jan P. Harries
7 months
Important paper by @cervisiarius et al., showing that the reasoning process of (current-generation, mainly English-trained) LLMs is biased towards English, and non-English token probabilities only increase over the last layers. Also, tokenization matters! This could explain
Tweet media one
@cervisiarius
Bob West
7 months
In our new preprint, we ask: Do multilingual LLMs trained mostly on English use English as an “internal language”? - A key question for understanding how LLMs function. “Do Llamas Work in English? On the Latent Language of Multilingual Transformers”
Tweet media one
13
113
615
0
2
12
@jphme
Jan P. Harries
1 year
@Tendar Following you because your tweets are generally well-informed, but this is really mean-spirited defamation. There were basically no German politicians who tweeted as early, as often and as unambiguously, standing by Israel and promising support, as @W_Schmidt_ . Unnecessary partisanship!
1
1
8
@jphme
Jan P. Harries
8 months
@jon_durbin Yes, I think I told you once that this alone would be a reason to switch to axolotl 😉. Jokes aside, I can attest that it doesn't hurt performance; on the contrary, in our experience it even helps with regularization and prevents overfitting. You have to adjust the LR though.
1
0
8
@jphme
Jan P. Harries
9 months
@Teknium1 @JagersbergKnut DiscoLM German (release coming soon) will have function calling too, trained on both JSON schema and the OpenAI format (as described by @HamelHusain ) and with some inspiration from @abacaj 's notebook using Open Hermes. Could also do an en-only version; inference code coming too :)
0
0
6
@jphme
Jan P. Harries
8 months
@mkinyugo @jeremyphoward Actually 300W is even better; very low drop from 350W (and even less difference for training compared to inference). Did some ablations with 290W/275W and so on as well, but below 300W the performance drop-off gets large
2
0
8
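For context, power caps like the ones in these ablations are typically set with nvidia-smi. A minimal sketch, assuming an NVIDIA GPU, a recent driver and root privileges; the 300 W figure comes from the tweet above, everything else is illustrative:

```python
import subprocess

def set_power_limit(watts: int, gpu_index: int = 0) -> None:
    """Cap one GPU's board power (in watts) via nvidia-smi."""
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
        check=True,
    )

set_power_limit(300)  # near-full throughput at lower power, per the ablations above
```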
@jphme
Jan P. Harries
10 months
@far__el @bjoern_pl @Tim_Dettmers Some have tested it, but I haven't seen anyone with "usable" results from this yet; I think Tim is on his way to NeurIPS so he currently can't help debug. But the first FT run has finished and should hopefully be ready for release soon, after ironing out a few remaining bugs...
1
0
8
@jphme
Jan P. Harries
1 year
@Yampeleg Did you try them out? Their eval scores look unrealistically high compared to Llama when not "trained on the test set", and their previous model (which also beats Llama2 by a mile in all benchmarks on paper) got almost no traction outside China. Call me skeptical...
1
0
8
@jphme
Jan P. Harries
8 months
Last week, I was invited to present our startup @ellamindAI and the advantages of local LLMs for businesses at "AI in Action" at the @_KINRW AI village. Also demoed the new DiscoLM German 7b by @DiscoResearchAI . Some learnings 👇 1/2
Tweet media one
2
1
8
@jphme
Jan P. Harries
10 months
Obvious Disclaimer: All of this is still very beta + experimental (and inference very inefficient/slow), use at own discretion. Surely the "official" implementation will drop soon. Happy about feedback + ideas, come and chat with us in our Disco(rd): 4/4
0
1
7
@jphme
Jan P. Harries
1 year
@Duesseldorf @BFDuesseldorf Nina-Meldung ging gerade raus, sofort waren beide in der Meldung genannten Websites ( und Feuerwehr-duesseldorf) nicht mehr erreichbar. Das sollte in Zukunft besser sein, was machen Bürger ohne Twitter?
1
0
6
@jphme
Jan P. Harries
7 months
If #Gemini 1.5 Pro is indeed comparable to 1.0 Pro (and not Ultra) in size/inference costs, the progress vs 1.0 is remarkable, and 1M tokens of context are impressive (if usable in practice)! 👏 Google catching up?
Tweet media one
@drjwrae
Jack Rae
7 months
We are announcing the Gemini 1.5 series of models today! * Support for 1M context lengths (tested up to 10M) * Gemini 1.5 Pro nears Gemini 1.0 Ultra performance with greater efficiency * Cloud users can sign up to waitlist for preview
5
13
184
1
0
6
@jphme
Jan P. Harries
8 months
👀
Tweet media one
0
0
7
@jphme
Jan P. Harries
3 years
@sama "Out of the fixed supply of 10 billion tokens,twenty percent are used to fund Orb production and initial protocol development."While the idea sounds interesting,seems like just another get-quick-rich scheme. If you want it to get big, distribute fairly and fund e.g. through orbs
0
0
5
@jphme
Jan P. Harries
1 year
@Tim_Dettmers @Teknium1 @mcy_219085 @alyssamvance @HamelHusain Sometimes problems that seem trivial for you can be extremely hard to solve and frustrating for others. Hopefully you get sufficient "maintenance" support for bitsandbytes ( @huggingface ?) to focus on the really hard problems. Keep up your incredible work, many many thanks 🙏
0
0
5
@jphme
Jan P. Harries
10 months
One of the best roundups yet on the Mixtral release - great stuff as usual from @natolambert ! (even though "buried in one of the many ML discords" was a bit harsh - you're cordially invited to our @DiscoResearchAI Discord 😉)
@natolambert
Nathan Lambert
10 months
I studied mixture of experts this weekend. Here's my roundup on @MistralAI 's Mixtral: * Why the model is obviously solid * Mixture of expert basics * Predicting llama 3 and mistral medium / large * Release strategies * AI Alliance, EU AI Act, Gemini, vibes... * Other topics
2
12
70
2
0
6
@jphme
Jan P. Harries
10 months
@erhartford @bjoern_pl @migtissera And I forgot to mention him on Twitter - sorry! 😶 We love @migtissera 's Synthia - actually, we even have something else with Synthia DNA in the works (basically a new SoTA translation/filtering/evaluation pipeline for distilling the best samples and using them for multilingual
1
0
6
@jphme
Jan P. Harries
9 months
@adithyan_ai Interesting, but $0.53/kWh in Germany? I pay $0.27/kWh (as a normal household without any special discount), and if you can use PV, it's easy to get below $0.10 in Germany. And comparing by PPP doesn't really make sense if you can just rent GPUs elsewhere.
1
0
6
@jphme
Jan P. Harries
11 months
Same here. The output quality of #ChatGPT (and what you can use in GPTs) is getting worse with each release. I tried a very simple use case for a personal GPT (without retrieval) and it mostly fails miserably at semi-complex tasks; only after a lot of prompt engineering, it
Tweet media one
Tweet media two
@max_paperclips
Shannon Sands
11 months
So, I tried a test with the new GPTs, to see if the whole "it can generate a website from a single request" thing is accurate. I'd done a similar test a while back - define a VERY simple website using Go & HTML, HTMX with AlpineJS so it doesn't even need a React app or similar,
22
25
255
2
0
5
@jphme
Jan P. Harries
10 months
@migtissera @bjoern_pl Thanks for the awesome Synthia dataset! Sorry, forgot to thank you on Twitter, but you're at least on the model card 🙈. Our finetune was done with Axolotl using QLoRA, so it "should" be supported out-of-the-box (but YMMV and it's still very slow)..
1
0
6
@jphme
Jan P. Harries
10 months
@erhartford @Teknium1 @_philschmid @alignment_lab Actively experimenting with this (so far only for translated datasets, but will probably also apply it to original/English ones), using specialized models trained only for eval/filtering, with 3+ categories. Verifying is easier than creating. Bonus: you can get DPO data with the same approach.
0
0
6
@jphme
Jan P. Harries
1 year
Very interesting and detailed leak of the GPT-4 architecture, with excellent analysis of the reasoning behind it and its implications - by @dylan522p : A non-paywalled summary can be found here:
0
2
3
@jphme
Jan P. Harries
11 months
@ocolegro Sad but unavoidable that the first thing some smart people are using this incredible technology for is generating spam (which is essentially making money by exploiting the part of the population that isn't tech-savvy enough to find what they're really looking for)
1
0
5
@jphme
Jan P. Harries
2 years
@maxocito @fragdenstaat @HumboldtUni @senfcall Back then I tested all the alternatives (incl. BBB, Webex, dfnconf) intensively for our faculty - apart from Zoom there was no tool that was easy to use (for 18-year-old students and 60+ professors) and ran stably and reliably for both seminars and lectures with 500+ participants.
2
0
3
@jphme
Jan P. Harries
11 months
@abacaj Tbh I think it was pretty clear that the market for everything OAI announced today would be commoditized. Non-generic/wrapper startups will be fine; as soon as you leave the MVP/"langchain" stage, you'll need/want custom solutions (and Assistants is expensive - was billed $0.50/req)
0
0
5
@jphme
Jan P. Harries
10 months
Great stuff - nobody can wait until @MistralAI drops the official implementation 😉; I love the pace of the OSS LLM community. @bjoern_pl now also got a HF-compatible version working, check it out here: #DiscoResearch
@dzhulgakov
Dmytro Dzhulgakov
10 months
Run @MistralAI MoE 8x7B locally with hacked-up official Llama repo! First cut, not optimized, requires 2 80GB cards. The optimized version is coming soon to @thefireworksai . Stay tuned!
Tweet media one
14
60
467
0
1
5
@jphme
Jan P. Harries
10 months
@HendrikWuest Yes, it "protects" our children and grandchildren from urgently needed investments, whether in education, infrastructure, climate, digitalization or competitiveness. The #Schuldenbremse (debt brake) plays well with voters 60+; that is the real reason for your stance, be
1
1
5
@jphme
Jan P. Harries
11 months
If everything works, this will be huge for coding/prototyping. Embed your codebase and docs and let the code interpreter build the MVP (already possible with 3rd-party tools, but with lots of integration effort and friction..)
@ShreyaR
shreya rajpal
11 months
@satyanadella @zapier Assistants API highlights ✅ better function calling ✅ code interpreter, Python sandbox ✅ built in retrieval ✅ memory
1
2
15
0
1
5
@jphme
Jan P. Harries
10 months
Yep! Domain-specific models and pipelines will be huge and will outperform larger, "smarter" general models in most cases. (This may be even more apparent for non-English applications, where the GPT-4 "baseline" is lower and easier to surpass..)
@IntuitMachine
Carlos E. Perez
10 months
1/n It should be pretty obvious now that a 7-14B model can best GPT-4 in specialized domains. This realization torpedoes GPU-rich firms from establishing a monopoly. One can leverage extreme asymmetric information arbitrage in the long-tail of LLM applications.
22
45
385
1
1
5
@jphme
Jan P. Harries
9 months
@Teknium1 Same. And don't try to use the HF example code with ChatML and system prompts (see ). The issues in the alignment handbook repo are funny as well, nobody can replicate anything 🤷‍♂️. Can't wait for it to work with Axolotl 🙂
0
0
5
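For reference, the usual way to avoid hand-rolled prompt formatting like the example code above is the tokenizer's built-in chat template. A minimal sketch, assuming a ChatML model such as OpenHermes (model id illustrative; templates that reject the system role are exactly the failure mode complained about here):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Renders <|im_start|>...<|im_end|> ChatML blocks via the model's own template.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```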
@jphme
Jan P. Harries
1 year
@emollick IMHO the tooling is not there yet but evolving rapidly. Look e.g. at approaches like the one described here by @jxnlco
0
1
5
@jphme
Jan P. Harries
1 year
@jeremyphoward @Tim_Dettmers @Euclaise_ @Teknium1 I would also be v interested in this (and volunteer running experimental code)
0
0
1
@jphme
Jan P. Harries
10 months
@laion_ai @MistralAI @bjoern_pl @ylecun Yes, offloading is hard without crushing performance; this would likely need some quantization trickery to run even larger models (and if GPT-4 rumours are true, there is probably an upper bound for the best number of experts <10 for most use cases..) :)
0
0
5
@jphme
Jan P. Harries
7 months
Wild predictions from Daniel Kokotajlo, the OpenAI researcher who knew in mid-2021 what 2023/24 would look like 🤔. Source 👇
Tweet media one
Tweet media two
Tweet media three
1
0
3
@jphme
Jan P. Harries
11 months
Great work by @WolframRvnwlf comparing top open LLMs on a realistic custom benchmark + happy to see EM German Leo Mistral in the top 3 for 7b models, beating GPT-3.5! (and also congratz to @Teknium1 and @jon_durbin , well-deserved top spots)
Tweet media one
@WolframRvnwlf
Wolfram Ravenwolf 🐺🐦‍⬛
11 months
Worked hard for over a week on this Huge LLM Comparison/Test: 39 models tested (7B-70B + ChatGPT/GPT-4)
10
55
403
0
0
4
@jphme
Jan P. Harries
8 months
1. Many SMEs still struggle to understand the evolving capabilities of LLMs and how they need to adapt. 2. Often, small pilot projects already deliver enormous business value, mitigate fears among employees and pave the way for broad AI adoption throughout the entire
0
0
5
@jphme
Jan P. Harries
4 years
@sozmiSH @Land_SH When are the make-up appointments for the cancelled AZ vaccinations coming?
1
0
3
@jphme
Jan P. Harries
1 year
Released a small Wizard Vicuna uncensored finetune of the 1.5bn #Phi 1.5 model - very experimental, but it shows what's possible with super small models; excited for more (thanks @erhartford for the dataset, @Teknium1 for guidance and @winglian for Axolotl!)
0
0
4
@jphme
Jan P. Harries
9 months
0
0
4
@jphme
Jan P. Harries
8 months
@discoresearch @dominik_scherm @basti_vkl @lafamigliaVC @cdtm_munich Sorry for mistyping your handle @vietdle (spotty Internet and no autocorrection on the train 🙄)! See his coverage here: 👏
@vietdle
viet
8 months
🇩🇪/acc in Munich - one of the OG machine learning cities where LSTMs and Stable Diffusion were born - was on fire yesterday. more than 100 builders showed up at our @AITinkerers event yesterday, with 12 folks showcasing projects ranging from no-code pipelines to fully finetuned
12
17
163
0
0
2
@jphme
Jan P. Harries
3 years
This analysis I retweeted in January is still the best piece about the whole situation. Rob got right what many politicians and journalists got wrong...
@RALee85
Rob Lee
3 years
My argument regarding Russia's behavior: 1) Moscow switched from deterrence to compellence 2) The key issue is Moscow believes Kyiv will remain hostile and is increasing its defensive capabilities 3) the costs of inaction are greater than an escalation
979
3K
10K
0
0
4
@jphme
Jan P. Harries
10 months
@Teknium1 You should be able to run it quantized without much loss on 2x 4090, with the quality of a 70b and the speed of a 7b model, no?
2
0
4
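A minimal sketch of that setup: 4-bit quantization via bitsandbytes with the model sharded across both cards by device_map="auto". Model id and dtype are illustrative assumptions, not a tested recipe:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",   # illustrative model id
    quantization_config=bnb,
    device_map="auto",               # shards layers across the two 4090s
)
```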
@jphme
Jan P. Harries
8 months
@jon_durbin Yes, I'd say increase roughly proportionally to the decrease in number of steps, ymmv depending on the dataset
0
0
4
@jphme
Jan P. Harries
10 months
@migtissera @erhartford @bjoern_pl Not natively, you have to enable trust remote code. You can use this and add trust_remote_code and is_mistral_derived_model; should even work with sample packing. We did QLoRA but YMMV...
0
0
3
@jphme
Jan P. Harries
3 years
@RNAiAnalyst Additionally, the dose for their mouse model seems to be 8µg (2/3 of the human dose?) whereas others (e.g. BioNTech) used 1/30 to at most 1/6 of the human dose in their mouse models - or did I understand that wrong?
1
0
3
@jphme
Jan P. Harries
11 months
@felix_red_panda @JagersbergKnut Yep, translations don't work well in my experience. That's why EM German only uses a minimal amount of translated data (see for some example outputs). If you are interested in this topic: I would appreciate any kind of contributions/collab for more
0
0
3
@jphme
Jan P. Harries
9 months
@abacaj 130t/s with Mixtral quantized in good quality on 1x 4090? How? 😲 Or did you mean 130 t/s for Mistral 7b?
1
0
3
@jphme
Jan P. Harries
10 months
HBO's Silicon Valley was dull vs this
@sama
Sam Altman
10 months
if i start going off, the openai board should go after me for the full value of my shares
5K
5K
66K
0
0
3
@jphme
Jan P. Harries
9 months
@winglian @erhartford @pyautogen @Teknium1 @theemozilla I would also be very interested in standardization (currently preparing a function calling dataset). For the system prompt format there is the minimal OpenAI format (cc @HamelHusain ) vs. the longer JSON schema - see examples. Also this NB by @abacaj
Tweet media one
Tweet media two
2
0
2
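To make the two styles concrete, here is a hedged sketch of the same function spec rendered both ways; the get_weather schema is invented for illustration and not from either source:

```python
import json

# Hypothetical function definition in JSON-schema form.
weather_fn = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# (a) minimal OpenAI-style system prompt: little more than the function list
minimal_system = "Functions available:\n" + json.dumps([weather_fn])

# (b) longer JSON-schema style: explicit instructions wrapped around the schema
schema_system = (
    "You are a helpful assistant with access to the following functions. "
    "Use them if required and reply with a JSON function call.\n"
    + json.dumps([weather_fn], indent=2)
)
```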
@jphme
Jan P. Harries
8 months
@moyix Actually you should be able to use the FastChat template with any OpenAI-compatible endpoint without using their inference - see (But FastChat also supports a vLLM worker, just in case you missed that)
0
0
3
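The point generalizes: any OpenAI-compatible server (FastChat's API server, a vLLM worker, ...) can be driven by just repointing the client. A minimal sketch; URL, key and model name are illustrative assumptions:

```python
from openai import OpenAI

# Point the standard client at a local OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="vicuna-7b-v1.5",  # whatever the local server is serving
    messages=[{"role": "user", "content": "Hi!"}],
)
print(resp.choices[0].message.content)
```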
@jphme
Jan P. Harries
1 year
Released a little, lightweight Python library for high-throughput use of the OpenAI API with full integration of function calling. Based on the @OpenAI Cookbook and @jxnlco 's openai_function_call: #openai #python 1/4
Tweet media one
1
1
3
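Not the library's actual API (the link is elided above), but the underlying high-throughput pattern is roughly this: many concurrent requests behind a semaphore. A minimal sketch with the async OpenAI client:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()       # reads OPENAI_API_KEY from the environment
sem = asyncio.Semaphore(16)  # cap concurrent in-flight requests

async def complete(prompt: str) -> str:
    async with sem:
        resp = await client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

async def main(prompts: list[str]) -> list[str]:
    # Fan out all prompts; the semaphore throttles actual concurrency.
    return await asyncio.gather(*(complete(p) for p in prompts))

# results = asyncio.run(main(["hello", "world"]))
```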
@jphme
Jan P. Harries
8 months
Demo, Download + more information on DiscoLM German here:
@DiscoResearchAI
DiscoResearch
8 months
Download: Demo: The model has been trained in 3 phases (LeoLM ctd pretraining -> SFT -> DPO) and offers a specialized RAG template (by @rasdani_ ) and (experimental) function calling ability. 2/3
1
0
3
0
0
1
@jphme
Jan P. Harries
10 months
@NateGenX @xlr8harder You can also try their "Luminous" models on their website. The 70b "Supreme" model is worse than every half-baked Llama2/Mistral 7b finetune you can download from HF (only tried some prompts and didn't do any benchmarks). Why keep it secret, if their internal models are on another level?
1
0
3
@jphme
Jan P. Harries
10 years
@pat_hennings Will there be a @neue_liberale meetup in Ddorf soon? I'd be interested in joining, happy about any info - regards, jp
0
0
3
@jphme
Jan P. Harries
8 months
@BlancheMinerva @Dorialexander I would be very interested in the literature on this. In our experience with German models (both continued pretraining and SFT), non-English pretraining and finetuning BOTH mostly lead to reduced scores in same-language benchmarks for smaller models. Interestingly, this is partially reversed
0
0
2
@jphme
Jan P. Harries
10 months
@xlr8harder They get showered with money (500m fundraise last week, all from investors without any AI expertise) and attention from the German press, politicians and large corporations (they are THE German AI champion in the German public eye), but didn't release anything notable during the most
2
0
3
@jphme
Jan P. Harries
7 months
just wow... 🤯
@OfficialLoganK
Logan Kilpatrick
7 months
Say hello to Sora, our next generation text to video model created by @OpenAI 🤯
86
179
2K
0
0
3
@jphme
Jan P. Harries
4 years
@vodafone_de Internet down (formerly Unitymedia/Düsseldorf) - what's going on and when will it be back? Almost an hour now and I have to work...
2
0
1
@jphme
Jan P. Harries
8 months
@moyix Hugging Face Chat UI, Chatbot UI, or if you want it simpler, just a basic Gradio/Streamlit app
1
0
1
@jphme
Jan P. Harries
8 months
#Düsseldorf ❤️
Tweet media one
0
0
3
@jphme
Jan P. Harries
10 months
@jon_durbin @lhl @akavirtual_ @a16z Congratz on the release! Exciting results and important progress for non-English open models; awesome that you also released the pipeline and the thought process behind design decisions :)
0
0
2
@jphme
Jan P. Harries
7 months
@dvilasuero @natolambert Actually, they did an ablation in the paper showing that correlation/accuracy drops quite a lot without a reference answer (but it still works). We trained with some examples without reference answers (and some examples scaled down to 3 categories), but results were worse. cc @seungonekim
Tweet media one
1
0
4
@jphme
Jan P. Harries
8 months
Agree. Even more true in Europe, and this doesn't only devalue working hard/building, but also leads to a new kind of feudalism if left unchecked. The housing situation will get really ugly soon, and neither politicians nor central banks have any idea what to do about it.
@LynAldenContact
Lyn Alden
8 months
Basically, the modern financial system with a 40-year rate decline is one that has rewarded arbitrageurs relative to builders a bit more than most would generally consider desirable, and that has led to a lot of the polarization we see today both domestically and geopolitically.
28
46
534
1
0
1
@jphme
Jan P. Harries
1 year
@Teknium1 Axolotl as well. Didn't do thorough (human) benchmarking yet, but especially with mixed-size datasets (where some examples are very short and long answers dominate the packed context window) the difference seems to be significant...
2
0
2
@jphme
Jan P. Harries
2 years
@maxocito @fragdenstaat @HumboldtUni @senfcall Nevertheless, I'm of course absolutely in favor of disclosing the relevant contracts and payments, and it's good that you're enforcing this legally! I just don't share the conclusion or the criticism (though I don't know whether the situation has changed in the meantime)
1
0
1
@jphme
Jan P. Harries
11 months
@abacaj I wonder why they move so far into applications and don't concentrate on holding their edge in providing the best models (data? But they can't train on most of it...). Tough for wrapper startups, but (except the price cut and GPT-4-turbo, if it performs) nothing that excites me..
2
1
2
@jphme
Jan P. Harries
2 years
@CarloMasala1 Precautionary principle. A certain amount of fear isn't irrational when a) prediction markets (e.g. ) and smart observers (e.g. ) regard the possibility as unlikely but realistic, and b) the consequences aren't foreseeable
@kamilkazani
Kamil Galeev
2 years
I think Kremlin may view nuclear strike on Ukraine (with an American retaliatory strike) as a rational move. It may not make much sense in the context of foreign policy, but it does in the context of domestic policy. Meanwhile foreign policy is just domestic policy by other means
316
812
4K
0
0
1
@jphme
Jan P. Harries
10 months
But seriously, this is no good news. Say what you want about @sama , but I can't imagine that he did anything unethical (and he has zero ownership in #openai ). Maybe he wanted to open-source something, or felt it was dangerous to accelerate too fast? "Mr. Altman’s departure
@jphme
Jan P. Harries
10 months
GPT-5 took over 🤯
0
0
0
0
1
2
@jphme
Jan P. Harries
1 year
@Teknium1 This script didn't work for me either, but due to some torch error. But models being identical sounds more like a code error (I already got bitten by this - it's very hard to "unload" adapters again with transformers; when I need to be safe I reload the whole model)?
1
0
2
@jphme
Jan P. Harries
1 year
Interesting insights about OpenAI's future here. Most important: a major GPU bottleneck prevents scaling/fast progress (API users know, GPT-4 responses take an eternity *sigh*), bigger context windows are coming (yay), scaling laws are still holding -> expect superhuman models soon..
@brickroad7
renji the synthetic data maximalist
1 year
🚨🚨🚨🚨🚨🚨🚨🚨 deleted sam altman interview... lots of alpha here... wayback machine link -->
Tweet media one
Tweet media two
44
251
2K
0
0
2
@jphme
Jan P. Harries
1 year
@jxnlco Maybe I missed some of the intentions, but I found you can radically streamline and shorten the query planning example and it works fine (at least with GPT-4) without the 2-step procedure: (Field descriptions shortened for readability below)
Tweet media one
2
0
2
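For flavor, a hedged sketch of what such a streamlined single-step query plan can look like in the pydantic style that openai_function_call uses; field names and descriptions here are illustrative, not the library's originals:

```python
from pydantic import BaseModel, Field

class Query(BaseModel):
    id: int = Field(..., description="Unique id of the query")
    question: str = Field(..., description="Sub-question to answer")
    dependencies: list[int] = Field(
        default_factory=list,
        description="Ids of queries that must be answered first",
    )

class QueryPlan(BaseModel):
    """A DAG of sub-queries answering the user's question in one pass."""
    query_graph: list[Query]
```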
@jphme
Jan P. Harries
10 months
It's happening 🤩 (8*7b MoE model by @MistralAI incoming...)
@MistralAI
Mistral AI
10 months
magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%%3A6969%2Fannounce&tr=http%3A%2F%%3A80%2Fannounce RELEASE a6bbd9affe0c2725c1b7410d66833e24
550
2K
10K
0
0
2
@jphme
Jan P. Harries
7 years
0
0
2
@jphme
Jan P. Harries
10 months
Woah - 2 large Chinese models claiming to beat Llama2-70b, out on the same day: Qwen 72b and DeepSeek 67b. Can't wait for Llama 3 and Mistral 70b; open LLMs are thriving 🚀
Tweet media one
Tweet media two
0
0
2
@jphme
Jan P. Harries
3 years
@RNAiAnalyst "Removing cases included in the 18-to-60 analysis from the overall efficacy data set suggests there were more cases in vaccinated over-60s than in unvaccinated over-60s." (Source: Fierce: )
0
0
2
@jphme
Jan P. Harries
1 year
@jsuedekum Regarding societal/economic risks as mentioned in the interview: I don't think comparisons with robotics or computerization (or basically any innovation in the past) are valid. All previous innovations were enabled by human thoughts and actions. That'll be different. Huge uncertainty
1
0
2
@jphme
Jan P. Harries
2 years
Difficult and complex topic, and I don't necessarily agree on every point - but @erikphoel does a great job highlighting what could happen if Russia further escalates due to exhausted alternatives and conventional defeat. Could quickly lead to existential threats if not managed very carefully.
@erikphoel
Erik Hoel
2 years
1. Escalation with Russia over the war in Ukraine continues to ramp up, and everyone has their fingers on the like button. A 🧵
1
9
31
0
1
2