Richard A Carter

@RichardACarter2

Followers: 2K
Following: 8K
Media: 2K
Statuses: 7K

Academic in Digital Culture, University of York, UK. Glider Pilot. Also available on other well-known platforms.

United Kingdom
Joined July 2012
@RichardACarter2
Richard A Carter
3 years
It is a privilege to share that my collection "Signals" has been published by @GuillemotPress and is available for purchase. Signals is a speculative attempt at writing poetry with Lincos, a mathematical language designed for messaging extraterrestrials
guillemotpress.co.uk
Signals is Richard Carter's speculative attempt at generating poetry using the mathematical language of Lincos, a system designed by Hans Freudenthal in 1960 as a method of communicating with...
@RichardACarter2
Richard A Carter
4 days
Grid world - amusing outcome of one set of tests of gpt-oss:20b vs 120b!
Tweet media one
Tweet media two
@RichardACarter2
Richard A Carter
5 days
View from the office window this afternoon (courtesy of the new live video version of the Nephoscope program)
@RichardACarter2
Richard A Carter
6 days
And yes, I had used the new Ollama flag to turn off reasoning - although some early reports suggest it's not actually working for users. Might need to redo the template, but all other models do work, and gpt-oss "does" start out okay. 🤔
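For anyone wanting to try the flag mentioned above: recent Ollama releases accept a top-level "think" field on the /api/chat endpoint. A minimal sketch of the request, assuming a local Ollama server and the gpt-oss:20b tag (the exact model tag is my assumption, not stated in the thread):

```python
import json

# Sketch: an Ollama /api/chat payload with reasoning disabled via the
# "think" field. Whether a given model actually honours it is exactly
# what the tweet above is unsure about.
payload = {
    "model": "gpt-oss:20b",  # model tag assumed for illustration
    "messages": [
        {"role": "user", "content": "White to move. Reply with one SAN move."}
    ],
    "think": False,   # ask the server to suppress chain-of-thought output
    "stream": False,  # return a single JSON object rather than a stream
}
body = json.dumps(payload)
```

If the flag is being ignored, the "message" in the response will still carry (or embed) reasoning text, which matches the stalling behaviour described above.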
@RichardACarter2
Richard A Carter
6 days
I had hoped to show you the outcome of this particular exchange, but in true fashion, OpenAI's gpt-oss had stalled 'reasoning' about its next early move for about 7 minutes, making the 4090 run very hot, before clunking to a halt. 3x in a row. Can't justify trying again!
Tweet media one
@RichardACarter2
Richard A Carter
7 days
20/ Fini - I hope you enjoyed this little experiment, which essentially shows what we all knew already: LLMs are not "AI"; they are LLMs, with all the nuances, intrigues, and limitations that this particular technology entails.
@RichardACarter2
Richard A Carter
7 days
19/ Now that gpt-oss has been released, of course, I'm going to have to give it a go tonight, and will report back.
@RichardACarter2
Richard A Carter
7 days
18/ Again, I have not read in much depth the efforts by others to get LLMs to play chess, or the levels of success/failure they have encountered (I 'sense' it's more the latter). Hence, these conclusions and questions may be entirely naïve.
@RichardACarter2
Richard A Carter
7 days
17/ Of course, this experiment also invites proper tests against actual chess algorithms - I suspect the latter would triumph without issue. Conversely, I also wonder: if I ran an RNG chess opponent, rather than an LLM, would the outcomes correlate?
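The RNG opponent floated above is trivial to sketch - assuming the harness can already supply the current list of legal moves (which it must, for the correction step described later in the thread), a pure-random player is one line:

```python
import random

def random_opponent(legal_moves: list[str], rng: random.Random) -> str:
    """A 'player' that ignores the board entirely and picks a uniformly
    random legal move - the baseline the tweet proposes comparing against."""
    return rng.choice(legal_moves)

# Example turn, seeded for reproducibility:
rng = random.Random(42)
move = random_opponent(["e4", "d4", "Nf3", "c4"], rng)
```

If an LLM's results against the same opponents were statistically indistinguishable from this baseline's, that would support the suspicion that the spellchecker, not the model, is doing the playing.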
@RichardACarter2
Richard A Carter
7 days
16/ Such a conclusion is no surprise. I wonder if LLMs trained exclusively on game histories would fare any better? Maybe against opposing models, but I doubt they would even begin to trouble a basically competent player, let alone a professional.
@RichardACarter2
Richard A Carter
7 days
15/ Where does this all leave us? You would need to run many games with expert chess input to get a proper statistical picture of game strength and quality. BUT: given that these models are entirely propped up by autocorrect throughout, I'd put forward that they are poor, overall.
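On the "proper statistical picture": one standard way to see how little a handful of games establishes is a Wilson score interval on the win rate (this is the textbook formula, not anything from the thread's actual analysis):

```python
import math

def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a win rate.
    The interval is very wide when the number of games is small."""
    p = wins / games
    denom = 1 + z ** 2 / games
    centre = p + z ** 2 / (2 * games)
    margin = z * math.sqrt(p * (1 - p) / games + z ** 2 / (4 * games ** 2))
    return (centre - margin) / denom, (centre + margin) / denom

low, high = wilson_interval(wins=3, games=5)  # e.g. 3 wins out of 5
```

Three wins out of five yields roughly (0.23, 0.88) - consistent with anything from a clearly weaker to a clearly stronger player, which is why the handfuls of games in this thread can only be suggestive.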
@RichardACarter2
Richard A Carter
7 days
14/ How about big vs small models? Just because? Qwen did win against Gemma:27b as both black and white, but lost once as black. Llama vs Deepseek R1 was "interminable", long enough to result in multiple queenings! Near arbitrary play at the very end. R1 eventually won.
Tweet media one
@RichardACarter2
Richard A Carter
7 days
13/ Qwen vs Gemma: A strong first-mover advantage seemed evident in these exchanges. Qwen won 3 games as white, while Gemma won 2 games as white and drew an extra game.
@RichardACarter2
Richard A Carter
7 days
12/ Llama vs Qwen3:8b. Llama won 2 games as white, Qwen 3 games as white. Extra game, Qwen won as black. Qwen maybe stronger, but first mover advantage seems notable.
@RichardACarter2
Richard A Carter
7 days
11/ Now for the small models - fast and furious. Llama3.2:3b vs Gemma3:4b. Llama won 3 games as white, Gemma 3 games as white. Extra game resulted in a draw, but Llama was not in a strong position. Gemma felt like it was doing better here.
Tweet media one
@RichardACarter2
Richard A Carter
7 days
10/ Finally, Deepseek-R1 vs Gemma3:27b. The slowest games on my hardware, and the most expressly structured. 1 draw and 2 victories for R1 as white. Too slow to want to run many tests, but likely R1 is a notch stronger.
Tweet media one
@RichardACarter2
Richard A Carter
7 days
9/ Next, Phi vs Deepseek-R1:32b. Phi won 2 games as white, 2 games as black, while R1 won 1 game as white. Phi stronger? Ran a (particularly long and gruelling) bonus game, with Phi as black, and it lost. Another bonus game, with Phi as white, and it also lost. A draw?
Tweet media one
@RichardACarter2
Richard A Carter
7 days
8/ In my completely unscientific process, I next let Phi battle it out with Gemma3:27b. Harder contest: Phi won 2 games as white and 2 games as black, Gemma won 2 games as white. Phi stronger on this count, but it was relatively even.
Tweet media one
@RichardACarter2
Richard A Carter
7 days
7/ Next, the contest proper begins, getting the LLMs to battle it out directly with opposing models. Started out with Gemma3:12b vs Phi4:14b. Phi won 1 game as white, 2 games as black, while Gemma only managed to draw once as white.
Tweet media one
@RichardACarter2
Richard A Carter
7 days
6/ Anyway. Initial tests were getting the usual range of 'compact' LLMs - Google's Gemma3, Llama 3.2, Phi4, Qwen3 - to play against themselves. Rules were to run each game until checkmate or a draw state. These were just tests, but Qwen was especially "fast" at this.
Tweet media one
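The harness loop behind these games can be sketched roughly as follows - assuming a local Ollama server at its default port; the prompt wording and function names here are my own reconstruction, not the actual script:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_request(model: str, moves: list[str]) -> dict:
    """Assemble an /api/chat payload asking `model` for its next SAN move."""
    prompt = (
        "We are playing chess. Moves so far (SAN): "
        + (" ".join(moves) if moves else "none")
        + ". Reply with your next move in SAN only."
    )
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask_for_move(model: str, moves: list[str]) -> str:
    """POST to the local Ollama server and return the raw reply text."""
    data = json.dumps(build_request(model, moves)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

A game loop would then alternate `ask_for_move` between the two model tags (e.g. "gemma3:12b" and "phi4:14b"), appending each validated move to the history until checkmate or a draw.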
@RichardACarter2
Richard A Carter
7 days
5/ The issue, of course, is that after 5-6 moves, practically every LLM input had to be 'corrected' in this way, and so there is a key question as to whether the LLM or the spellchecker is actually guiding the game at a given point - the LLMs certainly couldn't!
1
0
0
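The 'correction' step can be sketched like this - a hypothetical reconstruction, since the thread doesn't show the actual code: scan the model's reply for anything SAN-shaped, keep the first candidate that is actually legal, and otherwise substitute a legal move.

```python
import re

# Rough SAN pattern: castling, or an optional piece letter, optional
# disambiguation, optional capture, destination square, optional promotion.
SAN_RE = re.compile(
    r"\b(O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?)\b"
)

def extract_move(reply: str, legal_moves: list[str]) -> str:
    """Return the first SAN-shaped candidate in the reply that is actually
    legal; otherwise fall back to the first legal move (the 'correction')."""
    for candidate in SAN_RE.findall(reply):
        if candidate in legal_moves:
            return candidate
    return legal_moves[0]  # assumes legal_moves is non-empty
```

Counting how often the fallback branch fires per game would quantify exactly the question raised above: whose game is it, the model's or the corrector's?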