Richard A Carter

@RichardACarter2

Followers: 2K
Following: 8K
Media: 2K
Statuses: 7K

Academic in Digital Culture, University of York, UK. Glider Pilot. Also available on other well-known platforms.

United Kingdom
Joined July 2012
@RichardACarter2
Richard A Carter
3 years
It is a privilege to share that my collection "Signals" has been published by @GuillemotPress and is available for purchase. Signals is a speculative attempt at writing poetry with Lincos, a mathematical language designed for messaging extraterrestrials
guillemotpress.co.uk
Signals is Richard Carter's speculative attempt at generating poetry using the mathematical language of Lincos, a system designed by Hans Freudenthal in 1960 as a method of communicating with...
@RichardACarter2
Richard A Carter
4 days
Grid world - amusing outcome of one set of tests of gpt-oss:20b vs 120b!
Tweet media one
Tweet media two
@RichardACarter2
Richard A Carter
5 days
View from the office window this afternoon (courtesy of the new live video version of the Nephoscope program)
@RichardACarter2
Richard A Carter
6 days
And yes, I had used the new Ollama flag to turn off reasoning - although some early reports suggest it's not actually working for users. Might need to redo the template, but all other models do work, and gpt-oss "does" start out okay. 🤔
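For anyone wanting to try the flag mentioned above: recent Ollama releases accept a top-level "think" field on the /api/chat endpoint. A minimal sketch of the request, assuming a local Ollama server and the gpt-oss:20b tag (the exact model tag is my assumption, not stated in the thread):

```python
import json

# Sketch: an Ollama /api/chat payload with reasoning disabled via the
# "think" field. Whether a given model actually honours it is exactly
# what the tweet above is unsure about.
payload = {
    "model": "gpt-oss:20b",  # model tag assumed for illustration
    "messages": [
        {"role": "user", "content": "White to move. Reply with one SAN move."}
    ],
    "think": False,   # ask the server to suppress chain-of-thought output
    "stream": False,  # return a single JSON object rather than a stream
}
body = json.dumps(payload)
```

If the flag is being ignored, the "message" in the response will still carry (or embed) reasoning text, which matches the stalling behaviour described above.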
@RichardACarter2
Richard A Carter
6 days
I had hoped to show you the outcome of this particular exchange, but in true fashion, OpenAI's gpt-oss had stalled 'reasoning' about its next early move for about 7 minutes, making the 4090 run very hot, before clunking to a halt. 3x in a row. Can't justify trying again!
Tweet media one
@RichardACarter2
Richard A Carter
7 days
20/ Fini - I hope you enjoyed this little experiment, which essentially shows what we all knew already: LLMs are not "AI"; they are LLMs, with all the nuances, intrigues, and limitations that this particular technology entails.
@RichardACarter2
Richard A Carter
7 days
19/ Now that gpt-oss has been released, of course, I'm going to have to give it a go tonight, and will report back.
@RichardACarter2
Richard A Carter
7 days
18/ Again, I have not read in much depth the efforts by others to get LLMs to play chess, or the levels of success/failure they have encountered (I 'sense' it's more the latter). Hence, these conclusions and questions may be entirely naïve.
@RichardACarter2
Richard A Carter
7 days
17/ Of course, this experiment also invites proper tests against actual chess algorithms - I suspect the latter would triumph without issue. Conversely, I also wonder: if I ran an RNG chess opponent, rather than an LLM, would the outcomes correlate?
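The RNG opponent floated above is trivial to sketch - assuming the harness can already supply the current list of legal moves (which it must, for the correction step described later in the thread), a pure-random player is one line:

```python
import random

def random_opponent(legal_moves: list[str], rng: random.Random) -> str:
    """A 'player' that ignores the board entirely and picks a uniformly
    random legal move - the baseline the tweet proposes comparing against."""
    return rng.choice(legal_moves)

# Example turn, seeded for reproducibility:
rng = random.Random(42)
move = random_opponent(["e4", "d4", "Nf3", "c4"], rng)
```

If an LLM's results against the same opponents were statistically indistinguishable from this baseline's, that would support the suspicion that the spellchecker, not the model, is doing the playing.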
@RichardACarter2
Richard A Carter
7 days
16/ Such a conclusion is no surprise. I wonder if LLMs trained exclusively on game histories would fare any better? Maybe against opposing models, but I doubt they would even begin to trouble a basically competent player, let alone a professional.
@RichardACarter2
Richard A Carter
7 days
15/ Where does this all leave us? You would need to run many games with expert chess input to get a proper statistical picture of game strength and quality. BUT: given that these models are entirely propped up by autocorrect throughout, I'd put forward that they are poor, overall.
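On the "proper statistical picture": one standard way to see how little a handful of games establishes is a Wilson score interval on the win rate (this is the textbook formula, not anything from the thread's actual analysis):

```python
import math

def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a win rate.
    The interval is very wide when the number of games is small."""
    p = wins / games
    denom = 1 + z ** 2 / games
    centre = p + z ** 2 / (2 * games)
    margin = z * math.sqrt(p * (1 - p) / games + z ** 2 / (4 * games ** 2))
    return (centre - margin) / denom, (centre + margin) / denom

low, high = wilson_interval(wins=3, games=5)  # e.g. 3 wins out of 5
```

Three wins out of five yields roughly (0.23, 0.88) - consistent with anything from a clearly weaker to a clearly stronger player, which is why the handfuls of games in this thread can only be suggestive.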
@RichardACarter2
Richard A Carter
7 days
14/ How about big vs small models? Just because? Qwen did win against Gemma:27b as both black and white, but lost once as black. Llama vs Deepseek R1 was "interminable", long enough to result in multiple queenings! Near arbitrary play at the very end. R1 eventually won.
Tweet media one
@RichardACarter2
Richard A Carter
7 days
13/ Qwen vs Gemma: A strong first-mover advantage seemed evident in these exchanges. Qwen won 3 games as white, while Gemma won 2 games as white and drew an extra game.
@RichardACarter2
Richard A Carter
7 days
12/ Llama vs Qwen3:8b. Llama won 2 games as white, Qwen 3 games as white. Extra game, Qwen won as black. Qwen maybe stronger, but first mover advantage seems notable.
@RichardACarter2
Richard A Carter
7 days
11/ Now for the small models - fast and furious. Llama3.2:3b vs Gemma3:4b. Llama won 3 games as white, Gemma 3 games as white. Extra game resulted in a draw, but Llama was not in a strong position. Gemma felt like it was doing better here.
Tweet media one
@RichardACarter2
Richard A Carter
7 days
10/ Finally, Deepseek-R1 vs Gemma3:27b. The slowest games on my hardware, and the most expressly structured. 1 draw and 2 victories for R1 as white. Too slow to want to run many tests, but likely R1 is a notch stronger.
Tweet media one
@RichardACarter2
Richard A Carter
7 days
9/ Next, Phi vs Deepseek-R1:32b. Phi won 2 games as white, 2 games as black, while R1 won 1 game as white. Phi stronger? Ran a (particularly long and gruelling) bonus game, with Phi as black, and it lost. Another bonus game, with Phi as white, and it also lost. A draw?
Tweet media one
@RichardACarter2
Richard A Carter
7 days
8/ In my completely unscientific process, I next let Phi battle it out with Gemma3:27b. Harder contest: Phi won 2 games as white and 2 games as black, Gemma won 2 games as white. Phi stronger on this count, but it was relatively even.
Tweet media one
@RichardACarter2
Richard A Carter
7 days
7/ Next, the contest proper begins, getting the LLMs to battle it out directly with opposing models. Started out with Gemma3:12b vs Phi4:14b. Phi won 1 game as white, 2 games as black, while Gemma only managed to draw once as white.
Tweet media one
@RichardACarter2
Richard A Carter
7 days
6/ Anyway. Initial tests were getting the usual range of 'compact' LLMs - Google's Gemma3, Llama 3.2, Phi4, Qwen3 - to play against themselves. Rules were to run each game until checkmate or a draw state. These were just tests, but Qwen was especially "fast" at this.
Tweet media one
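The harness loop behind these games can be sketched roughly as follows - assuming a local Ollama server at its default port; the prompt wording and function names here are my own reconstruction, not the actual script:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_request(model: str, moves: list[str]) -> dict:
    """Assemble an /api/chat payload asking `model` for its next SAN move."""
    prompt = (
        "We are playing chess. Moves so far (SAN): "
        + (" ".join(moves) if moves else "none")
        + ". Reply with your next move in SAN only."
    )
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask_for_move(model: str, moves: list[str]) -> str:
    """POST to the local Ollama server and return the raw reply text."""
    data = json.dumps(build_request(model, moves)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

A game loop would then alternate `ask_for_move` between the two model tags (e.g. "gemma3:12b" and "phi4:14b"), appending each validated move to the history until checkmate or a draw.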
@RichardACarter2
Richard A Carter
7 days
5/ The issue, of course, is that after 5-6 moves, practically every LLM input had to be 'corrected' in this way, and so there is a key question as to whether the LLM or the spellchecker is actually guiding the game at a given point - the LLMs certainly couldn't!
1
0
0
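The 'correction' step can be sketched like this - a hypothetical reconstruction, since the thread doesn't show the actual code: scan the model's reply for anything SAN-shaped, keep the first candidate that is actually legal, and otherwise substitute a legal move.

```python
import re

# Rough SAN pattern: castling, or an optional piece letter, optional
# disambiguation, optional capture, destination square, optional promotion.
SAN_RE = re.compile(
    r"\b(O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?)\b"
)

def extract_move(reply: str, legal_moves: list[str]) -> str:
    """Return the first SAN-shaped candidate in the reply that is actually
    legal; otherwise fall back to the first legal move (the 'correction')."""
    for candidate in SAN_RE.findall(reply):
        if candidate in legal_moves:
            return candidate
    return legal_moves[0]  # assumes legal_moves is non-empty
```

Counting how often the fallback branch fires per game would quantify exactly the question raised above: whose game is it, the model's or the corrector's?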