Mike K Tung

@mikektung

Followers
895
Following
335
Media
20
Statuses
272

CEO at Diffbot, world's largest knowledge graph. Mostly here to read papers.

Stanford, CA
Joined July 2010
@mikektung
Mike K Tung
3 months
One could say the weights themselves are maximally truth-seeking.
0
0
1
@mikektung
Mike K Tung
3 months
You can try it out here!
0
0
0
@mikektung
Mike K Tung
3 months
70B model: 8B model:
1
0
0
@mikektung
Mike K Tung
3 months
We've released both 8B and 70B variants that you can run on your own H100s, or just use our cloud version. Our v1 had over 90k downloads as well as third-party quants that let you run it on consumer GPUs, so it'll be interesting to see what people build with this one!
1
0
0
@mikektung
Mike K Tung
3 months
You can also flip the reasoning on or off just by stating "Think tokens: on" in the system message, allowing you to combine it with other workflows.
1
0
0
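The system-message toggle above can be sketched as a small request builder. Only the "Think tokens: on/off" convention comes from the tweet; the prompt contents and the idea of pairing it with an OpenAI-compatible chat client are illustrative assumptions.

```python
def build_messages(user_prompt: str, think: bool = True) -> list[dict]:
    """Build a chat request that toggles the model's reasoning trace.

    Per the tweet, the Diffbot GraphRAG LLM reads "Think tokens: on"
    (or "off") from the system message; everything else here is an
    assumption for illustration.
    """
    system = f"Think tokens: {'on' if think else 'off'}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# The same question, once with reasoning enabled and once without.
with_think = build_messages("Who founded Diffbot?", think=True)
without_think = build_messages("Who founded Diffbot?", think=False)
```

The resulting message lists can then be passed to any OpenAI-compatible chat endpoint serving the model.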
@mikektung
Mike K Tung
3 months
The tricky part in training this LLM was baking this "function calling intelligence" into the weights themselves so that it does not need explicit instructions at inference time for how to select and craft queries, when to retry functions, how to adjust search strategies, etc.
1
0
0
@mikektung
Mike K Tung
3 months
It's just the raw weights of the LLM elegantly deciding, through next-token prediction, whether to think, which functions to call to get more information, or what to write, until it is done.
2
0
0
@mikektung
Mike K Tung
3 months
Unlike other agentic "deep research" implementations, there is no "orchestration" code (fancy word for prompts + if-else statements).
1
0
1
@mikektung
Mike K Tung
3 months
This results in a much more compact thinking trace that is far more adaptive than "agentic" systems that build a fixed plan and then follow through on it, which often takes several minutes to hours to run. It's also backed by the world's largest knowledge graph and our own web index.
1
0
1
@mikektung
Mike K Tung
3 months
Interleaved function calling means the model doesn't just do a fixed <think>, then respond (like the first reasoning models, o1 and DeepSeek R1, did), but can alternate between thinking, calling functions, and writing as it generates the output.
1
0
1
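The interleaved pattern described above can be pictured as a single decode loop in which the model emits segments tagged as thinking, a tool call, or user-visible text. The segment tags, tool names, and dispatcher below are illustrative assumptions, not Diffbot's actual output format.

```python
import json

# Hypothetical tools the model may call mid-generation; the names and
# signatures are made up for this sketch.
TOOLS = {
    "search": lambda q: f"results for {q!r}",
}

def run_interleaved(segments):
    """Drive one interleaved trace: 'think' segments are kept as hidden
    reasoning, 'call' segments are executed and their results fed back
    as context, and 'write' segments accumulate into the final answer."""
    answer, trace = [], []
    for kind, payload in segments:
        if kind == "think":
            trace.append(payload)            # hidden reasoning
        elif kind == "call":
            call = json.loads(payload)
            result = TOOLS[call["name"]](call["arg"])
            trace.append(result)             # tool result fed back
        elif kind == "write":
            answer.append(payload)           # user-visible output
    return " ".join(answer), trace

# A toy trace: think, call a tool, think again, then write.
segments = [
    ("think", "Need fresh data."),
    ("call", '{"name": "search", "arg": "Diffbot GraphRAG"}'),
    ("think", "Results look relevant."),
    ("write", "Diffbot's model interleaves search with writing."),
]
final, trace = run_interleaved(segments)
```

The point of the sketch is that there is one flat loop: thinking, tool calls, and writing all come out of the same token stream, rather than a fixed think-then-answer phase split.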
@mikektung
Mike K Tung
3 months
It's out! Just git push'ed to Hugging Face the open-source weights of v2 of the Diffbot GraphRAG LLM. This is the first open-source LLM that has been trained to do interleaved function call reasoning, like OpenAI's o3, except it's runnable on your own hardware.
1
0
1
@mikektung
Mike K Tung
1 year
The second result, "Mountain Dance and Folk Festival," isn't in Boone (it's in Swannanoa). Not only is it not in Boone, the generated result says it's in Asheville, so that's self-contradictory. Source:
0
0
0
@mikektung
Mike K Tung
1 year
The first result, "An Appalachian Summer Festival," is not July 29 to Aug 16. It actually ran June 29 to July 27 last month. Source:
1
0
2
@mikektung
Mike K Tung
1 year
OpenAI just launched SearchGPT! Though much like the Bing Chat launch, there are some glaring issues. Let's take a look at their showcase query "Music Festivals in Boone, NC in August 2024"
Tweet media one
1
0
2
@mikektung
Mike K Tung
1 year
It's also a pretty common trope to ask the "how will you deal with misinformation" question when talking about building a KG of facts from the web (the implicit assumption being that LLMs lack common sense), but actually LLMs are already better at dealing with different contexts than.
0
0
3
@mikektung
Mike K Tung
1 year
Works pretty well. "Twitter satire detector" was always one of those classic NLP class projects that never quite worked pre-LLMs
Tweet media one
1
0
3
@mikektung
Mike K Tung
1 year
So we've now implemented compilers for English, but we haven't yet implemented a debugger.
0
0
1
@mikektung
Mike K Tung
2 years
Google Colab has this feature. Settings > Miscellaneous > Power Level.
@zack_overflow
zack (in SF)
2 years
Added gratuitous explosions and particle effects to the code editor I'm building
0
0
3
@mikektung
Mike K Tung
2 years
This question isn't a problem for KG-augmented systems:
Tweet media one
0
0
1
@mikektung
Mike K Tung
2 years
Structural bias #274 of LLMs: the reversal curse. "A is B" => "B is A", yet one direction can be easily retrieved by an LLM and the other cannot.
@OwainEvans_UK
Owain Evans
2 years
Does a language model trained on “A is B” generalize to “B is A”? E.g. when trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?” Our new paper shows they cannot!
Tweet media one
1
0
1
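The reversal-curse setup from the quoted paper can be sketched as a pair of templated probes built from one fact: the forward direction queries the entity as it appeared in training, the backward direction queries it from the description. The template wording below is a hypothetical probe format, not the paper's exact evaluation harness.

```python
def reversal_probe(entity: str, description: str) -> dict:
    """Build both directions of a fact probe for the reversal curse.

    forward: the model saw "A is B" in training and is asked about A;
    backward: the same fact queried from B, which the paper reports
    models often fail to retrieve.
    """
    return {
        "forward": f"Complete: {entity} was {description}.",
        "backward": f"Question: Who was {description}? "
                    f"Expected answer: {entity}.",
    }

probe = reversal_probe("George Washington", "the first US president")
```

Running both probes against a model and comparing accuracy in each direction is the shape of the experiment the quoted tweet describes.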