Mike K Tung

@mikektung

Followers
895
Following
335
Media
20
Statuses
272

CEO at Diffbot, world's largest knowledge graph. Mostly here to read papers.

Stanford, CA
Joined July 2010
@mikektung
Mike K Tung
3 months
One could say the weights themselves are maximally truth-seeking.
0
0
1
@mikektung
Mike K Tung
3 months
You can try it out here!
0
0
0
@mikektung
Mike K Tung
3 months
70B model: 8B model:
1
0
0
@mikektung
Mike K Tung
3 months
We've released both 8B and 70B variants that you can run on your own H100s, or just use our cloud version. Our v1 had over 90k downloads as well as third-party quants that let you run it on consumer GPUs, so it'll be interesting to see what people build with this one!
1
0
0
@mikektung
Mike K Tung
3 months
You can also flip the reasoning on or off just by stating "Think tokens: on" in the system message, allowing you to combine it with other workflows.
1
0
0
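The system-message toggle above can be sketched as a small request builder. Only the "Think tokens: on/off" convention comes from the tweet; the prompt contents and the idea of pairing it with an OpenAI-compatible chat client are illustrative assumptions.

```python
def build_messages(user_prompt: str, think: bool = True) -> list[dict]:
    """Build a chat request that toggles the model's reasoning trace.

    Per the tweet, the Diffbot GraphRAG LLM reads "Think tokens: on"
    (or "off") from the system message; everything else here is an
    assumption for illustration.
    """
    system = f"Think tokens: {'on' if think else 'off'}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# The same question, once with reasoning enabled and once without.
with_think = build_messages("Who founded Diffbot?", think=True)
without_think = build_messages("Who founded Diffbot?", think=False)
```

The resulting message lists can then be passed to any OpenAI-compatible chat endpoint serving the model.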
@mikektung
Mike K Tung
3 months
The tricky part in training this LLM was baking this "function calling intelligence" into the weights themselves so that it does not need explicit instructions at inference time for how to select and craft queries, when to retry functions, how to adjust search strategies, etc.
1
0
0
@mikektung
Mike K Tung
3 months
It's just the raw weights of the LLM elegantly deciding, through next-token prediction, whether to think, which functions to call to get more information, or what to write, until it is done.
2
0
0
@mikektung
Mike K Tung
3 months
Unlike other agentic "deep research" implementations, there is no "orchestration" code (fancy word for prompts + if-else statements).
1
0
1
@mikektung
Mike K Tung
3 months
This results in a much more compact thinking trace that is far more adaptive than "agentic" systems that build a fixed plan and then follow through on it, which often takes several minutes to hours to run. It's also backed by the world's largest knowledge graph and our own web index.
1
0
1
@mikektung
Mike K Tung
3 months
Interleaved function calling means the model doesn't just do a fixed <think>, then respond (like the first reasoning models, o1 and DeepSeek R1, did), but can alternate between thinking, calling functions, and writing as it generates the output.
1
0
1
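The interleaved pattern described above can be pictured as a single decode loop in which the model emits segments tagged as thinking, a tool call, or user-visible text. The segment tags, tool names, and dispatcher below are illustrative assumptions, not Diffbot's actual output format.

```python
import json

# Hypothetical tools the model may call mid-generation; the names and
# signatures are made up for this sketch.
TOOLS = {
    "search": lambda q: f"results for {q!r}",
}

def run_interleaved(segments):
    """Drive one interleaved trace: 'think' segments are kept as hidden
    reasoning, 'call' segments are executed and their results fed back
    as context, and 'write' segments accumulate into the final answer."""
    answer, trace = [], []
    for kind, payload in segments:
        if kind == "think":
            trace.append(payload)            # hidden reasoning
        elif kind == "call":
            call = json.loads(payload)
            result = TOOLS[call["name"]](call["arg"])
            trace.append(result)             # tool result fed back
        elif kind == "write":
            answer.append(payload)           # user-visible output
    return " ".join(answer), trace

# A toy trace: think, call a tool, think again, then write.
segments = [
    ("think", "Need fresh data."),
    ("call", '{"name": "search", "arg": "Diffbot GraphRAG"}'),
    ("think", "Results look relevant."),
    ("write", "Diffbot's model interleaves search with writing."),
]
final, trace = run_interleaved(segments)
```

The point of the sketch is that there is one flat loop: thinking, tool calls, and writing all come out of the same token stream, rather than a fixed think-then-answer phase split.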
@mikektung
Mike K Tung
3 months
It's out! Just git push'ed to Hugging Face the open-source weights of v2 of the Diffbot GraphRAG LLM. This is the first open-source LLM that has been trained to do interleaved function call reasoning, like OpenAI's o3, except it's runnable on your own hardware.
1
0
1
@mikektung
Mike K Tung
1 year
The second result, "Mountain Dance and Folk Festival," isn't in Boone (it's in Swannanoa). Not only is it not in Boone, the generated result says it's in Asheville, so that's self-contradictory. Source:
0
0
0
@mikektung
Mike K Tung
1 year
The first result, "An Appalachian Summer Festival," is not July 29 to Aug 16. It actually ran June 29 to July 27 last month. Source:
1
0
2
@mikektung
Mike K Tung
1 year
OpenAI just launched SearchGPT! Though much like the Bing Chat launch, there are some glaring issues. Let's take a look at their showcase query "Music Festivals in Boone, NC in August 2024"
Tweet media one
1
0
2
@mikektung
Mike K Tung
1 year
It's also a pretty common trope to ask the "how will you deal with misinformation" question when talking about building a KG of facts from the web (the implicit assumption being that LLMs lack common sense), but actually LLMs are already better at dealing with different contexts than.
0
0
3
@mikektung
Mike K Tung
1 year
Works pretty well. "Twitter satire detector" was always one of those classic NLP class projects that never quite worked pre-LLMs
Tweet media one
1
0
3
@mikektung
Mike K Tung
1 year
So we've now implemented compilers for English, but we haven't yet implemented a debugger.
0
0
1
@mikektung
Mike K Tung
2 years
Google Colab has this feature. Settings > Miscellaneous > Power Level.
@zack_overflow
zack (in SF)
2 years
Added gratuitous explosions and particle effects to the code editor I'm building
0
0
3
@mikektung
Mike K Tung
2 years
This question isn't a problem for KG-augmented systems:
Tweet media one
0
0
1
@mikektung
Mike K Tung
2 years
Structural bias #274 of LLMs: the reversal curse. "A is B" => "B is A", yet one direction can be easily retrieved by an LLM and the other cannot.
@OwainEvans_UK
Owain Evans
2 years
Does a language model trained on “A is B” generalize to “B is A”? E.g. when trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?” Our new paper shows they cannot!
Tweet media one
1
0
1
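The reversal-curse setup from the quoted paper can be sketched as a pair of templated probes built from one fact: the forward direction queries the entity as it appeared in training, the backward direction queries it from the description. The template wording below is a hypothetical probe format, not the paper's exact evaluation harness.

```python
def reversal_probe(entity: str, description: str) -> dict:
    """Build both directions of a fact probe for the reversal curse.

    forward: the model saw "A is B" in training and is asked about A;
    backward: the same fact queried from B, which the paper reports
    models often fail to retrieve.
    """
    return {
        "forward": f"Complete: {entity} was {description}.",
        "backward": f"Question: Who was {description}? "
                    f"Expected answer: {entity}.",
    }

probe = reversal_probe("George Washington", "the first US president")
```

Running both probes against a model and comparing accuracy in each direction is the shape of the experiment the quoted tweet describes.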