AIGCLINK
@aigclink
Followers
31K
Following
663
Media
3K
Statuses
5K
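Dedicated to helping everyone who wants to embrace AI find the AI product that suits them, and to helping enterprises build custom AIGC applications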
Joined June 2022
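Major Gemini Live update: Google has just given Gemini Live a "vocal effects pack", so it speaks like a real person, with rhythm and accents, and interaction feels more natural. The new model can recognize and control speaking rate, prosody, and accent in real time. You can use it to practice a foreign language, run mock interviews, or tell expressive stories. #GeminiLive #AI语音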
We just shipped our biggest update to Gemini Live ever. It's smarter. It's more expressive. It has accents. And my favorite: it can talk faster! Here are 3 ways people are already using it in @GeminiApp. 🧵
5
19
86
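Microsoft has put out an AI call center solution built on Azure + OpenAI: send a single API request or dial in directly, and an AI voice agent can answer or place calls, log repair requests, schedule interviews, and so on. It handles both inbound and outbound calls. Conversations are real-time voice, with support for interruption, silence detection, multilingual TTS/ASR, and custom AI voices. A web report is generated as soon as the call ends. Ticket fields are customizable, for example time, location, and other information.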
3
12
79
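Google DeepMind released SIMA 2 last night: a general-purpose AI agent that can play alongside you, reason with you, and improve itself in 3D virtual worlds. Powered by Gemini, it thinks and understands, takes actions in interactive environments, and can be addressed through text, voice, or images. It is a gaming companion that thinks about goals and explains its steps; it accepts multilingual and emoji input, and can even understand a rough sketch. With Genie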
SIMA 2 is our most capable AI agent for virtual 3D worlds. 👾🌐 Powered by Gemini, it goes beyond following basic instructions to think, understand, and take actions in interactive environments – meaning you can talk to it through text, voice, or even images. Here’s how 🧵
1
9
39
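NotebookLM has also added a Deep Research feature, and the research output can be added directly to a notebook. Given a topic, it researches hundreds of websites in depth, produces a report, and attaches an annotated list of sources. It has also launched a new "custom Video Overview style" feature for freely customizing the style of Video Overviews: enter a prompt in the custom box and the system generates a video explainer in the style described.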
The moment you've ACTUALLY been waiting for... Introducing Deep Research! Rolling out now, Deep Research browses hundreds of sites to craft an organized report AND gives you an annotated list of sources for deeper exploration, all of which you can add directly to your notebook.
3
12
50
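Google DeepMind has published its latest research in Nature: "teaching AI to see the world the way humans do". The aim is for AI to be not a machine that "only recognizes images" but an agent that begins to "understand images". Vision AI is widely deployed today, but the way it "understands" visual input differs systematically from the way humans do: AI cannot grasp hierarchical concepts such as "cars and airplanes are both large metal vehicles" the way people can.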
Humans think about things conceptually – like how cats and starfish are both animals, despite their differences. But AI can sometimes miss this nuance. 🖼️ Our research teaches vision models to better organize visual concepts, making them more reliable and better at generalizing.
1
4
19
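Marble, the generative multimodal world model from Fei-Fei Li's World Labs, has just been released: from an image, a video, a text prompt, or a 3D layout, it generates a high-fidelity 3D world. A single image, text, multi-view images, a short video, or rough 3D blocks can all serve as the prompt. Generated worlds can be edited afterwards: an AI-native brush can make local deletions and changes, swap materials, restyle, and alter structure. A world can be expanded with one click, and multiple small scenes can be stitched together.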
Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9
1
7
30
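OpenAI has just released GPT-5.1, billed as "smarter and more conversational". GPT-5.1 Instant introduces adaptive reasoning: it decides from the difficulty of the question whether to think before answering, while staying fast and following instructions better. GPT-5.1 Thinking adapts its pace: simple questions get near-instant replies, complex ones get more time, so hard problems are answered in more depth and easy ones with a shorter wait. #GPT51 #OpenAI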
GPT-5.1 in ChatGPT is rolling out to all users this week. It’s smarter, more reliable, and a lot more conversational. https://t.co/SA1Q1GPyxV
1
0
7
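ByteDance has just released a coding model, Doubao-Seed-Code, which is strong at agentic programming tasks, has visual understanding, and offers a 256K context window. The China edition of TRAE has integrated Doubao-Seed-Code; tightly combined with TRAE, it scores 78.8% on SWE-Bench Verified. The 256K context lets it handle complex scenarios such as long code files and multi-module dependencies, and its front-end capabilities stand out.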
4
7
39
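ElevenLabs' latest real-time speech-to-text model, Scribe v2 Realtime, transcribes 90+ languages in 150 milliseconds. It can be used for voice assistants, meeting notes, real-time applications, and more. WER is ≤5% for English, Japanese, and others; for Mandarin Chinese it is above 5% and at most 10%. #ASR #STT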
Introducing Scribe v2 Realtime – the most accurate real-time Speech to Text model. Built for voice agents, meeting notetakers, and live applications, it transcribes in 150ms across 90+ languages, including English, French, German, Italian, Spanish, Portuguese, Hindi, and
1
21
81
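For readers unfamiliar with the WER figures quoted in the Scribe v2 Realtime post, here is a minimal sketch of how word error rate is commonly computed: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, not ElevenLabs' evaluation method; Mandarin in particular is usually segmented or scored per character, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("turn the lights off", "turn the light off"))
```

Under this definition, a WER of ≤5% means at most about one word-level error per twenty reference words.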
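Baidu has just released ERNIE-4.5-VL-28B-A3B-Thinking, which focuses on "thinking with images": it uses tools such as image zoom and search to freely zoom in and out of an image. This improves the model's handling of fine-grained details and long-tail visual knowledge, so it can understand complex visual scenes more fully. On VLMs Are Blind it outperforms GPT-5-High and GEMINI-2.5-Pro.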
1
8
18