#reasoning

LLM Hacker News 1d ago 1 min read

GPT-5.5 API公開でHNが先に見たもの、性能より価格と挙動

HNはGPT-5.5を祝賀ムードより先に検算モードで迎えた。最初に問われたのは、どれだけ賢いかより、価格とコンテキスト帯、そしてコーディング時の振る舞いが本当に改善したのかだった。

LLM sources.twitter 3d ago 1 min read

GPT-5.5、Artificial Analysisで3点差首位に復帰　実行コストは20％上振れへ

重要なのは、GPT-5.5 launch直後に出た最初期のexternal benchmark readoutのひとつだという点だ。Artificial AnalysisはIntelligence Indexで3点差首位とした一方、指数実行コストは約20％高くなったと述べた。

#gpt-5-5 #artificial-analysis #benchmarks

LLM sources.twitter Apr 12, 2026 1 min read

Meta、Meta Superintelligence Labs 初のモデル Muse Spark を公開

AI at Metaは2026年4月8日のXで、Muse Sparkを tool use、visual chain of thought、multi-agent orchestration を備えた natively multimodal reasoning model として紹介した。Meta の公式発表では、このモデルはすでに Meta AI app と meta.ai を支えており、今後 WhatsApp、Instagram、Facebook、Messenger、AI glasses へ展開され、selected partners 向け private-preview API も提供されるとしている。

#meta #muse-spark #multimodal

LLM Hacker News Apr 9, 2026 1 min read

Meta、マルチモーダル推論と並列エージェントを備えた Muse Spark を公開

Hacker Newsで、Meta Superintelligence Labsによる Muse Spark の発表が大きく注目された。tool use、visual chain of thought、並列エージェント型の Contemplating mode を備えたマルチモーダル推論モデルだ。

#meta #muse-spark #multimodal

AI Reddit Mar 30, 2026 1 min read

r/singularityが注目したARC-AGI 3、行動効率まで測る新しい一般化評価

ARC PrizeによるARC-AGI 3公開直後、r/singularityはinteractive environmentとaction-efficient scoringへの転換に注目した。要点は、frontier AIが未知環境での一般化・探索・計画ではまだ大きく遅れているということだ。

#arc-agi #benchmarks #reasoning

LLM Mar 29, 2026 1 min read

Mistral、reasoning・coding・multimodalを統合したオープンソースモデルMistral Small 4を発表

Mistralは2026年3月16日、reasoning、multimodal入力、agentic codingを1つにまとめたMistral Small 4を公開した。119B total parameters、6B active parameters、256k context window、Apache 2.0、configurable reasoning_effortが主要ポイントだ。

#llm #multimodal #reasoning

AI Hacker News Mar 26, 2026 1 min read

ARC-AGI-3がinteractive reasoning benchmarkの焦点を塗り替える

ARC Prizeは2026年3月24日にARC-AGI-3を公開し、novel environmentでのagentic intelligenceを測るbenchmarkとして位置付けた。Hacker Newsでは238 points、163 commentsを集め、static task中心の評価からの転換として受け止められている。

#arc-agi #agents #benchmark

LLM Mar 24, 2026 1 min read

Microsoft Research、Phi-4-reasoning-vision-15B公開　multimodal reasoning効率を前面に

Microsoft Researchは2026年3月4日、15 billion parameterのopen-weight modelであるPhi-4-reasoning-vision-15Bを発表した。同社は、より大規模なsystemほどのcompute負荷を伴わずに、multimodal reasoning、math・science task、computer-use性能を高めることを狙ったと説明している。

#microsoft #phi-4 #multimodal

LLM sources.twitter Mar 23, 2026 1 min read

Together AI、tool calling・reasoning・VLM fine-tuningを拡張　100B+ modelと最大6倍 throughputを支援

Together AIは2026年3月19日、自社のfine-tuningサービスがtool call、reasoning、vision-language workflowをネイティブに支援すると発表した。リンク先のTogether AIブログは、100B+ parameter model、最大100GB dataset、大規模MoE modelで最大6倍のthroughput、学習前のcost estimateと実行中のETAまで含まれると説明している。

#together-ai #fine-tuning #tool-calling

LLM Mar 23, 2026 1 min read

Google、Gemini 3.1 Proを公開　complex reasoningとagentic workflow向けの基盤を強化

Googleは2026年2月19日、Gemini 3.1 Proを公開し、Google AI Studio、Gemini CLI、Vertex AI、Gemini app、NotebookLMなどへの展開を開始した。ARC-AGI-2で77.1%を記録し、Gemini 3 Proのreasoning性能を2倍超に高めたとGoogleは説明している。

#google #gemini #reasoning

LLM Hacker News Mar 23, 2026 1 min read

Hacker Newsで議論、layer を複製して reasoning を高める no-training LLM experiment

Show HN で紹介された llm-circuit-finder は、GGUF 内の特定 layer block をもう一度通すことで reasoning を改善できると主張する。もっとも強い benchmark 数値は repo author の自己報告であり、独立検証ではない。

#llm #reasoning #benchmark

LLM sources.twitter Mar 22, 2026 1 min read

Together AI、tool calling・reasoning・VLM fine-tuning拡張　大規模MoE学習を高速化

Together AIは2026年3月19日、fine-tuningサービスがtool calling、reasoning、vision-language model学習に対応し、MoEアーキテクチャで最大6倍高いthroughputを実現すると発表した。公式記事では大規模モデル対応、100GBデータセット、事前コスト見積もり、学習中ETAも説明している。