A LocalLLaMA thread highlighted a pair of relatively small open models that tackle grounding, segmentation, and OCR with architecture choices aimed at practical deployment rather than sheer scale.
#vision-language
Together AI said on March 19, 2026 that its fine-tuning service now supports tool calling, reasoning, and vision-language model training, with up to 6x higher throughput on large MoE architectures. The company's blog adds that the update supports 100B+ parameter models and datasets up to 100GB, and introduces upfront cost estimates plus live ETAs during training.
A high-engagement LocalLLaMA post on March 4, 2026 discussed Microsoft’s open-weight Phi-4-Reasoning-Vision-15B and focused on practical deployment tradeoffs for local multimodal inference.