New DeepSeek Janus-Pro-7B Beats OpenAI's DALL-E 3 on Image Generation

DeepSeek just dropped a fresh open-source multimodal AI model, Janus-Pro-7B, released under the MIT license.

It's multimodal (it can generate images) and beats OpenAI's DALL-E 3 and Stable Diffusion on the GenEval and DPG-Bench benchmarks.

This comes on top of all the R1 hype.

Here is the link to the DeepSeek Janus 7B GitHub.


NEWS: DeepSeek just dropped ANOTHER open-source AI model, Janus-Pro-7B.

It's multimodal (can generate images) and beats OpenAI's DALL-E 3 and Stable Diffusion across GenEval and DPG-Bench benchmarks.

This comes on top of all the R1 hype. The 🐋 is cookin' pic.twitter.com/yCmDQoke0f

— Rowan Cheung (@rowancheung) January 27, 2025

Here is the Hugging Face page for DeepSeek Janus Pro 7B.

Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder's roles in understanding and generation, but also enhances the framework's flexibility. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.
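To make the decoupling idea concrete, here is a minimal conceptual sketch in Python. This is an illustration of the design described above, not DeepSeek's actual implementation: all class names and the string "tokens" are placeholders. The point is that understanding and generation each get their own visual encoder, but both paths feed the same shared transformer backbone.

```python
# Conceptual sketch of Janus-Pro's decoupled visual encoding.
# NOT the real implementation -- all names and dataflow are illustrative only.

class UnderstandingEncoder:
    """Stands in for a SigLIP-style vision encoder (understanding path)."""
    def encode(self, image: str) -> str:
        return f"und_features({image})"

class GenerationTokenizer:
    """Stands in for a discrete image tokenizer (generation path)."""
    def encode(self, image: str) -> str:
        return f"gen_tokens({image})"

class SharedTransformer:
    """The single, unified transformer that both pathways feed into."""
    def forward(self, tokens: str) -> str:
        return f"out({tokens})"

class JanusLikeModel:
    def __init__(self):
        # Two separate visual encoders avoid the conflict between the
        # encoder's understanding and generation roles...
        self.und_enc = UnderstandingEncoder()
        self.gen_tok = GenerationTokenizer()
        # ...while one shared backbone keeps the architecture unified.
        self.core = SharedTransformer()

    def understand(self, image: str) -> str:
        return self.core.forward(self.und_enc.encode(image))

    def generate(self, image: str) -> str:
        return self.core.forward(self.gen_tok.encode(image))
```

The design choice being illustrated: swapping out a single dual-purpose encoder for two task-specific ones is what relieves the tension between understanding (semantic features) and generation (reconstruction-friendly tokens), without duplicating the expensive transformer.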

Model Summary

Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation. Janus-Pro is built on top of DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base.

For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from here with a downsample rate of 16.
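Those two numbers pin down the size of the generation interface. With a 384 x 384 input and a downsample rate of 16, each image maps to a 24 x 24 grid of discrete tokens, i.e. 576 tokens per image. A quick sanity check using only the figures quoted above:

```python
# Token-count arithmetic from the quoted specs: 384x384 input, downsample 16.
IMAGE_SIZE = 384   # pixels per side
DOWNSAMPLE = 16    # tokenizer downsample rate

tokens_per_side = IMAGE_SIZE // DOWNSAMPLE  # 384 / 16 = 24
image_tokens = tokens_per_side ** 2         # 24 * 24 = 576 tokens per image

print(tokens_per_side, image_tokens)  # 24 576
```

So the autoregressive backbone only has to predict 576 image tokens per generated picture, which is what makes unifying image generation with text modeling in one transformer tractable.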

