Large Enough – Mistral AI

Original link: https://mistral.ai/news/mistral-large-2407/

The new Mistral Large 2 is a high-performance language model released by Mistral AI, with numerous improvements over its predecessor. It has a larger 128k context window, supports dozens of languages including English, French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages, and is designed for efficient, fast, and capable AI application development. It is optimized for single-node inference on long-context applications, and its size of 123 billion parameters ensures excellent throughput. It is available under the Mistral Research License for research purposes; commercial deployment requires a Mistral Commercial License. In terms of performance, the model delivers impressive results across a wide range of tasks. On MMLU it achieves an accuracy of 84%. It also significantly outperforms the earlier Mistral Large and competes with leading industry models such as GPT-4o, Claude 3 Opus, and Llama 3 405B. To improve accuracy, effort went into reducing hallucinations, making the model more cautious, and promoting reliable and correct outputs; when insufficient information prevents a confident answer, the model acknowledges its limitations. With improved instruction-following and conversational abilities, the model excels at following detailed instructions and managing long, multi-part conversations. It performs strongly on benchmarks such as MT-Bench, Wild Bench, and Arena Hard while keeping responses short and focused. The model also demonstrates strong multilingual ability, with fluent support for English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi. Equipped with advanced function-calling skills, Mistral Large 2 can serve as the engine behind complex business applications, handling both sequential and parallel function calls effectively. It can be deployed in the cloud via Google Cloud Platform, Microsoft Azure AI Studio, Amazon Web Services, and IBM watsonx, giving developers worldwide easy access. Fine-tuning for the model is offered through Mistral AI's platform, making customization easier than ever.

Users compared several AI language models, namely Mistral Large 2, Llama 3.1 405B, and Claude, testing them with 5 prompts from their respective histories. They found Large 2 and Llama 405B comparable in performance and rated them similarly. Users expressed a desire for improvements in areas such as intelligence, longer context windows, native audio input, fewer refusals, faster processing, and higher token limits. They noted formatting problems with multi-line code entered in the editor, and mentioned quirks in how backtick code blocks behave there, comparing the experience with OpenAI's platform. Users also shared observations about human cognition, noting differences in reading habits and errors caused by the closeness of words' internal representations, drawing a comparison with AI language models' limitations in recognizing small, frequent words. Finally, users raised the "strawberry problem," discussing the challenges posed by tokenization in AI language models, highlighting its impact on various tasks (such as moderation, parsing, and prose generation), and suggesting workarounds such as using explicit delimiters for affected tasks. Users also explored the concept of chain-of-thought prompting and its implications for language models, observing that they generally lack the ability to process input at the granularity of individual characters, making it difficult to count occurrences of certain characters in a given sequence, such as the number of "r"s in the word "strawberry."
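The character-counting limitation described in the discussion is easy to illustrate. The sketch below (plain Python, no model involved) shows the explicit-delimiter workaround the commenters suggest: spelling a word out character by character gives a tokenizer far less room to merge letters into opaque tokens, and a trivial ground-truth counter shows exactly what the model is being asked to do.

```python
# The "strawberry problem": chat models see tokens, not characters, so
# character-level questions like "how many r's are in 'strawberry'?"
# often fail. A suggested workaround is to insert explicit delimiters,
# so each character stands alone instead of being merged into a token.

def with_delimiters(word: str, sep: str = " ") -> str:
    """Split a word into individually delimited characters."""
    return sep.join(word)

def count_char(word: str, char: str) -> int:
    """Ground-truth character count: trivial in code, hard for an LLM."""
    return word.count(char)

word = "strawberry"
print(with_delimiters(word))   # s t r a w b e r r y
print(count_char(word, "r"))   # 3
```

Prompting a model with the delimited form ("s t r a w b e r r y") rather than the raw word is the kind of task-specific workaround the commenters describe for moderation and parsing tasks as well.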

Original text

This latest generation continues to push the boundaries of cost efficiency, speed, and performance. Mistral Large 2 is exposed on la Plateforme and enriched with new features to facilitate building innovative AI applications.

Mistral Large 2

Mistral Large 2 has a 128k context window and supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash.

Mistral Large 2 is designed for single-node inference with long-context applications in mind – its size of 123 billion parameters allows it to run at large throughput on a single node. We are releasing Mistral Large 2 under the Mistral Research License, which allows usage and modification for research and non-commercial purposes. For commercial usage of Mistral Large 2 requiring self-deployment, a Mistral Commercial License must be acquired by contacting us.

General performance

Mistral Large 2 sets a new frontier in terms of performance / cost of serving on evaluation metrics. In particular, on MMLU, the pretrained version achieves an accuracy of 84.0%, and sets a new point on the performance/cost Pareto front of open models.

Code & Reasoning

Following our experience with Codestral 22B and Codestral Mamba, we trained Mistral Large 2 on a very large proportion of code. Mistral Large 2 vastly outperforms the previous Mistral Large, and performs on par with leading models such as GPT-4o, Claude 3 Opus, and Llama 3 405B.

A significant effort was also devoted to enhancing the model’s reasoning capabilities. One of the key focus areas during training was to minimize the model’s tendency to “hallucinate” or generate plausible-sounding but factually incorrect or irrelevant information. This was achieved by fine-tuning the model to be more cautious and discerning in its responses, ensuring that it provides reliable and accurate outputs.

Additionally, the new Mistral Large 2 is trained to acknowledge when it cannot find solutions or does not have sufficient information to provide a confident answer. This commitment to accuracy is reflected in the improved model performance on popular mathematical benchmarks, demonstrating its enhanced reasoning and problem-solving skills:

Performance accuracy on code generation benchmarks (all models were benchmarked through the same evaluation pipeline)

Performance accuracy on MultiPL-E (all models were benchmarked through the same evaluation pipeline, except for the "paper" row)

Detailed benchmarks

Performance accuracy on GSM8K (8-shot) and MATH (0-shot, no CoT) generation benchmarks (all models were benchmarked through the same evaluation pipeline)

Instruction following & Alignment

We drastically improved the instruction-following and conversational capabilities of Mistral Large 2. The new Mistral Large 2 is particularly better at following precise instructions and handling long multi-turn conversations. Below we report the performance on MT-Bench, Wild Bench, and Arena Hard benchmarks:

Detailed benchmarks

Performance on general alignment benchmarks (all models were benchmarked through the same evaluation pipeline)

On some benchmarks, generating lengthy responses tends to improve the scores. However, in many business applications, conciseness is paramount – short model generations facilitate quicker interactions and are more cost-effective for inference. This is why we spent a lot of effort to ensure that generations remain succinct and to the point whenever possible. The graph below reports the average length of generations of different models on questions from the MT Bench benchmark:

Language diversity

A large fraction of business use cases today involve working with multilingual documents. While the majority of models are English-centric, the new Mistral Large 2 was trained on a large proportion of multilingual data. In particular, it excels in English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi. Below are the performance results of Mistral Large 2 on the multilingual MMLU benchmark, compared to the previous Mistral Large, Llama 3.1 models, and to Cohere’s Command R+.

Detailed benchmarks

Performance on Multilingual MMLU (measured on the base pretrained model)

Tool Use & Function Calling

Mistral Large 2 is equipped with enhanced function calling and retrieval skills and has undergone training to proficiently execute both parallel and sequential function calls, enabling it to serve as the power engine of complex business applications.
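As a concrete illustration of what offering the model a tool looks like, the sketch below builds a tool definition and chat request in the JSON-schema style used by OpenAI-compatible chat APIs, which Mistral's function calling follows closely; the function name, its parameters, and the example invoice lookup are hypothetical, so check the official API reference before relying on any field.

```python
import json

# Hypothetical tool definition in the JSON-schema style used by
# OpenAI-compatible chat APIs; Mistral's function calling follows a
# similar shape. The function name and parameters are illustrative only.
get_invoice_tool = {
    "type": "function",
    "function": {
        "name": "get_invoice_status",
        "description": "Look up the payment status of an invoice.",
        "parameters": {
            "type": "object",
            "properties": {
                "invoice_id": {
                    "type": "string",
                    "description": "Invoice identifier",
                },
            },
            "required": ["invoice_id"],
        },
    },
}

# A chat request that offers the model this tool. With parallel function
# calling, a single model turn may return several tool calls at once;
# sequential calls arrive across turns as tool results are fed back in.
request_body = {
    "model": "mistral-large-2407",
    "messages": [{"role": "user", "content": "Is invoice INV-1042 paid?"}],
    "tools": [get_invoice_tool],
    "tool_choice": "auto",
}

print(json.dumps(request_body, indent=2))
```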

Try Mistral Large 2 on la Plateforme

You can use Mistral Large 2 today via la Plateforme under the name mistral-large-2407, and test it on le Chat. It is available under the version 24.07 (a YY.MM versioning system that we are applying to all our models), and the API name mistral-large-2407. Weights for the instruct model are available and are also hosted on HuggingFace.
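A minimal call to the model via the API might look like the following sketch. It assumes the standard `https://api.mistral.ai/v1/chat/completions` chat-completions endpoint, bearer-token authentication, and an API key in a `MISTRAL_API_KEY` environment variable; verify these details against the current la Plateforme documentation.

```python
import json
import os
import urllib.request

# Assumed endpoint for la Plateforme chat completions; verify against
# the current API documentation before use.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(prompt: str, model: str = "mistral-large-2407") -> dict:
    """Build a chat-completion request body for the given prompt."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """Send one prompt and return the assistant's reply (network call)."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumes an API key is present in the environment.
            "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The official Python client offers the same functionality with less boilerplate; this stdlib-only version just makes the request shape explicit.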

We are consolidating the offering on la Plateforme around two general purpose models, Mistral Nemo and Mistral Large, and two specialist models, Codestral and Embed. As we progressively deprecate older models on la Plateforme, all Apache models (Mistral 7B, Mixtral 8x7B and 8x22B, Codestral Mamba, Mathstral) remain available for deployment and fine-tuning using our SDKs mistral-inference and mistral-finetune.

Starting today, we are extending fine-tuning capabilities on la Plateforme: those are now available for Mistral Large, Mistral Nemo and Codestral.

Access Mistral models through cloud service providers

We are proud to partner with leading cloud service providers to bring the new Mistral Large 2 to a global audience. In particular, today we are expanding our partnership with Google Cloud Platform to bring Mistral AI’s models on Vertex AI via a Managed API. Mistral AI’s best models are now available on Vertex AI, in addition to Azure AI Studio, Amazon Bedrock and IBM watsonx.ai.

Availability timeline of Mistral AI models