(comments)

Original link: https://news.ycombinator.com/item?id=43882905

A Hacker News discussion centers on the many difficulties of using Google's Gemini LLM API, particularly through Vertex AI. The original article is no longer accessible, but commenters point out that multimodal input is a challenge: it requires a workaround involving Node.js's `fs` API and a file manager, which is a problem for serverless environments. simonw suggests bypassing Vertex AI and using the HTTP API directly, since it has simpler API keys and better documentation. Some users prefer the OpenAI-compatible endpoint because it is easier to integrate, while noting that it lacks full feature coverage, such as a way to disable "Flash 2 thinking". Vertex AI is positioned as the enterprise-grade option, with advantages like region control, service authentication, and potentially lower latency. Other comments criticize the lack of an OpenAPI spec, the reliance on out-of-date protobuf definitions, and the overall difficulty of ramping up on Google's APIs. One commenter laments Google's gradual abandonment of its API-first strategy, while another argues that Gemini 2.5 has regained lost ground through competitive pricing, long-context models, and unique features. Overall, the consensus is that Google's LLM API, especially through Vertex AI, is deeply frustrating to use.

  • Original post
    Google Gemini has the worst LLM API (venki.dev)
    16 points by indigodaddy | 14 comments

    Site seems to be down - I can’t get the article to load - but by far the most maddening part of Vertex AI is the way it deals with multimodal inputs. You can’t just attach an image to your request. You have to use their file manager to upload the file, then make sure it gets deleted once you’re done.

    That would all still be OK-ish except that their JS library only accepts a local path, which it then attempts to read using the Node `fs` API. Serverless? Better figure out how to shim `fs`!

    It would be trivial to accept standard JS buffers. But it’s not clear that anyone at Google cares enough about this crappy API to fix it.
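
    A minimal sketch of the usual workaround, assuming the `@google/generative-ai` SDK's `GoogleAIFileManager` (the import path and signatures may differ between SDK versions): spill the in-memory buffer to a temp file so the path-only upload API can read it, then clean up.

        import { writeFile, unlink } from "node:fs/promises";
        import { tmpdir } from "node:os";
        import { join } from "node:path";
        import { GoogleAIFileManager } from "@google/generative-ai/server";

        // Write an in-memory buffer to /tmp so the SDK's path-only uploadFile()
        // can read it, then remove the temp file. The caller is still on the
        // hook for fileManager.deleteFile(file.name) once the request is done.
        async function uploadBuffer(buf: Buffer, mimeType: string, apiKey: string) {
          const fileManager = new GoogleAIFileManager(apiKey);
          const tmpPath = join(tmpdir(), `gemini-upload-${Date.now()}`);
          await writeFile(tmpPath, buf);
          try {
            const { file } = await fileManager.uploadFile(tmpPath, { mimeType });
            return file; // file.uri then goes into the generateContent request
          } finally {
            await unlink(tmpPath); // clean up whether or not the upload succeeded
          }
        }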



    I still don't really understand what Vertex AI is.

    If you can ignore Vertex, most of the complaints here are solved - the non-Vertex APIs have easy-to-use API keys, a great debugging tool (https://aistudio.google.com), a well-documented HTTP API, and good client libraries too.

    I actually use their HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straightforward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...
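
    For comparison, a minimal sketch of calling that HTTP API directly from TypeScript with a plain API key; the model id and response shape follow the public v1beta docs and are assumptions here:

        // Direct HTTP call with an API key - no SDK, no OAuth. The model id
        // ("gemini-2.0-flash") is an assumption; swap in whichever you use.
        const resp = await fetch(
          "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent",
          {
            method: "POST",
            headers: {
              "Content-Type": "application/json",
              "x-goog-api-key": process.env.GEMINI_API_KEY!,
            },
            body: JSON.stringify({
              contents: [{ parts: [{ text: "Say hello in one sentence." }] }],
            }),
          },
        );
        const data = await resp.json();
        console.log(data.candidates?.[0]?.content?.parts?.[0]?.text);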

    You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertex documentation, though.

    Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai
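
    A minimal sketch of that switch, pointing the official `openai` client at the Gemini endpoint; the `baseURL` is taken from the linked docs, and the model id is an assumption:

        import OpenAI from "openai";

        // Same client code an OpenAI-backed app already has; only the apiKey
        // and baseURL change.
        const client = new OpenAI({
          apiKey: process.env.GEMINI_API_KEY,
          baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
        });

        const completion = await client.chat.completions.create({
          model: "gemini-2.0-flash",
          messages: [{ role: "user", content: "Hello from the OpenAI SDK" }],
        });
        console.log(completion.choices[0].message.content);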

    Anthropic have the same feature now as well: https://docs.anthropic.com/en/api/openai-sdk



    The OpenAI-compatible API is missing important parameters; for example, I don't think there is a way to disable Flash 2 thinking with it.

    Vertex AI is for gRPC, service auth, and region control (among other things): ensuring data remains in a specific region, letting you auth with the instance's service account, and getting slightly better latency and TTFT (time to first token).
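
    A rough sketch of what that Vertex path looks like in practice, assuming `google-auth-library` with Application Default Credentials; the project id, region, and model id are placeholders:

        import { GoogleAuth } from "google-auth-library";

        // Service-account auth (Application Default Credentials) instead of an
        // API key, and a pinned region in both the hostname and the path.
        const auth = new GoogleAuth({
          scopes: "https://www.googleapis.com/auth/cloud-platform",
        });
        const token = await auth.getAccessToken();

        const region = "europe-west4"; // keeps data processing in-region
        const url =
          `https://${region}-aiplatform.googleapis.com/v1/projects/my-project/` +
          `locations/${region}/publishers/google/models/gemini-2.0-flash:generateContent`;

        await fetch(url, {
          method: "POST",
          headers: {
            Authorization: `Bearer ${token}`, // instance service account token
            "Content-Type": "application/json",
          },
          body: JSON.stringify({
            contents: [{ role: "user", parts: [{ text: "hello" }] }],
          }),
        });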



    From the linked docs:

    > If you want to disable thinking, you can set the reasoning effort to "none".

    For other APIs, you can set the thinking tokens to 0 and that also works.
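
    A sketch of both knobs, under the assumption that the docs quoted above are current: `reasoning_effort` on the OpenAI-compatible endpoint, and `thinkingConfig.thinkingBudget` on the native one.

        import OpenAI from "openai";

        const client = new OpenAI({
          apiKey: process.env.GEMINI_API_KEY,
          baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
        });

        // OpenAI-compatible endpoint: "none" is a Gemini-specific value, so the
        // SDK's types need a cast.
        await client.chat.completions.create({
          model: "gemini-2.5-flash",
          reasoning_effort: "none" as any,
          messages: [{ role: "user", content: "2+2?" }],
        });

        // Native HTTP API equivalent: a thinking budget of 0 tokens.
        await fetch(
          "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent",
          {
            method: "POST",
            headers: {
              "Content-Type": "application/json",
              "x-goog-api-key": process.env.GEMINI_API_KEY!,
            },
            body: JSON.stringify({
              contents: [{ parts: [{ text: "2+2?" }] }],
              generationConfig: { thinkingConfig: { thinkingBudget: 0 } },
            }),
          },
        );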



    Wow thanks I did not know


    Vertex AI is essentially a rebranding of their more enterprise-oriented ML platform on GCP, nothing explicitly "new."


    I don't get the outrage. Just use their OpenAI endpoints: https://ai.google.dev/gemini-api/docs/openai

    It's the best model out there.



    Additionally, there's no OpenAPI spec, so you have to generate one from their protobuf specs if you want to use it to generate a client model. Their protobuf specs live in a repo at https://github.com/googleapis/googleapis/tree/master/google/.... Now you might think that v1 would be the latest there, but you would be wrong - everyone uses v1beta (not v1, not v1alpha, not v1beta3), for reasons that are completely unclear. On top of that, the repo is frequently not up to date with the actual API (it took them ages to add the new thinking config, for example, and their usage fields were out of date for the longest time). It's really frustrating.


    I have not pushed my local commits to GitHub lately (and probably should), but my experience with the Gemini API so far has been relatively positive:

    https://github.com/ryao/gemini-chat

    The main thing I do not like is that token counting is rate limited. My local offline copies have stripped out token counting, since I found that the service becomes unusable if you get anywhere near the token limits, so there is no point in trimming the history to make it fit. Another thing I found is that I prefer to use the REST API directly rather than their Python wrapper.

    Also, that comment about 500 errors is obsolete. I will fix it when I do new pushes.
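
    For reference, the rate-limited token counting discussed above is its own REST call; a minimal sketch against the public v1beta docs (the model id is an assumption):

        // countTokens is a separate endpoint with its own rate limit, which is
        // why calling it before every request gets throttled near the quota.
        const resp = await fetch(
          "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:countTokens",
          {
            method: "POST",
            headers: {
              "Content-Type": "application/json",
              "x-goog-api-key": process.env.GEMINI_API_KEY!,
            },
            body: JSON.stringify({
              contents: [{ parts: [{ text: "How many tokens is this?" }] }],
            }),
          },
        );
        const { totalTokens } = await resp.json();
        console.log(totalTokens);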



    Google’s APIs are all kind of challenging to ramp up on. I’m not sure if it’s the API itself or the docs just feeling really fragmented. It’s hard to find what you’re looking for even if you use their own search engine.


    Even their OAI-compatible API isn't fully compatible. Tools like Instructor have special-casing for Gemini...


    In general, it's just wild to see Google squander such an intense lead.

    In 2012, Google was far ahead of the world in making the vast majority of their offerings intensely API-first, intensely API accessible.

    It all changed in such a tectonic shift. The Google Plus/Google+ era was this weird new reality where everything Google did had to feed into this social network, yet there was nearly no API available to anyone else (short of some very simple posting APIs). Google flipped a bit: the whole company stopped caring about the rest of the world and its APIs, grew intensely focused on internal use, and looked only within.

    I don't know enough about the LLM situation to comment, but Google squandering such a huge lead, so clearly ceasing to care about the world and intertwingularity, and becoming so intensely internally focused was such a clear fall. There's the Google Graveyard of products, but the greater loss in my mind is that Google gave up on APIs long ago, and has never performed any clear act of repentance for such a grievous misstep against the open world and open possibilities, in favor of a closed, internal focus.



    With Gemini 2.5 (both Pro and Flash) Google have regained so much of that lost ground. Those are by far the best long-context models right now, extremely competitively priced and they have features like image mask segmentation that aren't available from other models yet: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...


    I think the commenter was saying Google squandered its lead ("goodwill" is how I would refer to it) in providing open and interoperable services, not the more recent lead it squandered in AI. I agree with your point that they've made up a lot of that ground with Gemini 2.5.










