(comments)

Original link: https://news.ycombinator.com/item?id=40680737

Python was once seen as a less serious alternative to more established programming languages. Its simplicity and ease of use set it apart from traditional approaches, much as an artist might use a humble tool, such as ash from an ashtray, to produce a sophisticated drawing despite the tool's limitations. However, as Python's creators moved on and left newcomers without guidance, the language began to deteriorate: new additions made it more complex without improving its capabilities, and libraries were broken in the name of progress, making tooling more expensive to develop. The community also changed dramatically as large numbers of beginners adopted Python as their entry point into coding; once the choice of intellectually curious people looking for alternatives, it became the default for reluctant learners, while other languages such as Java offered structure and stability. Companies hire experts into prestigious positions that remain out of reach for most, and unlike in many professions, advancement in programming is not determined by longevity: good programmers face burnout, switch fields, or settle for unsatisfying desk jobs, though some manage to start their own businesses or join the industry's elite. Separately, cross-platform projects that aim to support multiple GPU vendors often fail to deliver as promised and end up focusing mainly on NVIDIA; even rendering engines such as Blender's and Otoy's Octane Render face compatibility problems outside the NVIDIA ecosystem. For developers who want general, adaptable solutions, fighting the limitations of proprietary frameworks is frustrating. Ideally, a layer that works across multiple platforms would simplify implementation and encourage broader adoption, but the reality remains fragmented, limiting real innovation and growth.

Related Articles

Original Article


Does it really matter for performance? I see Python in these kinds of setups as an orchestrator of computing APIs/engines. For example, from Python you instruct it to compute the following list, etc. No hard computing happens in Python, so performance is not so much of an issue.



Marshaling is an issue as well as concurrency.

Simply copying a chunk of data between two libraries through Python is already painful. There is a so-called "buffer API" in Python, but it's very rare that Python users can actually take advantage of this feature. If anything in Python so much as looks at the data, that's not going to work, etc.

Similarly, concurrency. A lot of native libraries for Python are written with the expectation that nothing in Python really runs concurrently. And then you are presented with two bad options: try running in different threads (so that you don't have to copy data), but things will probably break because of races, or run in different processes, and spend most of the time copying data between them. Your interface to stuff like MPI is, again, only at the native level, or you will copy so much that the benefits of distributed computation might not outweigh the downsides of copying.
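To illustrate the buffer-protocol point, here is a minimal sketch (using NumPy purely as an example consumer) of what zero-copy sharing looks like when both sides actually support it; most pure-Python code never gets to do this:

    import numpy as np

    # A bytearray exposes the buffer protocol, so NumPy can wrap it without copying.
    raw = bytearray(8 * 1024)                    # e.g. bytes handed over by a native library
    view = np.frombuffer(raw, dtype=np.float64)  # zero-copy view into the same memory

    view[:4] = [1.0, 2.0, 3.0, 4.0]              # writes land directly in the original buffer
    print(raw[:16])                              # the underlying bytes reflect the change

    # The moment plain Python code "looks at the data" element by element,
    # every access boxes a Python float and the zero-copy benefit is gone.
    total = sum(view[:1000])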



I think we will get there in the end but it will be slow.

When I was doing performance stuff: Intel was our main platform and memory consistency was stronger there. We would try to write platform-agnostic multi-threading code (in our case, typically spin-locks), but without testing properly on the other platforms, we would make mistakes and end up with race conditions, accessing unsynchronized data, etc.

I think Python will be the same deal. With Python having been GIL'd through most of its life cycle, bits and pieces won't work properly multi-threaded until we fix them.



there’s already a PEP to address the GIL. python is going through its JVM optimization phase. it’s so popular and ubiquitous that improving the GIL and things like it are inevitable
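For context on what the GIL costs in practice, a minimal, self-contained illustration (timings will vary by machine): a CPU-bound function run twice serially vs. in two threads. On a GIL'd CPython build, the threaded run is not meaningfully faster.

    import threading
    import time

    def count_down(n: int) -> None:
        while n:
            n -= 1

    N = 10_000_000

    start = time.perf_counter()
    count_down(N)
    count_down(N)
    serial = time.perf_counter() - start

    start = time.perf_counter()
    threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    threaded = time.perf_counter() - start

    # With the GIL, both threads contend for the interpreter lock,
    # so the "parallel" run takes roughly as long as the serial one.
    print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s")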



I believe it matters for startup time and memory usage. Once you've fully initialized the library and set it off, the entire operation happens without the Python interpreter's involvement, but that initial setup can still be important sometimes.



CuPy is also great – makes it trivial to port existing numerical code from NumPy/SciPy to CUDA, or to write code that can run either on CPU or on GPU.

I recently saw a 2-3 orders of magnitude speed-up of some physics code when I got a mid-range nVidia card and replaced a few NumPy and SciPy calls with CuPy.
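As a rough sketch of what such a port looks like (a hypothetical stencil function, not the commenter's actual physics code), the same routine can dispatch to NumPy or CuPy depending on where the array lives:

    import numpy as np
    import cupy as cp

    def laplacian(a):
        xp = cp.get_array_module(a)   # returns numpy or cupy depending on the input array
        return (xp.roll(a, 1, 0) + xp.roll(a, -1, 0) +
                xp.roll(a, 1, 1) + xp.roll(a, -1, 1) - 4 * a)

    cpu_grid = np.random.rand(1024, 1024)
    gpu_grid = cp.asarray(cpu_grid)               # host -> device copy

    print(laplacian(cpu_grid).sum())              # computed on the CPU
    print(cp.asnumpy(laplacian(gpu_grid)).sum())  # computed on the GPU, copied back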



I’m a huge Taichi stan. So much easier and more elegant than numba. The support for data classes and data_oriented classes is excellent. Being able to define your own memory layouts is extremely cool. Great documentation. Really really recommend!
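For readers who haven't seen it, a minimal Taichi kernel looks roughly like this (a sketch based on the public examples; it falls back to CPU if no GPU backend is available):

    import taichi as ti

    ti.init(arch=ti.gpu)            # picks CUDA/Vulkan/Metal, falls back to CPU

    n = 1024
    x = ti.field(dtype=ti.f32, shape=n)

    @ti.kernel
    def fill():
        for i in x:                 # outermost loop is automatically parallelized
            x[i] = 0.5 * i

    fill()
    print(x.to_numpy()[:4])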



I would like to warn people away from taichi if possible. At least back in 1.7.0 there were some bugs in the code that made it very difficult to work with.



I only dabbled in Taichi, but I find its magic has limitations. I took a provided example, just increased the length of the loop and bam! it crashed the Windows driver. Obviously it ran out of memory, but I have no idea how to adjust for that except by experimenting with different values. If it has information about the GPU and its memory, I thought it could automatically adjust the block size, but apparently not. There is a config command to fine-tune the for-loop parallelizing, but the docs say we normally do not need to use it.



Python is designed from the ground up to be hostile to efficient compiling. It has really poorly designed internals, and proudly exposes them to application code, in a documented way and everything.

Only a restricted subset of Python is efficiently compilable.



mypyc can compile a strictly-typed subset of Python AOT to native code, and as a bonus it can still interop with native Python libraries whose code wasn't compiled. It's slightly difficult to set up but I've used it in the past and it is a decent speedup. (plus mypy's strict type checking is sooo good)
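A minimal sketch of that workflow (hypothetical module name): write a strictly typed module, compile it with mypyc, and the resulting C extension takes precedence over the .py file on import.

    # fib.py -- compile with `mypyc fib.py`; `import fib` then loads the compiled extension
    def fib(n: int) -> int:
        a: int = 0
        b: int = 1
        for _ in range(n):
            a, b = b, a + b
        return a

    if __name__ == "__main__":
        print(fib(30))   # works interpreted too, just slower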



Unfortunately there are folks that believe we have to use Python, and then rewrite in C, while calling those wrappers "Python" libraries, regardless of how much I would like to use JVM or CLR instead.

So if Python has taken Lisp's place in AI, I would rather that it also takes the JIT and AOT tooling as well.



PyTorch and JAX have very good — but narrowly scoped — compilation facilities, and this suffices for a lot of really compelling AI applications.

A strong general purpose compilation story for Python is unlikely to materialize, IMO.
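An example of what "narrowly scoped" compilation means in practice, using jax.jit (torch.compile plays a similar role on the PyTorch side): the decorated function is traced once and compiled with XLA for whatever backend is available, but only array-oriented, trace-friendly code qualifies.

    import jax
    import jax.numpy as jnp

    @jax.jit                           # traced once, compiled by XLA, cached thereafter
    def predict(w, b, x):
        return jnp.tanh(x @ w + b)

    key = jax.random.PRNGKey(0)
    w = jax.random.normal(key, (128, 64))
    b = jnp.zeros(64)
    x = jax.random.normal(key, (32, 128))

    print(predict(w, b, x).shape)      # (32, 64)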



The pressure from Julia's slow and steady adoption, Mojo arriving on the scene, Facebook and Microsoft jumping into the JIT story, and the CPython team's changing opinion on the matter still leaves some hope it might happen.

However, now that PyTorch, TensorFlow and ONNX expose the underlying native libraries to more programming languages, it might make sense to leave Python as it is and use the other programming stacks for production.



Well, then, I guess, I should've prefaced what I wrote with "ideally".

Of course, there are circumstances we, the flesh-and-blood people cannot change, and if Python is all there is, no matter how good or bad it is, then it's probably better to make it do something we need... One could still hope though.



This is a horribly outdated take and ignores many of the benefits Python has brought to so many fields. At best it's elitist, at worst egotistical and based upon outdated interpretations.

Also, why so divisive? There's really no need for the divisiveness.



gosh, pythonistas must have called you really bad names.

JVM wanted to be Self, but Sun was on its last legs by then. now it's just a weird historic artifact, still casting a shadow, or at least causing an unpleasant smell.

I'm curious about the masters-have-left claim: do you really think current Python activity is somehow foolish?



I don't know about "foolish", but the current efforts adding complex threading and a JIT that promises 30% speedup over 3.10 look misguided.

Before, the selling point was simplicity and C extensions.

Just go to GitHub: apart from the people who are paid to integrate the above features, development has all but stalled. The corporate takeover of the "free" software Python (without a standard or viable alternative implementations, unlike C++) is complete!



Python was never a good language. It was always a joke. But jokes have a short shelf life. The reason to use it was to be in opposition to the pompous way the mainstream was trying to do things.

It's like with academic drawing. A student would spend countless hours or even days trying to draw a model, and would still come out with broken anatomy, a figure that looks made of plaster and not even a bit similar to the model itself. And they would use a huge arsenal of tools, and probably some expensive ones. Meanwhile a master could scoop some ash from an ashtray, and in a few lines drawn by a finger accomplish a lot more than that student's day's work, and have the model look alive, with plausible anatomy etc.

Python was the ash from an ashtray. It didn't need to be good as a tool to accomplish its goal. Actually, the opposite. But now, with the masters gone, the students don't know why the masters were so good at it, and they try to copy whatever they did, including finger drawing with ash. It works even worse than if they used their huge arsenal of tools, but they no longer have anyone to guide them.

You might think this is a ridiculous story, but this actually happened before my eyes in academic drawing in a particular art academy. But it's a story for a different time.

So, to answer your question, with some examples:

Yes, Python today is a thousand times worse than what it was 15 years ago. In general, adding to a programming language makes it worse. Python not only adds, it adds useless garbage. Stuff like f-strings, match-case, type annotations, type classes, async... all this stuff made the language a lot more complex and added nothing of value.

Lots of Python libraries were burnt in the name of "progress" (broken backwards compatibility). Making tools for Python became prohibitively expensive due to the effort it takes to work with the language.

But it didn't stop there. The community itself changed its shape dramatically. Python went from being usually the second language, learned by someone intelligent enough to see the problems with the mainstream and to look for alternatives, to being the first language programmers learn on their way into the trade. Python more and more started to bend to meet the demands of its new audience: amateurs unwilling to learn. And that's where Java was good! Java was made for exactly this purpose. Java came packed with guard-rails and walking sticks.



Into management? :)

There is no career path for programmers. Once you are a programmer, that's the end of your career. You climb the ladder by starting to manage people. But you don't become a super-programmer.



This is very much false. Many companies explicitly have a career path for individual contributors, even. Managers can jump higher, sure, but there's still a world of difference between a junior and a principal; and then there's technical fellows.



I work in a company that has principals and fellows. Here's what you are missing:

* The overwhelming majority of programming companies don't have anything like that.

* The overwhelming majority of programmers, even if they work at a company that has them, will never become principals or fellows, simply because there's no official process for becoming one. It's usually a gift from your management. You don't take a test; nobody's going to count your commits or measure your programming output in any other way and say "whoa, this guy now qualifies for this title!"

* You are confused between being good at what you do and a title HR puts on your paycheck. To make you understand this better: say, if you are a surgeon, then bettering yourself as a surgeon is very likely to make you more valuable in the eyes of your employer and, subsequently, put more money into your bank account. If you better yourself as a programmer, in all likelihood you will end up disappointed that you wasted all this time for nothing. You will have problems getting a job, since often you'll be overqualified for what's on offer. On top of this, a title you earned with one employer doesn't translate into an equivalent title at a different employer. Employers use these "principal" or "fellow" titles to justify paying you a bit more, so that they don't have to deal with hiring a replacement for you. If you switch jobs, the new employer has no incentive to hire you at the same level of "pretend seniority".

In other words, you are conflating the added pay that you accumulate as long as you keep working for the same employer, no matter the quality of your output, with the pay you get for being good at what you do. Usually, in many professions, people become better the longer they spend in their field. In programming this is less so.



Haha, good one. Well, I'm not saying there's no difference between programmers in terms of how well they can do their job. All I'm saying is the difference isn't reflected in the paycheck.

If you are a good programmer (ok, a 10x one!), then, eventually, one of these will happen to you:

* Despair. You come to despise the peers, the tools you have to work with, the market that will never acknowledge nor learn how to utilize your skills... and you become a vertical farmer, or an Irish pub owner, or a part-time elementary school PA teacher. (All of those I've witnessed firsthand.)

* Like I mentioned before, you become a manager. Or maybe go back to academia and become a professor. You still program, sort of, but mostly on a whiteboard.

* The technology you dedicated your life efforts to dies. And then you find yourself in an unfamiliar although related field where your past successes and fame aren't acknowledged. You feel too tired to prove yourself once again and succumb to a boring and meaningless desk job. Perhaps you become a better home cook, or a better parent.

* Start your own company. Very few do this. Most of those who do fail, and then loop back to one of the first three. Otherwise, you are promoted into the owners' class. I don't know what happens there; it happens behind closed doors, and at least I was never invited.



I really like how nvidia started doing more normal open source and not locking stuff behind a login to their website. It makes it so much easier now that you can just pip install all the cuda stuff for torch and other libraries without authenticating and downloading from websites and other nonsense. I guess they realized that it was dramatically reducing the engagement with their work. If it’s open source anyway then you should make it as accessible as possible.



“To that extent” in this context means that the remainder of the contract stays valid. The interpretation you state is not only incorrect, it would be toothless to introduce it because people could simply work around it by adding that clause and arguing that it constitutes a sufficient exception under the law.

As for why the “to the extent” phrasing exists, consider an example: an employment contract consists of two clauses, A: that prevents the employee from disclosing confidential customer data to third parties, and B: a non-compete clause (which does come under the same provision mentioned by grandparent). If the employer ever sues an employee for violation of A, they shouldn’t be allowed to argue that they aren’t subject to it because of clause B.



Hey, that's a real argument, and it makes sense. Thank you for helping to clarify this topic.

Question: why would NVIDIA, makers of general intelligence, which seems to compete with everyone, publish code for software nobody can use without breaking NVIDIA rules? Wouldn't it be better for everyone if they just kept that code private?



This isn’t open source. It's the equivalent of headers being available for a dylib, just that they happen to be a Python API.

Most of the magic is behind closed source components, and it’s posted with a fairly restrictive license.



In a similar fashion, you'll see that JAX has frontend code being open-sourced, while device-related code is distributed as binaries. For example, if you're on Google's TPU, you'll see libtpu.so, and on macOS, you'll see pjrt_plugin_metal_1.x.dylib.

The main optimizations (scheduler, vectorizer, etc.) are hidden behind these shared libraries. If open-sourced, they might reveal hints about proprietary algorithms and provide clues to various hardware components, which could potentially be exploited.
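From the Python side that split is only visible indirectly: the open-source frontend simply enumerates whatever devices the installed PJRT plugin exposes. A small check, assuming a working JAX install:

    import jax

    # Whatever PJRT plugin is installed (CPU, CUDA, libtpu.so, the Metal plugin)
    # registers itself as a backend; the frontend just enumerates its devices.
    print(jax.default_backend())   # e.g. "cpu", "gpu", "tpu"
    print(jax.devices())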



there has been a rise of "open"-access and freeware software/services in this space. see hugging face and certain models tied to accounts that accept some eula before downloading model weights, or weird wrapper code by ai library creators which makes it harder to run offline (ultralytics library comes to mind for instance).

i like the value they bring, but the trend is against the existing paradigm of how python ecosystem used to be.



I was playing around with taichi a little bit for a project. Taichi lives in a similar space, but has more than an NVIDIA backend. But its development has stalled, so I’m considering switching to warp now.

It’s quite frustrating that there’s seemingly no long-lived framework that allows me to write simple numba-like kernels and try them out on NVIDIA GPUs and Apple GPUs. Even with Taichi, the Metal backend was definitely B-tier or lower: not offering 64-bit ints, and randomly crashing or not compiling stuff.

Here’s hoping that we’ll solve the GPU programming space in the next couple years, but after ~15 years or so of waiting, I’m no longer holding my breath.

https://github.com/taichi-dev/taichi



I’ve been in love with Taichi for about a year now. Where’s the news source on development being stalled? It seemed like things were moving along at pace last summer and fall at least if I recall correctly.



Ha, interesting timing, last post 6 hr ago. Sounds like they are dogfooding it at least, which is good. And I would agree with the assessment that 1.x is fairly feature-complete, at least from my experience using it (scientific computing). And good to hear that they are planning on pushing support patches for e.g. Python 3.12, CUDA 12, etc.



Yeah, this discussion is pretty interesting. I was wondering what was happening with development as well. Taichi is cool tech that, to be honest, seems to have lacked direction on how to monetize. For example, they tried doing this "Taitopia" thing, https://taitopia.design/ (which is already EOL'ed).

IMO if they had focused from the beginning on ML similar to Mojo, they would be in a better place.



> Here’s hoping that we’ll solve the GPU programming space in the next couple years, but after ~15 years or so of waiting, I’m no longer holding my breath.

It feels like the ball is entirely in Apple's court. Well-designed and Open Source GPGPU libraries exist, even ones that Apple has supported in the past. Nvidia supports many of them, either through CUDA or as a native driver.



the problem with the GPGPU space is that everything except CUDA is so fractally broken that everything eventually converges to the NVIDIA stuff that actually works.

yes, the heterogeneous compute frameworks are largely broken, except for OneAPI, which does work, but only on CUDA. SPIR-V works best on CUDA. OpenCL: works best on CUDA.

Even once you get past the topline "does it even attempt to support that", you'll find that AMD's runtimes are broken too. Their OpenCL runtime is buggy and has a bunch of paper features which don't work, and a bunch of AMD-specific behavior and bugs that aren't spec-compliant. So basically you have to have an AMD-specific codepath anyway to handle the bugs. Same for SPIR-V: the biggest thing they have working against them is that AMD's Vulkan Compute support is incomplete and buggy too.

https://devtalk.blender.org/t/was-gpu-support-just-outright-...

https://render.otoy.com/forum/viewtopic.php?f=7&t=75411 ("As of right now, the Vulkan drivers on AMD and Intel are not mature enough to compile (much less ship) Octane for Vulkan")

If you are going to all that effort anyway, why are you (a) targeting AMD at all, and (b) not just using CUDA in the first place? So everyone writes more CUDA and nothing gets done. Cue some new whippersnapper who thinks they're gonna cure all AMD's software problems in a month; they bash into the brick wall, write a blog post, become an angry forum commenter, rinse and repeat.

And now you have another abandoned cross-platform project that basically only ever supported NVIDIA anyway.

Intel, bless their heart, is actually trying and their stuff largely does just work, supposedly, although I'm trying to get their linux runtime up and running on a Serpent Canyon NUC with A770m and am having a hell of a time. But supposedly it does work especially on windows (and I may just have to knuckle under and use windows, or put a pcie card in a server pc). But they just don't have the marketshare to make it stick.

AMD is stuck in this perpetual cycle of expecting anyone else but themselves to write the software, and then not even providing enough infrastructure to get people to the starting line, and then surprised-pikachu nothing works, and surprise-pikachu they never get any adoption. Why has nvidia done this!?!? /s

The other big exception is Metal, which both works and has an actual userbase. The reason they have Metal support for cycles and octane is because they contribute the code, that's really what needs to happen (and I think what Intel is doing - there's just a lot of work to come from zero). But of course Metal is apple-only, so really ideally you would have a layer that goes over the top...



> yes, the heterogeneous compute frameworks are largely broken, except for OneAPI, which does work, but only on CUDA.

> Intel, bless their heart, is actually trying and their stuff largely does just work, supposedly, .... But they just don't have the marketshare to make it stick.

Isn't OneAPI basically SYCL? And there are different SYCL runtimes that run on Intel, Cuda, and Rocm? So what's lost if you use oneAPI instead of Cuda, and run it on Nvidia GPUs? In an ideal world that should work about the same as Cuda, but can also be run on other hardware in the future, but since the world rarely is ideal I would welcome any insight into this. Is writing oneAPI code mainly for use on Nvidia a bad idea? Or how about AdaptiveCpp (previously hipSYCL/Open SYCL)?

I've been meaning to try some GPGPU programming again (I did some OpenCL 10+ years ago), but the landscape is pretty confusing and I agree that it's very tempting to just pick Cuda and get started with something that works, but if at all possible I would prefer something that is open and not locked to one hardware vendor.



I really wish Python would stop being the go-to language for GPU orchestration or machine learning; having worked with it again recently for some proofs of concept, it's been a massive pain in the ass.



> Warp is designed for spatial computing

What does this mean? I've mainly heard the term "spatial computing" in the context of the Vision Pro release. It doesn't seem like this was intended for AR/VR



> What's Taichi's take on NVIDIA's Warp?

> Overall the biggest distinction as of now is that Taichi operates at a slightly higher level. E.g. implicit loop parallelization, high level spatial data structures, direct interops with torch, etc.

> We are trying to implement support for lower level programming styles to accommodate such things as native intrinsics, but we do think of those as more advanced optimization techniques, and at the same time we strive for easier entry and usage for beginners or people not so used to CUDA's programming model

https://github.com/taichi-dev/taichi/discussions/8184
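For comparison with Taichi's implicit loops, a Warp kernel spells out the thread index explicitly, CUDA-style. A minimal sketch based on Warp's public examples (it can also target the CPU if no CUDA device is present):

    import numpy as np
    import warp as wp

    wp.init()

    @wp.kernel
    def saxpy(a: float, x: wp.array(dtype=float), y: wp.array(dtype=float)):
        i = wp.tid()                  # explicit thread index, CUDA-style
        y[i] = a * x[i] + y[i]

    n = 1024
    x = wp.array(np.ones(n, dtype=np.float32), dtype=float)
    y = wp.array(np.zeros(n, dtype=np.float32), dtype=float)

    wp.launch(saxpy, dim=n, inputs=[2.0, x, y])
    print(y.numpy()[:4])              # [2. 2. 2. 2.]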



doing business is hard these days. you can't be just a rich a*hole while working with people. you have to care about your image, and this is one of the ways of doing it: handing out free stuff, sort of selfless donations. but in fact this raises the bar and makes competitors' lives much harder. of course you can be rich and narrow-minded, like intel, but then it's hard to attract external developers and make them believe in your future. nvidia's stock rise is based on the vision; investors believe in it. while other giants are dominated by career managers, who know the procedures but are absolutely blind when it comes to technology evaluation. if someone comes to them with a great idea, they first evaluate how it fits in their plans. sometimes they push their own primitive vision, like at facebook, which proved to be... not that good. so, all this sort of managers can do is look at what is _already_ successful and try to replicate it by throwing a lot of money at it. it may not be enough. like intel still lags behind in GPUs.



That's the part I don't get. When you are developing AI, how much code are you really running on GPUs? How bad would it be to write it for something else if you could get 10% more compute per dollar?



Those 10% are going to matter when you have an established business case and you can start optimizing. The AI space is not like that at all, nobody cares about losing money 10% faster. You can not risk a 100% slowdown running into issues with an exotic platform for a 10% speedup.



I think you answered your own question there. Python is very accessible, very popular, and already widely used for GPU-based things like machine learning.



Huge ecosystem, starting with numpy, pandas, matplotlib et al. for data science, pytorch, tensorflow, jax for ML, gradio, rerun for visualization, opencv, open3d for image/point-cloud processing, pyside for GUI, and others.



to me it goes beyond that. many leetcode grinders swear by specific data structures such as hashmaps, which python makes available as dictionaries.

behind the syntax, there is plenty of heavy lifting for writing sophisticated code, when need be. that surely helps with the network effect.
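That heavy lifting is easy to take for granted: the built-in dict gives you an amortized O(1) hash map with no ceremony. A small illustrative example:

    # Word-frequency counting with a plain dict -- the "hashmap" leetcode grinders reach for.
    words = "the quick brown fox jumps over the lazy dog the end".split()

    counts: dict[str, int] = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1

    print(max(counts, key=counts.get))   # "the"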



This should be seen in light of the Great Differentiable Convergence™:

NeRFs backpropagating pixel colors into the volume, but also semantic information from the image label, embedded from an LLM reading a multimedia document.

Or something like this. Anyway, wanna buy an NVIDIA GPU ;)?



While this is really cool, I have to say..

> import warp as wp

Can we please not copy this convention over from numpy? In the example script, you use 17 characters to write this just to save 18 characters later on in the script. Just import the warp commands you use, or if you really want "import warp", but don't rename imported libraries, please.



Strongly agreed! This convention has even infected internal tooling at my company. Scripts end up with tons of cryptic three letter names. It saves a couple keystrokes but wastes engineering time to maintain



The convention is a convention because the libraries are used so commonly. If you give anyone in scientific computing Python world something with “np” or “pd” then they know what that is. Doing something other than what is convention for those libraries wastes more time when people jump into a file because people have to work out now whether “array” is some bespoke type or the NumPy one they’re used to.



There is no way that "warp" is already such a household name that it's common enough to shorten it to "wp". Likewise, the libraries at OP's company are for sure not going to be common to anyone starting out at the company, and might still be confusing to anyone who has worked there for years but just hasn't had to use that specific library.

Pandas and Numpy are popular, sure. As is Tensorflow (often shortened to tf). But where do you draw the line, then? should the openai library be imported as oa? should flask be imported as fk? should requests be imported as rq?

It seems to happen mostly to libraries that are commonly used by one specific audience: scientists who are forced to use a programming language, and who think that 1-letter variables are good variable names, and who prefer using notebooks over scripts with functions.

Don't get me wrong, I'm glad that Python gets so much attention from the scientific community, but I feel that small little annoyances like this creep in because of it, too.



The more Python I write, the more I feel that "from foo import name1,name2,nameN" is The Way. Yes it's more verbose. Yes it loses any benefits of namespaces. However it encourages you to focus on your actual problem rather than hypothetical problems you might have someday, and the namespace clashes might have a positive unintended consequence of making you realize you don't actually need that other library after all.
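A small illustration of the trade-off (a generic NumPy example, not from the article): the explicit form makes the call sites read plainly at the cost of a longer import line.

    # Aliased module import: short call sites, but every reader must know the alias.
    import numpy as np
    ys_alias = np.sin(np.linspace(0.0, 3.14, 100))

    # Explicit names: a longer import, but the calls stand on their own.
    from numpy import linspace, sin
    ys_named = sin(linspace(0.0, 3.14, 100))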



This math is not adding up for me... isn't "import warp" necessary anyway? So you only write 6 more characters for "as wp". And anyway, to me the savings in cognitive load later, when I'm in the flow of coding, are worth it.



OpenCL has been obsolete for years, as Intel, AMD and Google never provided a proper development experience with good drivers.

The fact that OpenCL 3.0 is basically OpenCL 1.0 rebranded, as an acknowledgement of OpenCL 2.0's adoption failure, doesn't help either.



>GPU support requires a CUDA-capable NVIDIA GPU and driver (minimum GeForce GTX 9xx).

Very tactful from nvidia. I have a lovely AMD gpu and this library is worthless for it.
