There's an army of developers working on Linux as well, employed by companies like IBM and Oracle. To be honest, I don't see a huge difference from Microsoft here.

This might all be true, but has it actually resulted in better software for end users? More stability, faster delivery of useful features? That is my concern.

Being able to create a portable artifact containing only the userspace components, one that can be shipped and run anywhere with minimal fuss, is something that didn't really exist before containers.

Tried that. The devs revolted and said the whole point of containers was to escape the tyranny of ops. Management sided with them, so it's the wild west there.

On top of that, either the OCI spec is broken or it's just AWS being nuts, but unlike GitLab and Nexus, AWS ECR doesn't support automatically creating folders (e.g. ".dkr.ecr..amazonaws.com/foo/bar/baz:tag"); it can only do flat storage, which leaves you with either seriously long image names or long tags.
Yes, you can theoretically create a repository object in ECR via Terraform to mimic that behavior, but that's painful in pipelines where the resulting image path is dynamic: you need to grant the CI pipeline's IAM role more privileges than I'm comfortable with, not to mention that I don't like having any AWS resources managed outside of the central Terraform repository. [1] https://stackoverflow.com/questions/64232268/storing-images-...
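To illustrate the pre-creation workaround (my sketch, not from the thread): with the AWS SDK for Go v2 you can create a repository whose name contains slashes before the push happens. The repository name below is a placeholder, and a real pipeline would tolerate the repository already existing instead of failing:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ecr"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := ecr.NewFromConfig(cfg)

	// ECR accepts slashes in repository names, but the repository must
	// exist before a push -- there is no auto-creation on first push.
	_, err = client.CreateRepository(ctx, &ecr.CreateRepositoryInput{
		RepositoryName: aws.String("foo/bar/baz"), // hypothetical name
	})
	if err != nil {
		// A real pipeline would ignore RepositoryAlreadyExistsException here.
		log.Fatal(err)
	}
}
```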
IIRC it's not in the spec because administration of resources is out of scope. For example, perhaps you offer a public registry and you want folks to sign up for an account before they can push? Or you want an approval process before new repositories are created?
Regardless, it's a huge pain that ECR doesn't support this. Everybody I know of who has used ECR has run into it. There's a long-standing issue open which I've been subscribed to for years now: https://github.com/aws/containers-roadmap/issues/853

Looks cool. Thanks for linking it.
It does mention that it's limited to 500 MB per layer. For some use cases that limitation might not be a big deal, but for others it's a dealbreaker.

Source: I have implemented an OCI-compliant registry [1], though for the most part I've been following the behavior of the reference implementation [2] rather than the spec, on account of the latter's convolutedness.
When the client finalizes a blob upload, it needs to supply the digest of the full blob. This requirement evidently serves to let the server validate the integrity of the supplied bytes. If the server only started checking the digest as part of the finalize HTTP request, it would have to read back all the blob contents that had already been written into storage by previous HTTP requests. For large layers, this can introduce an unreasonable delay. (Because of specific client requirements, I have verified my implementation to work with blobs as large as 150 GiB.)
Instead, my implementation runs the digest computation throughout the entire sequence of requests. As blob data is taken in chunk by chunk, it is simultaneously streamed into the digest computation and into blob storage. Between requests, the state of the digest computation is serialized into the upload URL that is passed back to the client in the Location header. This is roughly where it happens in my code: https://github.com/sapcc/keppel/blob/7e43d1f6e77ca72f0020645... I believe the reference implementation uses the same approach. Because digest computation can only work sequentially, the upload has to proceed sequentially as well.
[1] https://github.com/sapcc/keppel [2] https://github.com/distribution/distribution
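As a rough sketch of that pattern (mine, not the actual keppel code): in Go, the crypto/sha256 hash state implements encoding.BinaryMarshaler, so the running digest can be checkpointed between chunk requests and resumed on the next one. The function names and the base64 encoding of the state are assumptions for illustration:

```go
package blobupload

import (
	"crypto/sha256"
	"encoding"
	"encoding/base64"
	"hash"
	"io"
)

// resumeDigest restores a running SHA-256 computation from the opaque
// state that was embedded in the upload URL of the previous request.
func resumeDigest(encodedState string) (hash.Hash, error) {
	h := sha256.New()
	if encodedState == "" {
		return h, nil // first chunk: start fresh
	}
	state, err := base64.RawURLEncoding.DecodeString(encodedState)
	if err != nil {
		return nil, err
	}
	if err := h.(encoding.BinaryUnmarshaler).UnmarshalBinary(state); err != nil {
		return nil, err
	}
	return h, nil
}

// writeChunk streams one chunk into blob storage and into the digest
// computation at the same time, then returns the serialized digest state
// to embed into the upload URL handed back to the client.
func writeChunk(chunk io.Reader, storage io.Writer, encodedState string) (string, error) {
	h, err := resumeDigest(encodedState)
	if err != nil {
		return "", err
	}
	// Every byte goes to both the storage backend and the hash, so nothing
	// has to be read back from storage when the upload is finalized.
	if _, err := io.Copy(io.MultiWriter(storage, h), chunk); err != nil {
		return "", err
	}
	state, err := h.(encoding.BinaryMarshaler).MarshalBinary()
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(state), nil
}
```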
Layers are fully independent of each other in the OCI spec (which makes them reusable). They are wired together by a separate manifest file that lists the layers of a specific image.
It's a mystery... Here are the bits of the OCI spec about multipart pushes (https://github.com/opencontainers/distribution-spec/blob/58d...). In short, you can only upload the next chunk after the previous one finishes, because you need to use information from the previous response's headers.
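To make the sequencing concrete, here is a rough client-side sketch of a chunked push over the /v2/ API (my own illustration; auth, retries, and proper merging of query parameters are omitted). Each PATCH has to finish so its Location header can be used as the target for the next chunk, and the final PUT carries the digest of the whole blob:

```go
package chunkedpush

import (
	"bytes"
	"fmt"
	"net/http"
)

// pushChunks uploads one blob to a registry chunk by chunk. uploadURL is
// the Location returned by the initial POST /v2/<name>/blobs/uploads/
// request; digest is the digest of the complete blob, e.g. "sha256:abc...".
func pushChunks(client *http.Client, uploadURL, digest string, chunks [][]byte) error {
	offset := 0
	for _, chunk := range chunks {
		req, err := http.NewRequest(http.MethodPatch, uploadURL, bytes.NewReader(chunk))
		if err != nil {
			return err
		}
		req.Header.Set("Content-Type", "application/octet-stream")
		// The spec's chunk range format is "<start>-<end>" (inclusive).
		req.Header.Set("Content-Range", fmt.Sprintf("%d-%d", offset, offset+len(chunk)-1))

		resp, err := client.Do(req)
		if err != nil {
			return err
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusAccepted {
			return fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
		// The next chunk must go to the Location returned by this response,
		// which is why chunks of one blob cannot be uploaded in parallel.
		if loc := resp.Header.Get("Location"); loc != "" {
			uploadURL = loc
		}
		offset += len(chunk)
	}

	// Finalize with a PUT that carries the digest of the whole blob.
	// (A real client would merge the digest parameter with any query
	// string already present in uploadURL.)
	req, err := http.NewRequest(http.MethodPut, uploadURL+"?digest="+digest, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("unexpected status %d on finalize", resp.StatusCode)
	}
	return nil
}
```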
Thanks, that helps a lot, and I didn't know about it :) It's a touch less powerful than full transactions (because AFAICT you can't, say, merge a COPY and a RUN together), but it's a big improvement.

That's a pretty cool use case!
Personally, I just use Nexus because it works well enough (and supports everything from OCI images to apt packages, plus custom Maven, NuGet, and npm repositories, etc.), though both the configuration and the resource usage are a bit annoying, especially when it comes to cleanup policies: https://www.sonatype.com/products/sonatype-nexus-repository
That said:

> More specifically, I logged the requests issued by docker pull and saw that they are "just" a bunch of HEAD and GET requests.

This is immensely nice, and I wish more tech out there made common-sense decisions like this: just use what has worked for a long time and don't overcomplicate. I'm a bit surprised that there aren't more simple container registries out there (especially with auth and cleanup support), since Nexus and Harbor are both a bit complex in practice.
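For reference, the pull path really is that plain. A toy sketch against the standard /v2/ endpoints (registry host and image name are placeholders; auth and manifest parsing are omitted):

```go
package pullsketch

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// pull fetches a manifest by tag. In the distribution API this is a plain
// GET (a HEAD to the same URL just checks existence), and each layer is
// then another GET at /v2/<name>/blobs/<digest>.
func pull(registry, name, tag string) error {
	manifestURL := fmt.Sprintf("https://%s/v2/%s/manifests/%s", registry, name, tag)
	req, err := http.NewRequest(http.MethodGet, manifestURL, nil)
	if err != nil {
		return err
	}
	// Ask for an OCI image manifest rather than a legacy format.
	req.Header.Set("Accept", "application/vnd.oci.image.manifest.v1+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// A real client would parse the manifest JSON and then GET each layer at
	// https://<registry>/v2/<name>/blobs/sha256:<layer digest>.
	_, err = io.Copy(os.Stdout, resp.Body)
	return err
}
```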
Private (cloud) registries are very useful when a project has mandatory AuthN/AuthZ requirements around its Docker images. You can terraform/bicep/pulumi everything per environment.

To be clear, the 8x figure was comparing the slowest ECR throughput measurement against the fastest S3 one. In any case, the improvement is significant.

That's true, but I'd assume the server would want to double-check that the hashes are valid (for robustness/consistency)... That's something my little experiment doesn't do, obviously.

That's true, unfortunately. I'm thinking about ways to somehow support private repos without introducing a proxy in between... Not sure if it will be possible.

This is such a wonderful idea, congrats. There is a real use case for this in some high-security sectors. I can't put complete info here for security reasons, but let me know if you are interested.

I didn't expect that! It's a pity they don't expose an API for parallel uploads, for those of us who need to maximize throughput and don't mind using something non-standard.

Make sure you use HTTPS, or someone could theoretically inject malicious code into your container. If you want to use your own domain, though, you'll have to put CloudFront in front of S3.

The source code is proprietary, but fortunately it shouldn't take much work to replicate (you just need to upload files at the right paths).
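To give an idea of what "the right paths" likely means (my guess, not the author's code): the read side of the registry API only needs objects served at /v2/<name>/manifests/<tag> and /v2/<name>/blobs/<digest>, so uploading with the AWS SDK for Go v2 could look roughly like this (bucket, image name, and digest are placeholders):

```go
package s3registry

import (
	"bytes"
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// putManifest writes a manifest JSON so that a plain
// GET <bucket endpoint>/v2/<name>/manifests/<tag> can serve it.
func putManifest(ctx context.Context, bucket, name, tag string, manifest []byte) error {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return err
	}
	client := s3.NewFromConfig(cfg)
	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String("v2/" + name + "/manifests/" + tag),
		Body:   bytes.NewReader(manifest),
		// The media type a pulling client expects for an OCI image manifest.
		ContentType: aws.String("application/vnd.oci.image.manifest.v1+json"),
	})
	return err
}

// putBlob does the same for a layer or config blob, keyed by its digest
// (e.g. "sha256:abc..."), at /v2/<name>/blobs/<digest>.
func putBlob(ctx context.Context, bucket, name, digest string, blob []byte) error {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return err
	}
	client := s3.NewFromConfig(cfg)
	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String("v2/" + name + "/blobs/" + digest),
		Body:   bytes.NewReader(blob),
	})
	return err
}
```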
> According to the specification, a layer push must happen sequentially: even if you upload the layer in chunks, each chunk needs to finish uploading before you can move on to the next one.
As far as I've tested with DockerHub and GHCR, chunked upload is broken anyway, and clients upload each blob/layer as a whole. The spec also promotes `Content-Range` value formats that do not match the RFC 7233 format.
(That said, there is parallelism at the level of blobs, just not within a single blob.)
Another gripe of mine is that they missed the opportunity to standardize pagination for listing tags, because they accidentally deleted some text from the standard [1]. Now different registries roll their own.
[1] https://github.com/opencontainers/distribution-spec/issues/4...