Someone posted a blog post here a little while ago. I wrote up a big response only to find that OP deleted the post. I figured I might as well post my response here since it took me 45m to type out on my phone 🫠
What an interesting list. Some of these suggestions are good while others are not. I think we can reorder things a bit and make this more reasonable.
~~Jenkins~~
Jenkins is terrible! It should have been killed off a decade ago. Seriously, just don’t use Jenkins. There are much better offerings now.
Source control and CI/CD
The current trend is to rely on your source control provider for CI/CD. You may or may not have a choice in this space, so let’s name some big ones: GitHub, GitLab, Azure DevOps, Bitbucket, Gitea/Forgejo. They all act as a git server and all offer automation. Learn whichever your company uses. If you get to choose… GitHub is great! GitLab is also good but the automations will be focused on bash and tend to get messy IMO. ADO is truly a Microsoft product with many nonsensical choices. I find it frustrating to use. I haven’t done CI/CD with Bitbucket. If you want a FOSS option, check out Forgejo (a fork of Gitea). I have not used either yet, though they look nice and I really want to.
Containers
Docker is a fine choice. I really like some alternative tools like buildah, podman, etc. but nearly every piece of documentation out there is based on docker. The choice is yours here but docker will probably give you the simplest experience.
Kubernetes is an amazing runtime environment! IMO it should be used as a standard interface for running resources in a public cloud. However, this is a huge jump and you’ll want to learn at least a dozen good tools here. This one takes years of practice but is absolutely worthwhile. A quick and very incomplete list of tools: k9s, k3d, helm, kustomize (better than helm in most cases), flux, Argo (better than flux), istio. Seriously these are just the basics.
Infrastructure Management
While ansible is good, I would be looking to retire it at this point. A big possible exception is if you are running your own hardware and don’t have a great interface for alternative tools. If somebody just gives you a VM to use, then ya use ansible.
Terraform is great but don’t use it. OpenTofu is a foss fork and people should honestly just use this instead. But both tools have some limitations and oddities. People seem to love using terragrunt as well to make this easier to use.
If you’re using k8s, there’s also the OpenTofu controller. I haven’t personally used it, but people I 100% trust in this space absolutely love it.
Observability
Firstly I like the numeronym instead: o11y.
Don’t use nagios. It’s old and there are better alternatives.
Elasticsearch is ok but I don’t really like it. Everything is stored as a document and just… eh, there are better options.
Prometheus is quite good.
Here’s the biggest mistake that people make today: not using OpenTelemetry. Use it as the core of your o11y solution. It’s the 2nd biggest CNCF project (right behind k8s) and it’s a fantastic tool. It lets you collect telemetry data and build data pipelines to whatever storage backends you want. That includes Prometheus and Elasticsearch, but you can choose many more options with only tiny configuration changes.
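To make the "tiny configuration changes" point concrete, here is a rough Python sketch that builds two config dicts shaped like an OpenTelemetry Collector config (receivers, exporters, service pipelines). The helper names `collector_config` and `changed_sections` are made up for this example, and the exporter settings (the endpoints and the choice of a metrics pipeline) are illustrative assumptions, so check the Collector docs for the exporters you actually use.

```python
import json


def collector_config(exporter_name, exporter_settings):
    """Build a Collector-style config dict with a single metrics pipeline.

    Hypothetical helper: the layout mirrors the Collector's top-level
    sections, but the settings passed in are illustrative only.
    """
    return {
        # Apps send telemetry to the collector over OTLP.
        "receivers": {"otlp": {"protocols": {"grpc": {}}}},
        # Where the collector forwards the data.
        "exporters": {exporter_name: exporter_settings},
        # Wire receivers to exporters.
        "service": {
            "pipelines": {
                "metrics": {
                    "receivers": ["otlp"],
                    "exporters": [exporter_name],
                }
            }
        },
    }


def changed_sections(a, b):
    """Return the top-level sections that differ between two configs."""
    return sorted(key for key in a if a[key] != b[key])


# Assumed example settings, not recommendations.
to_prometheus = collector_config("prometheus", {"endpoint": "0.0.0.0:8889"})
to_elasticsearch = collector_config(
    "elasticsearch", {"endpoints": ["http://localhost:9200"]}
)

# Only the exporter definition and the pipeline wiring change.
print(changed_sections(to_prometheus, to_elasticsearch))
print(json.dumps(to_elasticsearch, indent=2))
```

The receiver side, and therefore the instrumentation in your apps, stays the same when you swap backends. That is the whole appeal.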
ChatGPT
This entire post looks 100% like a copy/paste from ChatGPT. AI is a cool tool but OP, you should learn to use it a little better. Tell it to not use so much fluff text or such a rigid structure. Make edits afterwards. And most important of all, make sure it’s actually providing good info.
Oh, I could easily be wrong about Forgejo having integrated CI/CD already. It’s the only tool I mentioned above that I have never used before. I’m not a good source on this one.
But I have used both flux and argo quite a lot. I’ll admit that our flux implementation was bad, but it was just a bad experience for everyone using it with me. It was a memory hog and often crashed. Very few people understood how to use it correctly. When there were errors with e.g. a helm template, you just had to go looking for issues and read through the log. It moved git tags around so you don’t get a history of what flux was doing. I could probably remember more issues if I tried.
But none of that was a problem with Argo. We just started using it successfully on day 1. Plus its UI is fantastic and a huge advantage. It’s easy to navigate, spot issues, troubleshoot, etc. It also exposes users to resources they unknowingly create because Argo displays owned resources. This part really helped people understand what was going on in k8s. Oh and argo is very extensible. Maybe flux is too but I haven’t tried.