Thankfully lots of tools do most of the heavy lifting. I use k8s, GKE does most of the work for me. It's very nice to have autoscaling for traffic spikes. Same with database (MongoDB Altas), dead simple autoscaling. I would never run my own k8s nor database.
I think everyone's milage will very, but as general principles, staging is nice, reading docs saves time overall, tests help you sleep at night and make it easier to make changes 6 months in the future, simple health checks (or anything on a critical path) help you catch the real issues that need immediate attention.
Thankfully lots of tools do most of the heavy lifting. I use k8s, GKE does most of the work for me. It's very nice to have autoscaling for traffic spikes. Same with database (MongoDB Altas), dead simple autoscaling. I would never run my own k8s nor database.
I wrote more details about some of the Ops stuff I do here in a previous similar question: https://www.hackerneue.com/item?id=26204402
Coincidentally I also wrote some architecture notes about a new product last night: https://blog.c0nrad.io/posts/slack-latex/
I think everyone's milage will very, but as general principles, staging is nice, reading docs saves time overall, tests help you sleep at night and make it easier to make changes 6 months in the future, simple health checks (or anything on a critical path) help you catch the real issues that need immediate attention.
Good luck!