My choice tools for Cloud Engineering
I'm an engineer, which means I love my tools and there's some tools I hate. I guess that comes with the territory. This is the collection of tools I try to use anytime I can for Cloud Engineering tasks.
I'll format this by category and list my likes and dislikes about each tool. Keep in mind that each of these is just my opinion; I'm not saying they're the only tools anyone should ever use. People have different preferences and needs, after all. Enjoy!
Infrastructure as Code: Terraform
Terraform is a tool that allows you to define your cloud infrastructure as code. It is intended to be declarative rather than programmatic, but it does a good job of balancing between the two.
Things I like | Things I don't like |
|
|
Admittedly I am a bit biased because the only other tool in this space I've used is CloudFormation, which works well for small use cases but quickly becomes unmanageable when you have a fast-changing, dynamic environment like the cloud.
In my mind, the 3 big players for IaC are CloudFormation, Terraform and Pulumi. The way I like to think about it is as a spectrum, with CloudFormation being the most "declarative" while Pulumi is almost literally just a library in a programming language. Terraform leans towards being declarative but has much better support for things like conditionals, loops, variables, passing data between modules and other programming language paradigms.
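To make "declarative" a bit more concrete, here is a toy Python sketch of the idea: you describe the end state you want, and a planner works out the steps to get there. This is only an illustration of the concept, not how Terraform or any other tool is actually implemented; the `plan` function and the resource names are made up for the example.

```python
# Toy illustration of the declarative idea: describe the end state and let
# a planner work out the steps. Resource names and the plan() helper are
# hypothetical and not taken from any real IaC tool.

def plan(current: dict, desired: dict) -> list[tuple[str, str]]:
    """Return the (action, resource_name) steps that turn current into desired."""
    steps = []
    # Anything desired that is missing or different needs a create or update.
    for name, config in sorted(desired.items()):
        if name not in current:
            steps.append(("create", name))
        elif current[name] != config:
            steps.append(("update", name))
    # Anything that exists but is no longer described gets removed.
    for name in sorted(current):
        if name not in desired:
            steps.append(("delete", name))
    return steps


current_state = {
    "web-server": {"size": "small"},
    "old-cache": {"size": "small"},
}
desired_state = {
    "web-server": {"size": "large"},
    "database": {"size": "medium"},
}

for action, name in plan(current_state, desired_state):
    print(f"{action}: {name}")
```

The point is that the "code" you maintain is just `desired_state`. At the other end of the spectrum, you would be writing the create, update and delete calls yourself.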
Configuration Management: Ansible
Ansible lets you codify the steps for provisioning a virtual machine or a Docker image, so you can reproduce that configuration whenever needed.
Things I like | Things I don't like |
|
|
Honestly I feel like I could keep going on and on with this list of pros.
Personally I've used the task profiling to set up a playbook with parallelization, which gave us a dramatic speed-up in execution time. Using those 2 features we went from 3 hour runs down to ~25 minutes.
Competitors I've used are Fabric (Python lib) and SaltStack. I've dabbled a bit with Chef but am very unfamiliar with it. Fabric 1.x was left behind by the Python 3 switch (Fabric 2 is a Python 3 compatible rewrite). I do like the concept of managing state with the power of a programming language, but I also fear that it can become very unruly very quickly if you maintain poor abstractions. I like Ansible because it puts guard rails on those abstractions so things really can't get too complex or hard to understand.
For what Ansible is, it does a great job and is one of the more reliable tools I regularly use at work.
Version Control: GitHub / Git
GitHub is a remote code repository and developer platform. It is obviously backed by the company GitHub itself. It is by far the most popular code repository of its kind, built by coders for coders.
Things I like | Things I don't like |
|
|
I have only really used TortoiseSVN as an alternative, and its workflow always felt clunky to me. I was using it right when I started learning programming so perhaps it's not as bad as I remember.
Git is so popular and ingrained in my head that I honestly don't know how I would switch away unless I joined a company that ran another one.
I should also mention that GitLab is pretty nice in comparison to GitHub, though I still prefer GitHub, probably out of familiarity alone. Both seem to offer the same things.
Programming Language: Python
Okay, before you bite my head off saying it's "too slow", remember this post is about the role of a Cloud Engineer, not an HPC or ML engineer. Python has dozens of useful libraries and has a long history of being used as a systems language.
Things I like | Things I don't like |
|
|
The obvious elephant in the room is Bash. I have a love-hate relationship with Bash, but also feel very inadequate because I've been using it daily for over 10 years and still have to Google basic syntax every time I use it.
The way I think about these 2 "languages" (if you can even call Bash a language) is that Bash is great for simple things that you run on your command line, but the second you need more than 1 if statement, you should seriously consider using Python instead. Also Bash sucks at handling JSON output and Python definitely shines there.
I know that HashiCorp was trying to make Ruby the Cloud Engineer language of choice, but it just doesn't seem to have taken off the same way as Python has.
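As a small sketch of the JSON point, here is the kind of filtering that takes a few lines with Python's standard `json` module. The sample payload and its field names (`instances`, `state`, `tags`) are made up for illustration and are not the output of any particular CLI.

```python
import json

# Hypothetical sample of JSON that a cloud CLI might print; the structure
# and field names are assumptions made up for this example.
cli_output = """
{
  "instances": [
    {"id": "i-001", "state": "running", "tags": {"env": "prod"}},
    {"id": "i-002", "state": "stopped", "tags": {"env": "dev"}},
    {"id": "i-003", "state": "running", "tags": {"env": "dev"}}
  ]
}
"""


def running_ids(raw: str, env: str) -> list[str]:
    """Return the ids of running instances tagged with the given env."""
    data = json.loads(raw)
    return [
        inst["id"]
        for inst in data["instances"]
        # .get() keeps this from blowing up on instances without tags.
        if inst["state"] == "running" and inst.get("tags", {}).get("env") == env
    ]


print(running_ids(cli_output, "dev"))  # prints ['i-003']
```

Two conditions and a nested lookup is already past the point where I would want to be doing this in Bash.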
Just for fun, here's some of my favorite Python libraries:
Monitoring: Prometheus + Grafana
I've already written about this subject when discussing monitoring for my Team Fortress servers, but I'll summarize here.
Things I like | Things I don't like |
|
|
Code Editor: VS Code
VS Code is a code editor created and maintained by Microsoft. It's relatively lightweight and can become very fully featured via its extensions.
Normally I don't like throwing my hat in the ring for things as subjective as a code editor. Everyone has strong opinions about this. However, VS Code has a few things I believe separate it from other editors in its class.
Things I like | Things I don't like |
|
|
I've used so many text editors over the years: Brackets, Atom, Notepad++, Sublime, Bluefish, GGTS, Eclipse, Visual Studio, and even just Vim. To be honest, until I started using VS Code, I mostly chose an editor based on whether I could find a syntax color theme I liked.
I will say the #1 thing I love about VS Code is the Remote Development extension pack. Instead of running an SFTP plugin or having just a terminal in another screen, this is a truly seamless experience. You get all the same code searching and features that are normally only available locally. I've been using it to connect to remote servers as well as with Windows WSL.
VS Code is not as oriented to a specific task as an IDE like Eclipse or Visual Studio. This is actually nice for me since I work on a lot of different tasks and like that I can tailor VS Code to my needs, instead of trying to shoehorn stuff into other IDEs.
Cloud Provider: Too difficult to say
Haha didn't expect that one did you?
I feel like this is very dependent on the task. Obviously I like any provider that has integration with the other tools I mentioned. So I'll just talk briefly about the ones I've used.
AWS
The biggest player in this space, with by far the most features and services, most of which I don't generally use. AWS's core things work very well, but one thing I don't like about AWS is how a lot of the newer tools feel kind of thrown together and are often a struggle to actually use. As with their UI, the experience of the developer feels like an afterthought a lot of the time.
AWS is the only one I've used that has a robust virtual networking offering, which is huge for any company trying to do basic network security. And on the subject of security: IAM permissions are a horrendous pain to use, especially if you're not familiar with all the ins and outs of how it works, though no other provider does it any better.
AWS is also by far the most expensive one here. For hobby projects, it's a bit costly unless you really do a good job with cost management. AWS also makes it easy to spend way more than you expect and you need dedicated engineers to pay attention to this kind of thing.
For business uses, I think AWS is the clear choice.
Google Cloud
I think the biggest thing GCP has going for it is that its UI feels "designed" and not like it's programmatically generated. I had used GCP for a while with my Team Fortress 2 servers, but ultimately ended up moving to AWS because of the lack of features for VM management, and because their API wasn't quite as simple to integrate with as AWS (GCP has no equivalent to boto3).
GCP's user permissions are also way more byzantine than I expected, and are somehow more confusing to me than AWS IAM. I can't believe I'm going to say this sentence: I'd rather use AWS IAM over GCP's permissions.
They offer a similar suite of features to that of AWS, and are a serious contender for business use cases.
On the subject of cost, GCP has way better cost estimation tools and they tell you the price before you stand something up in their UI; something AWS just simply does not do. Though that said, most of the time I'm standing up infrastructure programmatically so I don't really see this kind of thing often.
Vultr
I like Vultr. It's simple. They have a lot of good locations in the US, which is great for me as someone who hosts game servers. They obviously don't have nearly as many features as AWS or GCP, but they knock their VM stuff out of the park. In fact you can literally upload your own .iso to it and just run that, opening up the possibility of doing local builds and just sharing an .iso. They also have Windows VMs if you need that kind of thing.
They have recently added an S3-compatible "spaces" offering which I'm using to good effect for TF2Maps static sites. It's not fully fleshed out though, so I'll hold my judgement on it for now.
It has everything you need for small projects: built-in DNS, VMs, Block Storage, Firewalls, Load Balancers. Its networking is lacking a bit in my opinion.
3 features I want from Vultr:
- An AWS Fargate type docker runtime. Basically I just give you a container and you run it
- Lambdas
- Managed databases
A few years ago I had a lot of networking issues hosting game servers on Vultr, which had turned me off from using them. That no longer appears to be a problem, given we're running almost entirely on Vultr at TF2Maps.
DigitalOcean
Very similar to Vultr, and similarly I do like DigitalOcean for the same reasons. Their networking is more mature and they do have a database offering as well. They recently launched an Apps feature, which as far as I can tell is basically just Heroku.
If DigitalOcean added the features I mentioned in the Vultr section, I'd happily switch to them. They have the managed DB part down.
Honorable Mentions
- Packer - A simple program for doing Golden image builds. Very modular and simple
- RKT - A Docker-compatible runtime that has a lot more security features vs Docker itself.
- pgcli, mycli, litecli - Database CLIs that use Prompt-Toolkit. Nice features like syntax highlighting and context-aware tab completion
- tmux - Great for running some services in the background without losing access to the command line. Also great for doing multiple tasks on a VM at the same time
- jq - An attempt to make JSON parsing work in bash without crazy awk, sed and grep pipelines.
- Figma - A free UI prototyping tool. Mostly I've used this as a way to show a more tangible face to some of the backend things I want to do at TF2Maps.
- LucidCharts - Great for making infrastructure diagrams
- CloudTracker - A great tool for creating least privilege IAM permissions
These are just tools
It's important to remember that just like a hammer is a tool, all of these are just tools for specific tasks. I'm not married to any one of these tools and will switch if another tool comes along that solves my problem better.
To me it's more important to follow principles of staying organized than to become obsessed with how a certain tool solves a problem. I like almost all of these tools because they allow me to maintain the kind of simplicity and organization that I want.
One final thought; remember that everyone's use case is different and sometimes a tool is still a better choice even when it has less exciting features because of things like cost and whether engineers at your organization are already familiar with it. There is a cost to onboarding new tools and learning their paradigms.