My choice tools for Cloud Engineering

I'm an engineer, which means I love my tools, and there are some tools I hate. I guess that comes with the territory. This is the collection of tools I try to use any time I can for Cloud Engineering tasks.

I'll format this by category and list my likes and dislikes for each tool. Keep in mind that each of these is just my opinion. I'm not saying they're the only tools anyone should ever use; people have different preferences and needs, after all. Enjoy!


Infrastructure as Code: Terraform

Terraform is a tool that allows you to define your cloud infrastructure as code. It is intended to be declarative rather than programmatic, but it does a good job of balancing between the two. 

 

 

Things I like
  • Deployment is part of the tool automatically. No need to write custom deploy or wrapper code
  • Support for variables, loops, conditionals and reusability
  • Module repository
  • Backend agnostic. Well suited for multi-cloud situations
  • Backed by HashiCorp but completely FOSS
  • Very popular, which means lots of good Stack Overflow posts on gotchas
  • Cost Estimation

Things I don't like
  • State management is literally a flat file that itself needs to be centrally managed
  • The looping syntax is quite confusing
  • Not always 100% up to date with its AWS provider

Admittedly, I am a bit biased because the only other tool in this space I've used is CloudFormation, which works well for small use cases but quickly becomes unmanageable in a fast-changing, dynamic environment like the cloud.

In my mind, the 3 big players for IaC are CloudFormation, Terraform and Pulumi. I like to think of them as a spectrum, with CloudFormation being the most "declarative" while Pulumi is almost literally just a library in a programming language. Terraform leans towards being declarative but has much better support for things like conditionals, loops, variables, passing data between modules and other programming language paradigms.
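To make the "declarative" end of that spectrum concrete, here is a toy sketch in Python. You describe the state you want, and the tool works out which actions get you there. The `plan` function and the resource names are made up for illustration; this is not how Terraform is implemented, just the general idea.

```python
def plan(current, desired):
    """Return the actions needed to turn `current` into `desired`.

    Both arguments map a resource name to its settings. This is a
    hypothetical, simplified model of a declarative tool's planning step.
    """
    actions = []
    # Resources that are wanted but do not exist yet.
    for name in sorted(desired.keys() - current.keys()):
        actions.append(("create", name))
    # Resources that exist but whose settings differ from what is wanted.
    for name in sorted(current.keys() & desired.keys()):
        if current[name] != desired[name]:
            actions.append(("update", name))
    # Resources that exist but are no longer described.
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


current_state = {"web": {"size": "small"}, "old-db": {"size": "large"}}
desired_state = {"web": {"size": "medium"}, "cache": {"size": "small"}}

print(plan(current_state, desired_state))
# [('create', 'cache'), ('update', 'web'), ('delete', 'old-db')]
```

At the Pulumi end of the spectrum you would write the loops and conditionals yourself in a general-purpose language; at the CloudFormation end you mostly just write the `desired_state` part.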

 


Configuration Management: Ansible

Ansible lets you codify the steps for provisioning a virtual machine or a Docker image, so you can reproduce that configuration whenever needed.
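The key idea behind reproducible configuration is that a step can be run again and again and only changes something when it needs to. Here is a minimal Python sketch of that idea. The `ensure_line` helper is hypothetical and only loosely modelled on what a "make sure this line is in the file" task does; it is not Ansible's code.

```python
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True only if the file had to be changed, so running the
    same step twice is safe and the second run reports no change.
    """
    file = Path(path)
    existing = file.read_text().splitlines() if file.exists() else []
    if line in existing:
        return False  # Already in the desired state, nothing to do.
    existing.append(line)
    file.write_text("\n".join(existing) + "\n")
    return True
```

Calling `ensure_line("/tmp/example.conf", "PermitRootLogin no")` twice (the path is just an example) would change the file on the first call and do nothing on the second.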

 

 

Things I like

Things I don't like
  • Doesn't "guarantee" what is written will be reality
  • With roles, sometimes it's hard to follow the flow of execution
  • Some built-in modules (e.g. Docker) require system packages to be installed before the task can run. This is also poorly documented in most cases.

Honestly, I feel like I could keep going on and on with this list of pros.

Personally, I've used task profiling to set up a playbook with parallelization, which gave us a dramatic speedup in execution time. Using those 2 features, we went from 3-hour runs down to ~25 minutes.

Competitors I've used are Fabric (Python lib) and SaltStack. I've dabbled a bit with Chef but am very unfamiliar with it. Fabric is no longer maintained because of the Python 3 switch. I do like the concept of managing state with the power of a programming language, but I also fear that it can become unruly very quickly if you maintain poor abstractions. I like Ansible because it puts guard rails on those abstractions, so things really can't get too complex or hard to understand.

For what Ansible is, it does a great job and is one of the more reliable tools I regularly use at work.


Version Control: GitHub / Git

GitHub is a remote code repository and developer platform. It is obviously backed by the company GitHub itself. It is by far the most popular code repository of its kind, built by coders for coders.

 

 

Things I like
  • UI makes the complexity of Git feel much less so
  • Pull request comment and approval workflow is really nice
  • Merging a PR can be gated by tests passing or needing approvals
  • Code Owners
  • GitHub Pages is nice for making static docs for your project
  • GitHub Actions!
  • Private repos are now free!

Things I don't like
  • Git itself is very confusing if you stray away from the branch, add, commit, push workflow
  • Git has a bit of a steep learning curve and a lot of jargon
  • New Git users often get into crazy Git states and have to be destructive to get out of them
  • GitHub's permissions are not granular enough for a medium to large organization

I have only really used TortoiseSVN as an alternative, and its workflow always felt clunky to me. I was using it right when I started learning programming, so perhaps it's not as bad as I remember.

Git is so popular and ingrained in my head that I honestly don't know how I would switch away unless I joined a company that ran another one.

I should also mention that GitLab is pretty nice in comparison to GitHub, though I still prefer GitHub, probably out of familiarity alone. Both seem to offer the same things.


Programming Language: Python

Okay, before you bite my head off saying it's "too slow", remember this post is about the role of a Cloud Engineer, not an HPC or ML engineer. Python has dozens of useful libraries and a long history of being used as a systems language.

 

 

Things I like
  • Boto3 library is great for writing glue code for AWS or as part of a Lambda
  • All Linux distros ship with Python by default
  • Most OS tools use Python and can be extended using it (e.g. systemd)
  • Very helpful tracebacks
  • PDB - Very easy to debug
  • 3rd most popular language
  • Syntax often reads like pseudo-code (most "sys admins" can barely code, so this is a big plus)
  • Language features work well together. Decorators, comprehensions, iterators, dataclasses, dicts, just to name a few.

Things I don't like
  • Its package management ecosystem was designed by Satan himself
  • Subprocessing and multiprocessing are not as easy as something like Bash.
  • Python 2 vs 3 switch. I still find old Python 2 scripts lying around
  • Arguments about the 80 char lines
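As a small illustration of how those language features combine, here is a sketch that uses a decorator, a dataclass, and a couple of comprehensions together. The `retry` decorator, the `Server` class, and the fleet data are all invented for this example. In real glue code the decorated function would more likely be a flaky network call than a pure function.

```python
import functools
from dataclasses import dataclass


def retry(attempts):
    """Decorator: call the function again if it raises, up to `attempts` times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # Out of attempts, surface the last error.
        return wrapper
    return decorator


@dataclass(frozen=True)
class Server:
    name: str
    region: str
    healthy: bool


@retry(attempts=3)
def unhealthy_by_region(servers):
    """Group the names of unhealthy servers by region using comprehensions."""
    regions = sorted({server.region for server in servers})
    return {
        region: [s.name for s in servers if s.region == region and not s.healthy]
        for region in regions
    }


FLEET = [
    Server("web-1", "us-east", True),
    Server("web-2", "us-east", False),
    Server("db-1", "us-west", False),
]

print(unhealthy_by_region(FLEET))
# {'us-east': ['web-2'], 'us-west': ['db-1']}
```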

The obvious elephant in the room is Bash. I have a love-hate relationship with Bash, but I also feel very inadequate because I've been using it daily for over 10 years and still have to Google basic syntax every time I use it.

The way I think about these 2 "languages" (if you can even call Bash a language) is that Bash is great for simple things that you run on your command line, but the second you need more than 1 if statement, you should seriously consider using Python instead. Also, Bash sucks at handling JSON output, and Python definitely shines there.
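To show what I mean about JSON, here is a minimal sketch using only the standard library. The sample text is a hand-written stand-in shaped like the output of `aws ec2 describe-instances`, and the instance IDs are made up.

```python
import json

# Hypothetical sample, shaped like `aws ec2 describe-instances` output.
SAMPLE_OUTPUT = """
{
  "Reservations": [
    {"Instances": [
      {"InstanceId": "i-0aaa", "State": {"Name": "running"}},
      {"InstanceId": "i-0bbb", "State": {"Name": "stopped"}}
    ]},
    {"Instances": [
      {"InstanceId": "i-0ccc", "State": {"Name": "running"}}
    ]}
  ]
}
"""


def running_instance_ids(raw_json):
    """Return the IDs of all instances whose state is "running"."""
    document = json.loads(raw_json)
    return [
        instance["InstanceId"]
        for reservation in document["Reservations"]
        for instance in reservation["Instances"]
        if instance["State"]["Name"] == "running"
    ]


print(running_instance_ids(SAMPLE_OUTPUT))
# ['i-0aaa', 'i-0ccc']
```

Doing the same thing in plain Bash without a helper tool means picking nested fields out of text with grep, sed and awk, which is exactly the kind of thing that goes wrong quietly.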

I know that HashiCorp was trying to make Ruby the Cloud Engineer language of choice, but it just doesn't seem to have taken off the same way Python has.

Just for fun, here are some of my favorite Python libraries:


Monitoring: Prometheus + Grafana

I've already written about this subject when discussing monitoring for my Team Fortress servers, but I'll summarize here.

 

Things I like
  • Very easy to get going out of the box
  • PromQL and Grafana explorer are very easy to use
  • Grafana dashboards are very easy and intuitive to create
  • Grafana supports tables and text nodes

Things I don't like
  • Alerting with AlertManager is not the greatest
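
Part of why it is so easy to get going is that the text format Prometheus scrapes is very simple. Here is a rough Python sketch that renders one gauge in that format by hand. The metric name `tf2_players_connected` and the `server` label are made up for illustration, and in practice you would more likely use an existing client library than write this yourself.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to a numeric reading.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        label_text = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


print(render_gauge(
    "tf2_players_connected",          # hypothetical metric name
    "Players currently connected.",
    {(("server", "us-east-1"),): 18},
))
```

Once a metric like this is being scraped, a PromQL query such as `sum(tf2_players_connected)` would total it across servers, again assuming that hypothetical metric name.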

Code Editor: VS Code

VS Code is a code editor created and maintained by Microsoft. It's relatively lightweight and can become very fully featured via its extensions.

 

 

 

Normally I don't like throwing my hat in the ring for things as subjective as a code editor. Everyone has strong opinions about this. However, VS Code has a few things that I believe separate it from other editors in its class.

 

Things I like
  • Very modular and robust extension ecosystem
  • Remote Developer Pack
  • Built-in Git features
  • Fast code searching
  • Built-in terminal tabs
  • Ctrl + P fast search menu
  • Saves your entire setup to a config file that can be shared on new machines
  • Offers suggestions for extensions if you open a file that is an unrecognized format
  • Command line integration via the code CLI (a bit of a pretentious name for their CLI)

Things I don't like
  • Intellisense is very dumb unless you do a lot of configuration
  • Terminal can only be split vertically; sometimes I want to split horizontally

I've used so many text editors over the years: Brackets, Atom, Notepad++, Sublime, Bluefish, GGTS, Eclipse, Visual Studio, and even just VIM. To be honest, until I started using VS Code, I mostly chose an editor based on whether I could find a syntax color theme I liked.

I will say the #1 thing I love about VS Code is the Remote Developer Pack. Instead of running an SFTP plugin or having just a terminal in another screen, this is a truly seamless experience. You get all the same code searching and features that are normally only available locally. I've been using it to connect to remote servers as well as with Windows WSL.

VS Code is not as oriented to a specific task as an IDE like Eclipse or Visual Studio. This is actually nice for me since I work on a lot of different tasks and like that I can tailor VS Code to my needs, instead of trying to shoehorn stuff into other IDEs.


Cloud Provider: Too difficult to say

Haha, didn't expect that one, did you?

I feel like this is very dependent on the task. Obviously I like any provider that integrates with the other tools I mentioned. So I'll just talk briefly about the ones I've used.

AWS

The biggest player in this space, with by far the most features and services, most of which I don't generally use. AWS's core services work very well, but one thing I don't like is how a lot of the newer tools feel thrown together and are often a struggle to actually use. As with their UI, the developer experience feels like an afterthought a lot of the time.

AWS is the only one I've used that has a robust virtual networking offering, which is huge for any company trying to do basic network security. And on the subject of security: IAM permissions are a horrendous pain to use, especially if you're not familiar with all the ins and outs of how they work, though no other provider does it any better.
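For a taste of those ins and outs, here is a small Python sketch that builds a read-only IAM policy document for a single S3 bucket. The bucket name is hypothetical. Note that listing applies to the bucket ARN while reading objects applies to the objects under it, which is the kind of detail that trips people up.

```python
import json


def read_only_bucket_policy(bucket_name):
    """Build an IAM policy document allowing read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # Listing is granted on the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Reading is granted on the objects inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


# "example-static-site" is a made-up bucket name.
print(json.dumps(read_only_bucket_policy("example-static-site"), indent=2))
```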

AWS is also by far the most expensive one here. For hobby projects, it's a bit costly unless you really do a good job with cost management. AWS also makes it easy to spend way more than you expect and you need dedicated engineers to pay attention to this kind of thing.

For business uses, I think AWS is the clear choice.

Google Cloud

I think the biggest thing GCP has going for it is that its UI feels "designed", not programmatically generated.

I used GCP for a while with my Team Fortress 2 servers, but ultimately ended up moving to AWS because of the lack of features for VM management and because their API wasn't quite as simple to integrate with as AWS's (GCP has no equivalent to boto3).

GCP's user permissions are also way more byzantine than I expected, and are somehow more confusing to me than AWS IAM. I can't believe I'm going to say this sentence: I'd rather use AWS IAM over GCP's permissions.

They offer a similar suite of features to that of AWS, and are a serious contender for business use cases.

On the subject of cost, GCP has way better cost estimation tools and they tell you the price before you stand something up in their UI; something AWS just simply does not do. Though that said, most of the time I'm standing up infrastructure programmatically so I don't really see this kind of thing often.

Vultr

I like Vultr. It's simple. They have a lot of good locations in the US, which is great for me as someone who hosts game servers. 

They obviously don't have nearly as many features as AWS or GCP, but they knock their VM stuff out of the park. In fact, you can literally upload your own .iso and just run that, opening up the possibility of doing local builds and just sharing an .iso. They also have Windows VMs if you need that kind of thing.

They have recently added an S3-compatible "spaces" offering, which I'm using to good effect for TF2Maps static sites. It's not fully fleshed out though, so I'll hold my judgement on it for now.

It has everything you need for small projects: built-in DNS, VMs, block storage, firewalls and load balancers. Its networking is lacking a bit, in my opinion.

3 features I want from Vultr:

  • An AWS Fargate type docker runtime. Basically I just give you a container and you run it
  • Lambdas
  • Managed databases

A few years ago I had a lot of networking issues hosting game servers, which turned me off from using them. That no longer appears to be a problem, given that we're running almost entirely on Vultr at TF2Maps.

DigitalOcean

Very similar to Vultr, and I like DigitalOcean for the same reasons. Their networking is more mature and they have a database offering as well. They recently launched an Apps feature, which as far as I can tell is basically just Heroku.

 

If DigitalOcean added the features I mentioned in the Vultr section, I'd happily switch to them. They have the managed DB part down.


Honorable Mentions

  • Packer - A simple program for doing Golden image builds. Very modular and simple
  • RKT - A Docker-compatible runtime that has a lot more security features vs Docker itself. 
  • pgcli, mycli, litecli - Database CLIs that use Prompt-Toolkit. Nice features like syntax highlighting and context-aware tab completion
  • tmux - Great for running some services in the background without losing access to the command line. Also great for doing multiple tasks on a VM at the same time
  • jq - An attempt to make JSON parsing work in bash without crazy awk, sed and grep pipelines.
  • Figma - A free UI prototyping tool. Mostly I've used this as a way to show a more tangible face to some of the backend things I want to do at TF2Maps.
  • Lucidchart - Great for making infrastructure diagrams
  • CloudTracker - A great tool for creating least privilege IAM permissions

These are just tools

It's important to remember that just like a hammer is a tool, all of these are just tools for specific tasks. I'm not married to any one of these tools and will switch if another tool comes along that solves my problem better. 

To me it's more important to follow principles of staying organized than to become obsessed with how a certain tool solves a problem. I like almost all of these tools because they allow me to maintain the kind of simplicity and organization that I want.

One final thought: remember that everyone's use case is different, and sometimes a tool is still the better choice even when it has less exciting features, because of things like cost and whether engineers at your organization are already familiar with it. There is a cost to onboarding new tools and learning their paradigms.