All software engineers develop their own taste and style, and naturally people form differing opinions. I decided to write down some of my personal pet peeves so that I have a link to point to when these things come up in discussion/review!
I like to avoid being too opinionated online, but hopefully most of my points in this post aren’t actually very controversial!
When writing code that users are expected to interact with in some way the primary design consideration should be the usability of the interface.
Breaking this down, this statement actually applies to nearly all software - whether it’s a command-line interface, a graphical/web user interface, or an internal API where the ‘user’ is just another unit of code.
One of the key factors in usability across software interfaces is naming (also one of the hardest problems in software!). Another hugely important factor is consistency, and this will often override the naming point (you might think of this in terms of “practicality over purity”).
There are some basics that CLI tools should support. However, it’s worth considering how/where the tool is expected to be invoked.
If there’s a bash script that’s only intended to be run from another bash script then that’s more of a programmatic API than a CLI tool. If there’s a script intended to be run by internal project developers then it’s less important to get right than a script that’s being provided to end-users.
Points to consider (in a vague order of importance):
--helpshould output the tool’s usage
stderr(and/or a file).
Design the APIs first, build the code around them. Making an API decision based on an implementation consideration should be avoided unless it’s another case of practicality over purity (e.g. the implementation would be too much effort or too slow unless the API is simplified).
An API should be an abstraction that makes sense to a user of the API, and it’s important to think about what context that user is expected to have (even if the ‘user’ is just another internal module of your own project). Avoid confusing naming abbreviations on API boundaries. Think about how users of the API might want to use it, and make this as seamless as possible, even if it makes the implementation more complicated.
Nobody wants ugly, messy code. Certainly nobody wants to work with someone else’s mess. So how do we avoid the mess and keep things tidy?
The most important thing with code formatting (aside from basics such as correct indentation) is that it’s consistent within the project. Of course, the problem with this is that anyone contributing to the code for the first time must first learn the formatting conventions to maintain consistency.
The solution to this, which seems to be rapidly growing in popularity and becoming standard, is to use a code auto-formatter.
Many modern languages come with their own auto-formatters, including Rust, Go, Elm, Zig.
The Python ecosystem has also more-or-less standardised on the
Auto-formatters are not perfect - it’s impossible for them to be because they don’t understand the code. However, in my opinion the benefit of having standardised formatting across projects hugely outweighs the small downsides (the formatting might not be exactly what you’re used to, and might require turning off in certain rare cases).
Use an auto-formatter if you’re using a language that has a standard popular choice!
There’s an age-old tabs versus spaces argument that I’m not going to get into (obviously spaces), as well as a line endings argument (obviously
More importantly (wink), I want to make the simple point that all code/text files should end in a newline.
cat file1 file2” - if
file1doesn’t end in newline then the last line of
file1is displayed on the same line as the first line of
open(file1).readlines()” - the last line would be the only one without a newline, which might be surprising
"foo\n", but this wouldn’t ever match the last line
Static analysis can be great for a few reasons:
Most of my experience of static checkers is with Python-based tools, primarily
pylint (linting) and
mypy (type checking).
I would expect the points here to stand with any language, but it’s quite possible my perspective has been skewed in my limited exposure. Static checkers are effectively there to fill in for a limitation in the checks provided by the compiler, so I’d expect modern languages with a strict compiler to have less need for separate static analysis (e.g. Rust, Elm, Zig).
The big problem I see with static analysis tools is the number of false positives they highlight. Sure, there are configuration files that allow you to disable checks you’re not interested in, but some of the issues highlighted are due to bugs or lack of cleverness in the tool. This problem seems to be exacerbated by the fact the tools are forever fighting to keep up with core language changes.
The standard solution, aside from editing the tool’s config file, is to add a comment in the code to disable the tool on a specific line/block.
For example, you can disable all linting warnings on a line by appending “
# noqa” - that’s “no quality assurance”, although I’m sure that was obvious (personally I read this as “no questions asked”)…
This is a horrible solution. Why are we adding clutter to code for the sake of running a tool that has bugs and/or doesn’t have the required functionality to do its job properly? The main argument I can see in favour of this is the “practicality over purity” one again - and don’t get me wrong, I have used comments to disable static analysis warnings. It just makes me very sad when I have to, and it should be avoided as much as possible. Isn’t it nice when the language’s compiler already has all the checks you need…
This section is mostly a rant about static analysis tools, but the main take-away is: please don’t clutter your code with comments disabling tools, I don’t want to be distracted by that when trying to understand what the code is doing!
Never change the code for the sake of making testing easier unless it serves as an improvement to the code.
An example of an improvement driven by testing would be something like moving towards a ‘clean architecture’, where I/O is handled near the “edge” of the program. Another example of a positive change would be simplifying the code’s functionality, removing unnecessary features, or being more restrictive in other ways. Simpler code is easier to reason about and easier to test, and that’s a good thing.
An example of a bad change would be making an attribute/function part of a public API just so that it can be accessed by the tests. This should be avoided as much as possible! In Python it’s not really a problem because nothing is truly private, but I’m aware languages like C and Java make things very awkward. In general it’s better to try and test at API boundaries anyway, but practically this isn’t always the most effective way to write UT.
If you have a set of tests, that’s great, but it’s not much use if they don’t get run.
Ideally tests should be run automatically in a CI system before commit (or at least the majority, perhaps excluding some slower IT tests).
There should also be clear, simple instructions alongside the tests about how to run them manually!
The requirement for documenting tests is reduced if it’s made clear that a popular framework should be used, such as
The majority of possible regressions should be caught in UT (unit tests), and that UT should be fast. There is no excuse for slow UT - mock out any external effects that might make it slow (database access, timers, running commands, …). Make sure you have some IT (integration tests) as well, which does not have to have the same performance considerations.
Classes represent types of objects, and methods are actions that the object can perform. Objects are nouns and functions/methods are verbs. Hopefully I’m stating the obvious!
I find that using classes to model objects can be helpful in modelling problems - that is, I like OOP. However, in some cases there’s just no need for thinking in these terms, and no need for implementing logic inside a class. A plain and simple function is often sufficient (unless handcuffed by Java’s over-the-top OOP).
When defining a class, the name should correspond to the type of object you’re modelling. It should be possible to imagine “one of those objects” (i.e. an instance), otherwise the abstraction you’re building is unlikely to be very helpful. If all you really want is to group some code together then using a class may be the wrong solution.
If a class forms part of an API, think about whether it makes sense for the user to work with the type of object you’re defining. Would it be simpler and easier for the user to simply invoke a function? A function can still make use of a class under the covers - perhaps it would wrap the class by performing some setup before creating and returning the class instance.
It’s fine to use variables just to give a name to something, even if they’re only used once. Variables are cheap! If introducing a single-use variable removes the need for a comment then that’s a win.
Avoid putting too much on one line - I’d generally recommend restricting to one “piece of logic”. I also find it makes code clearer to avoid too much logic in a single expression. That is, breaking over multiple lines can make things clearer, but the main point is to break up logically separate tasks over multiple blocks/expressions.
Similarly to the points above: break logic up into smaller functions, even if only used once. If things need more grouping then split into classes/modules, or just use comment headings to break up a module.
If coming up with a clear name for a variable/function/class/module is difficult then it may be a sign that the purpose is unclear, and the structure should be reassessed. Accepting that the names should be somewhat long may be the right answer. Otherwise, perhaps consider putting the logic inside a namespace (module/class) such that the context is clearer - the names should then be able to focus on the specifics within that context without needing to set the scene.
I’ve probably forgotten to mention a few things that I’ll soon be reminded of. I’ll probably just keep some notes of things that come up and consider writing a separate update post at some point in the future.
Feel free to tweet me with your thoughts and/or your own similar frustrations!