Comments - What you should keep in mind

Do we really need comments?

Nowadays, almost everyone uses a source code management system (Git, for instance, often hosted on a platform like GitHub), especially when several developers work on one project. If you don't use one for your projects yet, consider migrating. Even if you work alone on your project, it's still very useful.

With a source code management system, you can find every line that was ever committed to the project. It doesn't matter if someone deleted it a long time ago; you can always look up an older version. So you don't need to comment out code segments any more! Furthermore, commented-out code tends to stay in a project for a long time. And someday you ask yourself: What is this code and why is it here? Maybe you don't even dare to delete it, because it could be from someone else and still be relevant. In the end, it's better to delete unused code right away.

Okay, commenting out code isn't a good idea. But it's still very useful to describe the code with comments, so others can understand it better, right? No, not in most cases. If you need to explain what every line of your code does, the code itself just isn't readable. There are many ways to structure and style your code so that it becomes more readable. Consider this example, which you can only understand by reading the comments:

def get_customer_data(csv):
    d = open(csv, 'r').readlines()  # read csv file as list with rows as strings
    cust_d = list()  # list of customer data which gets returned
    for i in range(1, len(d)):  # iterate over rows of csv file
        e = (d[0].split(';'), d[i].split(";"))  # split columns and rows to lists
        f = []  # helper array for better datastructure
        for n in range(len(e[0])):
            # add pairs of column name and value to helper array
            f.append([e[0][n], e[1][n]])
        # create dict with columns as keys and data as values for each customer
        cust_d.append({x.strip(): y for x, y in f})
    return cust_d

This code actually works with semicolon-separated CSV files, but it's really hard to read. So how can we improve this little mess?

How to let your code speak for itself

There are a few ways to let your code tell what it's doing. The goal is that the person reading your code can easily understand what's going on, just by reading the actual code - and not any comments.

Use meaningful names

Every programmer has probably heard this advice somewhere. And quite a few programmers will say: "Ah, nothing easier than that! My names are always understandable, it's no big deal". I think most of the people who say that (and I'm including myself) wouldn't understand code they wrote a few years ago in some old project, because it can be really hard to choose meaningful names that an outsider would understand.

Don't get too big

Especially in large and complex projects with thousands of lines of code, there are often situations where functions, classes, etc. get really big. This causes various problems: On the one hand, it's hard to read a function and keep its context in mind when you have to scroll down a few times to get through it. On the other hand, giving meaningful names is much easier when you separate your code into smaller parts, each of which can get a descriptive name.

Abstractions are your friends

This point goes hand in hand with the previous ones. Try to abstract everything you do. By using abstractions, you get lots of opportunities to give descriptive names to the different abstraction levels. Additionally, you prevent your functions, classes, etc. from getting too big. The idea of this approach is to hide further details behind every abstraction, so that you see more detail every time you dive deeper. Most of the time, the caller doesn't need to know those details, and it's much easier to understand the code's purpose from a high-level perspective.
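To make the first tip concrete, here is a small, made-up sketch of the same logic written twice. The function names `calc` and `sum_of_orders_above` and the idea of order totals are assumptions chosen only for illustration; they don't come from any real project.

```python
from typing import List


# Hard to follow: the names say nothing about the data.
def calc(l: List[float], t: float) -> float:
    r = 0
    for x in l:
        if x > t:
            r += x
    return r


# Same logic, but the names carry the meaning.
def sum_of_orders_above(order_totals: List[float], minimum_total: float) -> float:
    qualifying_sum = 0
    for order_total in order_totals:
        if order_total > minimum_total:
            qualifying_sum += order_total
    return qualifying_sum
```

Both functions behave identically; only the second one tells the reader what the numbers represent without any extra explanation.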

With these concepts in mind, we can refactor the previous code example a bit. After doing that, it looks like this:

from typing import Dict, List

def extract_values_from_string(string: str) -> List[str]:
    cleaned_string = string.strip()
    string_as_list = cleaned_string.split(";")
    return string_as_list


def map_columns_to_values(column_names: str, rows: List[str]) -> List[Dict[str, str]]:
    column_value_mappings = list()
    column_names_list = extract_values_from_string(column_names)

    for row in rows:
        row_values = extract_values_from_string(row)
        customer_data = dict()

        for column_index, column_name in enumerate(column_names_list):
            column_value = row_values[column_index]
            customer_data[column_name] = column_value

        column_value_mappings.append(customer_data)

    return column_value_mappings


def get_customer_data_from_csv(csv_filepath: str) -> List[Dict[str, str]]:
    # The with-statement closes the file again once all lines are read
    with open(file=csv_filepath, mode='r') as csv_file:
        file_data: list = csv_file.readlines()
    column_names, rows = file_data[0], file_data[1:]

    return map_columns_to_values(column_names, rows)

This looks much more readable now, right? The code is roughly three times as long as the old version, but it no longer needs comments to be understood. There are also a few layers of abstraction: the top-level function doesn't show much detail, but every time you move one abstraction level deeper, the code reveals more information.
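As an aside, when the input really is a plain semicolon-separated file, Python's standard `csv` module already provides this kind of abstraction. The following is only a sketch and not part of the refactoring above: the function name `read_customers` is made up for illustration, and the `csv` module may treat quoting and whitespace differently from the hand-written version.

```python
import csv
from typing import Dict, List


def read_customers(csv_filepath: str) -> List[Dict[str, str]]:
    # newline="" is what the csv module documentation recommends
    # for files that are handed to its readers.
    with open(csv_filepath, mode="r", newline="") as csv_file:
        # DictReader uses the first row as column names for all following rows.
        reader = csv.DictReader(csv_file, delimiter=";")
        return [dict(row) for row in reader]
```

Here the library hides the splitting and mapping details entirely, which is the same idea as the abstraction layers shown above, just taken one step further.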

Comments lie

Another problem with comments is that they lie. By that, I mean that you can't really trust what comments say about the code. They could always be outdated and give wrong information about it.

In the common case, the code we write changes many times and is never "perfect". As the project you're working on evolves and the requirements change, the code has to evolve too. Furthermore, nobody can write clean and perfect code instantly. Writing code is always an iterative process where you revise and improve the code in each step.

These continuous changes make it really hard to keep all the comments up to date. Imagine there is a comment on every second line of your code, or even more! On the one hand, it's tedious and time-consuming to change a comment every time you change something in the code. On the other hand, it's nearly impossible to keep all comments up to date; there will always be one that is unintentionally outdated. Modern IDEs make it very easy to refactor code automatically. Renaming identifiers or cleaning up unused code is a standard feature of every professional IDE nowadays. But the IDE typically refactors only the code and not the comments; how could it know what they mean? The refactoring has inadvertently made the comments lie about the code.
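A small, made-up illustration of such a lying comment: the constant `MAX_LOGIN_ATTEMPTS` and the function `is_account_locked` are hypothetical, and we assume the limit was raised at some point while the comment was left behind.

```python
MAX_LOGIN_ATTEMPTS = 5


def is_account_locked(failed_attempts: int) -> bool:
    # Lock the account after 3 failed attempts.
    # (Stale: the comment still describes the old limit, the code uses 5.)
    return failed_attempts >= MAX_LOGIN_ATTEMPTS
```

The named constant already says everything the first comment tried to say, and unlike the comment it cannot drift away from the actual behavior.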

So it's not allowed to write comments any more?

After all these problems with comments, we could conclude that we shouldn't write comments at all. But I can tell you this is not the case: you can definitely write really good code that contains useful comments. First and foremost, this post wants to draw attention to the problems that come with writing comments, and to show that we can often write code that speaks for itself instead.

There are still a few cases where comments are unavoidable. Most projects should have documentation, which is often generated from comments. Especially when the code or its interfaces are available to external users, good documentation is essential. It also makes sense to explain the code's functionality when it solves a complex problem. Others can then understand what the code does without having to understand the complex code itself. Comments written at this level are also less likely to lie about the code after refactoring. But you still have to adjust the comments and the documentation whenever the general behavior changes.
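As a sketch of such a high-level comment, here is a docstring that states what a function does without narrating each line. The example (a Luhn checksum, as used for credit card numbers) and the name `has_valid_luhn_checksum` are chosen purely for illustration.

```python
def has_valid_luhn_checksum(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum.

    Counting from the right, every second digit (the 2nd, 4th, ...) is
    doubled, and 9 is subtracted if the result exceeds 9. The number is
    valid when the sum of all resulting digits is divisible by 10.
    """
    digits = [int(character) for character in reversed(number)]
    checksum = 0
    for position, digit in enumerate(digits):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0
```

The docstring describes behavior that only changes when the algorithm itself changes, so renaming a variable or restructuring the loop doesn't turn it into a lie.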

To sum up, comments can be very helpful in writing understandable code. However, overusing them is never a good choice. They pose some relevant difficulties that everyone needs to be aware of.