I recently had to give a talk on “Good coding practices” for some new developers at Automattic. At first I had no idea how to summarize such a vast and dynamic topic, but as I was working on a plan I started to see some ideas come up over and over again and I decided to turn my thoughts into a post.
Caveat lector: what makes a “good coding practice” is highly subjective and is also likely to change over time; this is my personal interpretation of good coding practices for a particular codebase at the time of writing. Use your own judgement always.
The format of this advice is designed to mimic the guidelines from Clean code PHP and Clean code JavaScript, which in general are quite good, but I don’t agree with all their suggestions and I think there’s some good advice left out. I also highly recommend The Art of Readable Code; it goes into much more detail about many of the concepts below.
What is good code?
Code should be written to minimize the time it would take for someone else to understand it.
The Art of Readable Code
To me, good code is explicit, semantic, and modular.
A common maxim in development is “explicit is better than implicit“. Explicit code makes no assumptions about what the reader knows. send_welcome_email_to_user()
is an explicitly-named function. Implicit code does things that are implied or that you have to already know. If a function called get_price()
also applies discounts, applying the discounts is an implicit action. What if we had get_price()
and apply_discounts()
? Explicit code is safer to use because it doesn’t do unexpected things. Some behavior will always be implied – a function is an abstraction and all abstractions are leaky – so keep in mind that this guideline is a gradient, not true-or-false.
Semantic code has meaning. A variable named data
doesn’t tell us much about what it contains. A variable named timeSpent
is ok because it describes a purpose. A variable named secondsSpent
is better because it is pretty clear about what it contains. What’s the meaning of the variable open
? Would it make more sense if it was isOpen
? Naming well is probably the hardest thing in coding, so don’t beat yourself up but try to choose names for classes, functions, and variables that convey their purpose the best that you can. Good comments and clear spacing can also have a huge effect on the meaning of code. It’s normal to write code to solve a problem, but remember to then look at that code from the perspective of someone who has no idea of the problem or the context in which it exists.
Programs and their parts are written the way that humans think: as a sequential list of instructions. However, if you write all your code in one place, it becomes quite hard to remember what variables are used where, what conditions may be active, and how to make a change without breaking something else. Modular code is not tightly coupled to other code. Having a clear boundary between one function and another makes it possible to see the flow of a program quickly and incorporate changes with minimal risk.
Working with data
Use value objects rather than “smart” objects
It’s possible to divide objects into two general types: objects that “do” something, and objects that “are” something. For example, an object that represents the properties of a user is an object that “is” something; an object that sends an email to a user is an object that “does” something. Keeping these two responsibilities separate will produce code that is easier to reason about and is less bloated.
If a “user” object has a sendEmail
method, then that method will be passed around everywhere, even if most places never need to send an email. It also means that anytime we need to change how an email is sent, we have to modify the user object, potentially causing problems for code that has nothing to do with emails.
An object that contains only data and no operations can be called a “value object” (technically we’d also need to be able to compare those objects by value instead of by reference, and we could write functions to do that if necessary). They are cheap to create and provide a semantic container for a certain set of values.
Bad
class User {
public int $user_id;
public function send_welcome_email(): void { /* ... */ }
}
Good
class User {
public int $user_id;
}
class UserEmailer {
public function send_welcome_email( User $user ): void { /* ... */ }
}
With User
as a value object we prevent it from becoming bloated and make it easy to change. Constructor promotion and Readonly properties in PHP 8+ help this technique but be careful to avoid dynamic properties since they will be deprecated in PHP 8.2.
Depend on interfaces
An interface defines the structure or the API of a function, variable, or object. For a function, an interface defines what kind of data it accepts and returns. For a variable or object, an interface defines the data it holds and the methods it can perform.
When depending on an interface, we should generally depend only on the types that we need, although there are sometimes functions that need a very specific type of data.
In TypeScript:
Bad
function getUsername( user ) {
// ... lots of code here
return user.firstName;
}
getUserName()
is pretty simple in this example, but imagine that it was 100 lines long. You’d need to read the whole function to understand type of data you have to pass in. We can use an interface to explain exactly what kind of data the function needs and what it returns. This is a guarantee that the function will operate a certain way.
Good
interface User {
firstName: string;
}
function getUsername( user: User ): string {
// ... lots of code here
return user.firstName;
}
const myRegularUser = { firstName: 'john', theme: 'blue' };
const myAdminUser = { firstName: 'bob', dbKey: 12345, theme: 'green', accessLevel: 5 };
getUsername( myRegularUser );
getUsername( myAdminUser );
By just reading the first line of the function, we now know how to use it. Because we depend on an interface, we can pass any object into the function as long as it has a firstName
property.
In PHP:
Bad
function get_username( $user ) {
// ... lots of code here
return $user->first_name;
}
This function is actually very flexible. It accepts any kind of object that has a $first_name
property, but again there’s no way to know that without reading the function body. We can make its behavior more clear by adding types to the function definition. This is similar to what we did above with TypeScript, except that WordPressUser
is a concrete type.
Good
class WordPressUser {
public string $first_name;
public int $wp_id;
public int $access_level;
}
function getUsername( WordPressUser $user ): string {
// ... lots of code here
return $user->first_name;
}
By adding types we now know how the function is intended to work, and we’ve provided guarantees, but by using a concrete typehint we’ve actually decreased what the function can do in the process. The function has the ability to operate on any object that has a $first_name
property but we’ve limited it to only instances of the WordPressUser
class. In some cases that’s fine (maybe we only want to support that class), but we can do better.
Better
interface HasName {
public string $first_name;
}
class WordPressUser implements HasName {
public string $first_name;
public int $wp_id;
public int $access_level;
}
class TumblrUser implements HasName {
public string $first_name;
public int $tumblr_id;
public string $site_name;
}
function getUsername( HasName $user ): string {
// ... lots of code here
return $user->first_name;
}
By using an interface as the typehint, we now show that the function will work for any object that has a name.
For code that needs to support older versions of PHP, we can use PHPDoc types! These can enforced by static analysis tools like psalm or phpstan.
Don’t return data without a type
When writing functions we often need to pass around data in temporary forms. It’s tempting to do this by creating an anonymous object, but this means that anyone who wants to use that data must read where it is created in order to know what properties it has. In addition, if the data ever changes, without an interface the developer would need to manually find every place that uses that data to make sure it doesn’t break. Prefer using explicit interfaces even for temporary data.
In TypeScript:
Bad
function getUserData() {
const userData = queryUserFromApi();
// ... lots of code here
return {
name: userData.user_name,
id: userData.user_id,
};
}
This function returns an object with the keys name
and id
. In this example that’s pretty clear but imagine that the function was 300 lines long with 17 return
statements. You’d have to read each one to make sure you understood what kind of data it could return. Since we are the ones writing the function, we already know what it can return and can provide an interface.
Good
interface User {
name: string;
id: number;
}
function getUserData(): User {
const userData = queryUserFromApi();
// ... lots of code here
return {
name: userData.user_name,
id: userData.user_id,
};
}
By providing a return interface, we guarantee that the data returned will have the properties we’ve specified, no matter how complex that function might be.
In PHP:
Bad
function getUserData() {
$user_data = queryUserFromDatabase();
// ... lots of code here
return [
'name' => $user_data->user_name,
'id' => $user_data->user_id,
];
}
Similar to the TypeScript example, the only way to know what this function returns is by reading the whole function. We can do better.
Good
class User {
public string $name;
public int $id;
}
function getUserData(): User {
$user_data = queryUserFromDatabase();
// ... lots of code here
$user = new User();
$user->name = $user_data->user_name;
$user->id => $user->user_id;
return $user;
}
By using a value object, we guarantee what kind of data the function will return. Someone reading this code does not need to look inside the function to know how to use it.
Classes
Keep inheritance shallow, or avoid it altogether
The concept of building up behavior through a chain of ever-more-specific children is appealing, but in practice it creates obfuscated code that’s hard to reason about. While one level of inheritance can sometimes be appropriate, it can be hard to contain and can lead to creating “god” classes which have a bunch of unrelated methods just as a means of sharing code. The resulting children classes can have massive levels of bloat that are hidden when reading them.
If you come across a statement like $this->get_username()
in a child class, where is get_username()
defined? It could be somewhere else in the class, the parent, the grandparent, etc. If the class uses multiple inheritance (eg: PHP traits), then there are even more possibilities. What if the function is defined in several places? It can be very difficult to follow.
A better policy is almost always composition. “Composition over inheritance” is advice that has been around nearly as long as objects. It means that rather than an object “being” something that defines its behavior, the object “has” something that defines its behavior.
Bad
class DomainProduct extends Product {
use Emailable;
public function purchase(): void {
parent::purchase();
$email_address = $this->get_email_address();
$this->send_email( $email_address, $this->get_product_data() );
}
}
This looks nice and neat, but it’s a trap. The purchase()
function calls four functions of its own: another function called purchase()
defined on either the parent, grandparent, great-grandparent, etc., and get_email_address()
, send_email()
, and get_product_id()
which are defined on either the parent, grandparent, etc., or the Emailable
trait. To alter or debug this function, we have to search all those locations, and if those functions are defined in more than one place then we have to understand the rules of inheritance to figure out which one is called when.
Good
class DomainProduct {
private ProductEmailer $emailer;
private ProductPurchaser $purchaser;
private ProductData $product_data;
public function __construct(
ProductEmailer $emailer,
ProductPurchaser $purchaser,
ProductData $product_data
) {
$this->emailer = $emailer;
$this->product_data = $product_data;
$this->purchaser = $purchaser;
}
public function purchase(): void {
$this->purchaser->purchase( $this );
$email_address = $this->product_data->get_email_address();
$this->emailer->send_email( $email_address, $this->product_data );
}
}
In this version of the purchase()
function, there is no question where the other functions are defined. The second purchase()
is defined on ProductPurchaser
. get_email_address()
comes from ProductData
. send_email()
is defined on ProductEmailer
. Debugging or modifying this class can be done much more quickly. The above example also uses dependency injection but you don’t even need that; the below examples still use composition:
class DomainProduct {
private ProductEmailer $emailer;
private ProductPurchaser $purchaser;
private ProductData $product_data;
public function __construct() {
$this->emailer = new ProductEmailer();
$this->product_data = new ProductData();
$this->purchaser = new ProductPurchaser();
}
public function purchase(): void {
$this->purchaser->purchase( $this );
$email_address = $this->product_data->get_email_address();
$this->emailer->send_email( $email_address, $this->product_data );
}
}
Simpler still:
class DomainProduct {
public function purchase(): void {
(new ProductPurchaser())->purchase( $this );
$email_address = (new ProductData())->get_email_address();
(new ProductEmailer())->send_email( $email_address );
}
}
Functions
Keep functions small
Functions that have a lot of lines can become difficult to read and are often trying to do too many things at once. This can make debugging or modifying that behavior very expensive as we may have to understand the context of the whole function to work within it.
Keeping functions small isn’t usually difficult when we write them, but functions only grow larger over time. We need to fix a bug so we add 5 lines. We need to add a new feature so we add 10 lines. After five or ten years, that function now has 300 lines of code and no one wants to touch it. Keeping functions small is an ongoing process which must be considered every time we are editing code.
To break up large functions, try to extract sections of code into their own sub-functions. This also improves readability because the intermediate functions can have semantic names.
That said, please don’t write obfuscated code (code that is not readable) just to make a function smaller. If you need more lines to make a function clear, do that instead!
In JavaScript:
Bad
async function sendEmail( userId ) {
const api = getApi();
const userDetails = await api.getUserDetails( userId );
let emailAddress = getDefaultEmailAddress( userId );
if ( userDetails?.email ) {
const rawEmailAddress = userDetails.email;
const emailValidator = getValidator();
const validationResult = emailValidator.validate( rawEmailAddress );
if ( validationResult.isValidAddress ) {
emailAddress = validationResult.emailAddress;
}
}
const templateId = getDefaultTemplateId( userId );
const emailTemplate = getEmailTemplate( templateId );
const emailContent = emailTemplate.createContentFor( emailAddress );
const preparedEmail = prepareEmail( emailContent );
const mailClient = getMailClient();
mailClient.sendEmail( preparedEmail.recipient, preparedEmail.body );
}
There are lots of little operations happening inside this function. Adding comments between them would be one improvement.
The purpose of commenting is to help the reader know as much as the writer did. … When you’re writing code, you have a lot of valuable information in your head. When other people read your code, that information is lost – all they have is the code in front of them.
The Art of Readable Code
However, commenting is best used to document the thoughts that led to certain code; using comments to fix poor readability is time that could be spent improving the readability. In this case, extracting some of the operations into their own functions is even better.
Good
async function sendEmail( userId ) {
const emailAddress = getEmailAddressForUserEmail( userId );
const emailContent = getEmailContent( userId, emailAddress );
const preparedEmail = prepareEmail( emailContent );
sendPreparedEmail( preparedEmail );
}
Keep indentation shallow and avoid else
As developers, we edit code far more often than we write it from scratch. Because of this we usually add code inside places that already exist, and some of those places are indented due to if
statements. This indentation then tends to increase over time as more conditions are added.
Nested conditions can be difficult to read because when the reader looks at a line, they must read upward several layers in order to know when it will execute. Using else
can sometimes increase this problem because it starts a block which is already far away from its condition (although in cases with very few lines, else
is fine).
Try to keep indentation to one or two levels, and replace else
with a new if
where appropriate. There are several techniques for reducing indentation, like inverting a condition to return early.
In PHP:
Bad
function handle_click() {
if ( can_edit( $user ) ) {
$email_params = get_email_params( $user );
if ( $email_params->address ) {
$email_content = prepare_email( $user, $email_params );
send_email( $email_params->address, $email_content );
} else {
$backup_email_params = get_email_params( $backup_user );
if ( $backup_email_params->address ) {
$email_content = prepare_email( $backup_user, $backup_email_params );
send_email( $backup_email_params->address, $email_content );
}
}
}
}
Good
function handle_click() {
$can_edit = can_edit( $user );
$email_params = $can_edit ? get_email_params( $user ) : new EmailParams();
if ( $can_edit && $email_params->address ) {
$content = prepare_email( $user, $email_params );
$address = $email_params->address;
}
$backup_email_params = new EmailParams();
if ( $can_edit && ! $email_params->address ) {
$backup_email_params = get_email_params( $backup_user );
}
if ( $backup_email_params->address ) {
$content = prepare_email( $backup_user, $backup_email_params );
$address = $backup_email_params->address;
}
send_email( $address, $content );
}
By removing the nested indentation, we make the operation of the function much more clear. However, we had to resort to ternaries, multiple conditions, temporary variables, and noop objects. This might be fine in some cases, but I think we can do better.
Better
function handle_click() {
if ( ! can_edit( $user ) ) {
return;
}
$email_params = get_email_params( $user );
if ( $email_params->address ) {
prepare_and_send_email_to_user( $user, $email_params );
return;
}
$backup_email_params = get_email_params( $backup_user );
if ( $backup_email_params->address ) {
prepare_and_send_email_to_user( $backup_user, $backup_email_params );
}
}
By using early returns and extracting some of the logic into its own function, the code becomes even more readable.
Avoid “magic” numbers and boolean arguments
Reading a function call should explain what is likely to happen. Literal numbers or boolean values as arguments to a function can make a reader confused because it’s not clear what they mean without looking at the function definition. Prefer using named variables for numbers and either descriptive strings or multiple functions for different behaviors.
In JavaScript:
Bad
sendUserEmail( user, 6, true );
What does 6
mean? What does true
mean? We have no way to know. Probably someone added them later when they needed to support additional behavior in the function.
Good
sendUserEmail( { user, numberOfRetries: 6, useBlueTemplate: true } );
Creating and destructuring objects in JavaScript is so easy that a common pattern is to use a single object argument to give each real argument a name (this simulates what’s called “named arguments” in some languages).
In PHP:
Bad
send_user_email( $user, 6, true );
This code has the same problem. What do 6
or true
mean in this context?
Good
$number_of_retries = 6;
send_user_email( $user, $number_of_retries, BLUE_TEMPLATE );
We can use temporary variables or constants as a form of documentation for arguments, but it cannot be enforced by the person who wrote the function, and the names don’t have to be consistent (at least if you are not using PHP 8 which has named arguments). We can do better.
Better
$email = new UserEmail( $user );
$email->number_of_retries = 6;
send_user_email_with_blue_template( $email );
Unless there are a large number of options we need to support, creating a new function for each flag is not too difficult and using a value object to group related data allows us to turn three arguments into one. This has the added advantage that any future flags can be easily added to the object’s interface without increasing the number of arguments.
Use searchable names for functions and variables
Despite many advances in editors, IDEs, indexing databases, and Language Servers, the primary method most developers use to find code in a project is plain text search. While it can be a pain to type long variable and function names, it makes them easily searchable (and all editors have autocomplete anyway).
In PHP:
Bad
$user->send();
Just try to search several thousand PHP files for the word send
. It won’t be easy to find what you’re looking for.
Good
$user->send_welcome_email();
It’s much more likely that send_welcome_email
is a unique string.
If something can fail, however unlikely, handle the failure
Many statements in code have a failure case. Some functions can return a falsy value if they cannot find their data. Many untyped objects might be missing a property. Code lasts a long time. While a particular failure case might seem impossible, chances are good that over many years of use, it will be hit. Always handle every possible failure, even if handling it is just throwing an error with a descriptive message or silently skipping some functionality.
In TypeScript:
Bad
function getDataFromApi(): {name?: string} { /* ... */ }
function sayName(name: string): void { /* ... */ }
const dbData = getDataFromApi();
sayName(dbData.name!);
In TypeScript, typically you are already forced to handle every failure (at least ones that produce a value into running code) because otherwise the compiler will give you an error. In the above example, dbData.name
may not be set, but sayName()
requires a string. This will not compile.
However, there’s a trick called a “non-null assertion” (the exclamation point above) that tells the compiler, “I know better than you; this will always exist.” That’s great if you’re right, but what if you’re wrong? A hard-to read fatal error might occur.
Good
function getDataFromApi(): {name?: string} { /* ... */ }
function sayName(name: string): void { /* ... */ }
const dbData = getDataFromApi();
sayName(dbData.name ?? 'Unknown');
There’s different ways to handle data that we expect and which isn’t there. One way would be to say throw new Error('Tried to say a name but no name could be found')
. That error message would be easy to locate in the code for debugging. However, maybe we don’t care and just want to print something. In that case, we can provide a fallback without much effort.
In PHP:
Bad
function get_data_from_db(): array { /* ... */ }
function say_name( string $name ): void { /* ... */ }
$db_data = get_data_from_db();
say_name( $db_data['name'] );
It’s very common when working with a MySQL database in PHP to query a table and assign the results to an array, assuming that the array values are all what we expect them to be. It’s not uncommon for that assumption to be false, causing fatal errors with hard-to-read messages. Sometimes PHP doesn’t even throw an error and silently logs a warning instead, which means the developers will probably never know about the problem at all.
Good
function get_data_from_db(): array { /* ... */ }
function say_name( string $name ): void { /* ... */ }
$db_data = get_data_from_db();
say_name( $db_data['name'] ?? 'Unknown' );
As mentioned above, we have many options for dealing with missing data. We could throw new \Exception('Tried to say a name but no name could be found');
, or we could silently skip the function call, or – as in this example – we could provide a fallback.
So, how do I write good code?
- Code is read more often than it is written. Write code for humans to read.
- Code is changed more often than it is added. Always look for ways to make existing code easier to read.
- Writing code usually means focusing on the goal, not the process. Take time to do the right thing with names, error handling, comments, function sharing, and function arguments.