Categories
Uncategorized

Writing a diff/PR description to get better reviews

During a recent work meetup, we had an impromptu exercise where we went over what makes a GitHub PR description most useful to the people reviewing it, particularly when not everyone is working on the same part of the codebase. Better descriptions make for faster reviews! This is my attempt to turn that discussion into a post.

More than just improving review speed, though, these tips should also make it easier for future developers (maybe you!) trying to understand what changed and more importantly, why. The “why” of a decision is so easily lost as links die and people forget.

The title

The title of a change should quickly convey what is affected. We have tons of code at Automattic and it’s easy to end up looking at a change with your mind somewhere completely different. The title can help to ground a reviewer so they understand what they’re looking at.

  1. Prefix the title with what section of code or system is being affected. Examples are Shopping Cart:, Manual Payments:, Checkout:, etc. Sometimes this is not possible or not necessary if the rest of the title conveys the information clearly, but it’s a helpful shortcut when you want to describe something inside that system. (Some folks like to use additional markers like feat from Semantic Commit Messages and while these are nice but I personally have not found them as useful outside of commits. Convince me otherwise!)
  2. Write the title in the imperative, start with a verb, and do not end with a period. This is the same advice commonly given for commit messages. The imperative tone reads more clearly when looking at a list of changes. Examples are, Add new field to checkout form, Reduce number of re-renders in contact details, Fix race condition while saving cart. Try not to use indicative or past-tense like Adds a new file or Fixed bug. No trailing period is needed because the title is always displayed separately from other content like the title of a book chapter or newspaper article.

Title example

Manual Payments: Replace 'change ownership' dropdown with input

The body

The body of the PR will vary considerably based on the nature of the changes but generally should answer three questions:

  1. How did the system being touched work before this change? We only need to describe the necessary parts, but do give some basic context (like this) for someone who’s never seen that system before! (You and your team may know the system inside and out right now but after a few years pass, you’ll have no idea what any of this means.)
  2. What is the problem with the system just described that this change is intended to resolve or improve? This is often easy but can sometimes be complex to explain well (like this). Do not rely only on a link to an issue or a Slack thread.
  3. What does this change do and how does it resolve the problem just described? A set of “before” and “after” screenshots can often be worth a thousand words in some circumstances (like this).

These three things don’t always have to be in that order, and they don’t need to be very long, as long as the context and reason are clearly conveyed (like this). It’s far too easy to just write the third part and bypass the other two.

For additional context, provide links to related P2 posts, issues, or even Slack threads, but don’t rely on those links alone. They should be like footnotes in a book; useful if someone wants to learn more but not necessary to understand the meaning. Maybe even directly quote those sources if something on the other side of a link is particularly useful. You never know when a link will stop working or who might not have access to it.

And don’t worry about writing too much! As we say at Automattic,

It’s perfectly OK to spend more time crafting your commit message than writing the code for your commit.

It’s perfectly OK for your commit message to be longer than your commit.

Another way to explain all of this is,

In most cases, you can leave out details about how a change has been made. Code is generally self-explanatory in this regard (and if the code is so complex that it needs to be explained in prose, that’s what source comments are for). Just focus on making clear the reasons why you made the change in the first place—the way things worked before the change (and what was wrong with that), the way they work now, and why you decided to solve it the way you did.

https://cbea.ms/git-commit/#why-not-how

Body example

The "Change ownership" modal in the Manual Payments form includes a dropdown menu of all users who have created a manual payment. However, to do that, it calls `get_list_of_payment_creators()` which loops through all such users and calls `get_userdata()` on each. This process can cause PHP to run out of memory before it completes, throwing a fatal error.

In this diff, we change the modal to use an input field instead of a dropdown. This is less convenient but will keep the page from crashing while we determine a better solution. For convenience, the input field is pre-filled with the current user's ID like the dropdown menu before it.

[Fixes the issue reported here]

[Before screenshot]

[After screenshot]

The testing instructions

There are many different kinds of changes, and they require different sorts of testing and review. Sometimes a change cannot really be tested manually and we must rely on automated tests. Sometimes we cannot even do that and must rely on pure code review. However, if you want to get a useful review from people who aren’t intimately familiar with the code, it’s a good idea to give them as much help as possible.

  1. Make as few assumptions as possible about what your reader knows how to do and list what assumptions you can (eg: having a Jetpack site, having a new user account, having an email product subscription). This can be surprisingly hard! You probably know the code so well that writing step-by-step instructions seems like a waste of time (it’s not).
  2. Provide links, command-line instructions, or inputs that can be copied and pasted for the reader. If these include data that needs to be customized (eg: blog ID, user ID, etc.), make sure that this is clear and, if you can, provide functional examples. For instructions that involve UI, provide screenshots. For testing that requires modifying the code itself (eg: to force a rare condition), provide a context diff (like this).
  3. Keep each instruction as simple as possible and split complex instructions into multiple steps. Use numbered steps if there are a lot so the reviewer can more easily keep their place. Split up multiple test cases into their own sections with their own bullets or step numbers (like this).

One way to make all of this relatively easy is to write down/screenshot/copy-paste each thing you did when you were testing the change yourself, but remember to proofread it in the shoes of someone who doesn’t know what you’re talking about.

I often find that while doing this I discover test cases I hadn’t considered or ways I could make the change more clear, so in the end there’s benefits to more than just the reviewer.

Testing instructions example

1. Create a manual payment by following the steps [here].
2. Verify that the created manual payment page displays without errors (previously this would cause a fatal).
3. Click to change the ownership of the payment method. Enter the userid of a different owner and save the form. Verify that the change works correctly.

The challenge

Most developers, myself included, spend a lot of energy preparing and writing code. When it comes time to post that code for review, we often don’t have a lot of energy left. Not to mention that the act of writing prose is quite different from coding and may require a significant context switch to do well.

While not always possible (or needed), let’s try to have compassion for our readers and treat the writing of change descriptions as seriously as we do the code itself. This may mean taking a break before submitting it for review. It may help to pretend that you are explaining your change to a non-developer friend and giving them instructions for testing it. If they can do it, then your actual developer friends should do great!

Do you find something not mentioned here useful in diff or PR descriptions? Please let me know in the comments!

Categories
Uncategorized

grepdef: a quick way to search for definitions in code

I live inside my code editor all day and something that I need to do pretty often is to search for the definition of a symbol (a variable, constant, or function). For example, I might be debugging a problem in a JavaScript file when I come across this code:

//...
let newData = translateDataFromRaw(rawData);
//...

I need to see translateDataFromRaw to understand how it works. How can I do this?

If I’m using an IDE or a project covered by a tags file, I can use the magical “jump to definition” feature. This is usually the best option as it is fast and nearly always accurate. However, if I’m using a simple code editor that does not have a “jump to definition” feature, or if the feature is still indexing my project (this can take quite a while sometimes), or if I want to find that function on the command-line for some other purpose without having to open up an IDE, then this will not work. What are my other options?

I could use grep to search my files for the string translateDataFromRaw and then find the definition from among the results. This works well and is pretty simple, unless that function is used in a lot of places. If there are hundreds of results, it may take me forever to find the one I want.

To help with this situation, I wrote a little command-line tool called grepdef. With it, I could just run grepdef translateDataFromRaw and I’ll instantly get the grep result for the definition.

$ grepdef translateDataFromRaw
./src/translators.js:function translateDataFromRaw(data) {

It’s way faster than “jump to definition” on some of the projects I work with. I even have it integrated into my editors with the plugins vim-grepdef and vscode-grepdef.

How does it work?

It’s basically the same as the grep technique I described above, except that it has two advantages:

First, it uses a regular expression to find symbol definitions unique to a specific code language. This can definitely be inaccurate, but in my experience it’s close enough.

Second, it uses ripgrep which is blazingly fast.

All together, grepdef saves me hours of development time. Maybe it will help you too?

Right now it only supports two language types: PHP and JavaScript/TypeScript, but it’s built to be easily extensible to other languages. Please suggest them as issues!

(Photo by Clay Banks on Unsplash)

Categories
Uncategorized

What I learned writing a game engine

One of the things I did on my recent sabbatical was to start coding a video game in JavaScript. I learned a lot in the process, and I thought I’d write down some of my experiences. (For the record, while I wrote the game engine myself, I used the excellent Pixi library for graphics.)

Classes are useful after all

The more I’ve written code in JS the more I’ve eschewed classes. JavaScript’s primitive object type is pretty succinct and powerful already. Also, the prototype object model has an awkward syntax when using new. The class keyword makes this better, but can increase confusion due to call site binding (class fields with arrow functions fix this, but that feature is still in stage 3). It doesn’t help that the new keyword is misleading to those familiar with classical inheritance. Lastly, using class makes it easier to use inheritance instead of composition.

I don’t actually think of any of these as problems with the language, since I find it easy to write clear code with just functions that pass around data. In my experience, web applications do best when they never mention the new keyword at all. However, my first attempt to apply this methodology to game development ended in quite a mess.

It turns out that while most of the web application data I’ve dealt with is easily represented and moved around using primitive types, the entities in a video game have so much internal state and so much interactive functionality that a one-way flow approach (like the one popularized by React) is terribly inefficient. The way I did it, rendering each frame was a nightmare of indirection.

Keeping the state and functions together within each object and sharing those functions using inheritance ended up being a whole lot simpler. It also meant that I had real object types that could be validated using Typescript.

Math is useful after all

When I tell people I’m a programmer, they often think it means I’m great at math. I’m not great at math; that’s why I like computers so much! But it turns out that dealing with virtual objects in a virtual space requires quite a bit of algebra, trigonometry, and calculus.

It doesn’t help that most of the game development tutorials I came across assume that you’re using an existing game engine to do the work. The first time I tried to figure out how to calculate the angle between two sprites, all I could find was instructions for slerping quaternions.

Do you know what a quaternion is, or spherical linear interpolation? Well, they’re totally unnecessary in this case. In 2d space, all I really needed was arctan2(), which I probably learned about in high school and subsequently forgot. But it took me a while to figure that out. I think that if I had been taught math in school in the context of game development, I probably would have paid a lot more attention.

Making a scripting language is great

As I was developing events in the game, I wanted a way to easily centralize dialog such that I didn’t need to write code to create conversation trees (actually they’re graphs!). I did this by creating a JSON file that encoded dialog nodes, each with their own responses, and links between those responses and other nodes.

This worked really well, but I quickly realized that I needed to also encode functions into these nodes so that actions could be taken when certain choices were made. When a player insulted the captain of an enemy ship, for example, I wanted to decrease the happiness of that character and possibly have the enemy ship attack. Furthermore, I found that when a ship did attack, I wanted to encode the ship’s behavior itself. In short, I began to write the content of my game in a script.

At that time I was putting JavaScript functions into my JSON file to encode actions. That meant that it wasn’t really JSON anymore. If I wanted to serialize my nodes, I’d have to put those functions into strings. This immediately felt wrong because I’d have to let my game engine execute arbitrary JavaScript, which is about as big of an anti-pattern as you can get.

Professional game engines are not written in JavaScript, so they can use JS as a scripting language, but I would have to do something else if I wanted to allow executing arbitrary scripts. I began to wonder if I could write my own, very simple scripting language for the purpose. I was fortunate to discover Nearley which made that both possible and fast.

I first wrote a grammar that looked something like JavaScript, except without any variables or named function definitions (although there were anonymous functions). Statements were composed of expressions which were composed of function calls and primitives. The functions were all pre-defined; this was my game’s API. All of that worked pretty well, but I ran into a problem when I added if statements for control flow.

Each statement had to end with a semicolon. Since the language’s programs had to be encoded as a JSON string, and JSON strings cannot have newlines, it was imperative that I could chain statements together on one line. However, if statements in most languages do not end with a semicolon. I had a really hard time encoding this into the grammar, and so each if statement had to end in a semicolon as well. With that requirement I found conditions really awkward to write.

if (distanceToPlayer() < 5) { if (getNpcHappiness('enemy1') > 2) { gotoNode('enemy1Greeting'); }; if (getNpcHappiness('enemy1') <= 2 { attackPlayer(); }; };

Eventually I realized: I had multi-argument functions and I had anonymous function definitions… so I could encode control flow as a function call! That is, instead of this procedural condition statement:

if (distanceToPlayer() > 5) { accelerate(); };

I could write this functional version:

if(distanceToPlayer() > 5, { accelerate(); });

The difference was subtle but powerful. I was able to remove nearly half of the grammar I had written, since the function call syntax was already supported.

My scripting language is not extremely powerful, but even with a simple grammar in place I was able to write most of the events of the game entirely in one JSON file. I can see how it would be possible to write a dedicated dialog and level editor for that file, which would totally separate the game and its engine.

Conclusion

I’d love to show you my game, but I was pulled into other adventures on my sabbatical before I was able to finish writing it, and it has a long way to go. Maybe I’ll pick it up some time in the future, but for now I’m happy about all the things I learned getting it as far as I did. If you want to look at my code, feel free to browse the repo!

Categories
Uncategorized

Await, there’s more!

This week I gave a talk at my local JavaScript meetup on the history, use, and future of Promises and I thought that you, dear reader, might be interested as well. Here’s the blurb:

JavaScript is an asynchronous language; it is designed to react to events and to trigger jobs that take an unknown amount of time to complete. While there is a fairly standard way to call functions when something happens, until recently there had been no standard way to chain these functions together, nor an easy way to handle failures inside the chain. Promises provide the solution. This talk will guide the group through the motivation for Promises, how they work, and what comes next (async/await and beyond).

My slides for the talk are at the following link:

https://sirbrillig.github.io/js-promises-slides/

During research for the talk I discovered that Top Level Await is already available in Chrome Devtools (not in Chrome itself). So if you’re just experimenting, you can run both of the following snippets. The first one uses the native Promise methods to fetch some data (try it!).

fetch('https://jsonplaceholder.typicode.com/todos/1') 
  .then(response => response.json()) 
  .then(json => console.log(json))

The following one is the same, but uses async/await.

const response = await fetch('https://jsonplaceholder.typicode.com/todos/1');
const json = await response.json();
console.log(json);

If you’re not familiar with the syntax of await, the new thing here is that with the proposal, you can use it outside of an asynchronous function. Otherwise, you can only use the await keyword in a function declared with async.

Writing a function with the async keyword actually just guarantees that the function will return a Promise object. If it does not, its return value is wrapped in a resolved Promise automatically. So in actual code today you’d need to write something like the following in order to use await.

async function getDataFromServer() {
  const response = await fetch('https://jsonplaceholder.typicode.com/todos/1');
  const json = await response.json();
  return json;
}

async function main() {
  console.log(await getDataFromServer());
}

main();

Just for completeness, you’ll need to know that while we can handle Promise rejections (and thrown Errors) with the catch() method on a Promise, you’ll probably want to use the regular try/catch syntax to handle rejections when using await.

When using await, if the Promise is rejected, the await expression throws the rejected value.

async function main() {
    try {
        await fillKettle();
        await boilWater();
        await addLeaves('green');
        await steepTea('1 minute'));
        drinkTea();
    } catch (error) {
        console.error('There was a problem making tea.');
    }
}

main();

If none of that makes sense to you and you’d like to start from scratch learning about Promises in JavaScript, check out the whole slide deck. A big thanks to everyone who came and especially to those who asked questions!

Categories
Uncategorized

Alternatives to Else

One of the first imperative programming concepts I ever learned was if/else. With this relatively simple power tool I could make decisions in my code based on any number of factors.

Of course, my early programs were… a little hard to read. I hadn’t yet learned one of the maxims of programming that I try to live by today: write code first for humans to read, then for computers.

Using if for conditions is pretty important. There’s other ways to code, but as a concept it’s usually extremely readable. On the other hand, the seemingly harmless else can add a world of problems. I’d like to talk a bit about why I feel this way, and explore some alternatives.

(note: these examples are in PHP, but the concepts are the same for javascript and other languages.)

Cognitive Load

The biggest issue I have with else is… “what’s the condition?” An else statement prefaces an arbitrarily large block of code, but unless I happen to have just written the code, I have to read upward past another arbitrarily large block of code before I can figure out the conditions on which that block will execute.

Many if/else statements start off with each block holding just one or two lines, and surely that’s not an issue, right? Who could complain about this innocent code?

if ($wantSomeToast) {
  butterBread();
} else {
  makeASandwich();
}

The problem arises when those code blocks grow. And they will grow. Without careful pruning, code only gets bigger. Perhaps not every if block will get to be 300 lines long, but at least some of them will, and in my opinion there’s really no need to take that risk.

When you’re debugging some issue and a diff or your editor only shows something like the following, who knows what’s really going on here?

  // ...
  bakeCake();
  $frosting = new Frosting();
  engageFroster($frosting);
} else {
  drillHole();
  $jam = Jam::fetchStrawberry();
  $result injectJam(new JamInjector(), $jam);
  if ($result !== 'ok') {
    throw new \Exception('Jam is jammed');
  }
  //...

Parsing this means scrolling up until you find the if condition, and then holding that in your head while you work on the else block. And what if there’s else if statements, or nested conditions?

Alternatives

In short, I think that when else is used it needs to be as close to the if as possible. Within a few lines, really. Because, again, this is not a problem:

if ($wantSomeToast) {
  butterBread();
} else {
  makeASandwich();
}

But this is a problem:

//... many lines...
} else {
  makeASandwich();
//... many lines...

To that end, any time you can easily avoid else, I’d argue that it’s worth the effort. Here’s some ways to do that:

The first thing I see a lot is a simple boolean return.

function isTostValid($toast): bool {
  if (empty($toast->grain)) {
    return false;
  } else {
    return true;
  }
}

I know that this models the original developer’s thought process, but look how large it is, and remember that it’s far too easy for later developers to come in and add unrelated code to those blocks. What if we did this instead?

function isTostValid($toast): bool {
  return ! empty($toast->grain);
}

Much easier to read. Another alternative is an early return. This set of conditions could grow very large after a while:

function getFavoriteJam($name): string {
  $jam = null;
  if ($name === 'joe') {
    $jam = 'marmalade';
  } elseif ($name === 'jane') {
    $jam = 'raspberry';
  } else {
    $jam = 'strawberry';
  }
  return $jam;
}

But it could be easily turned into a version without else, by using early returns. The first conditions will still override later ones, but any time we follow code and see return we know that we can stop reading. It becomes a set of simple decision making tools rather than one complex multi-decision machine.

function getFavoriteJam($name): string {
  if ($name === 'joe') {
    return 'marmalade';
  }
  if ($name === 'jane') {
    return 'raspberry';
  }
  return 'strawberry';
}

Sometimes you can turn an if/else into two if statements by repeating and inverting the first if condition. This may seem to violate the DRY (Don’t Repeat Yourself) principle, but in practice it grants a lot of clarity.

if ($name === 'joe') {
  $jam = 'marmalade';
} else {
  $jam = 'raspberry';
}

That else can just have its condition repeated to become:

if ($name === 'joe') {
  $jam = 'marmalade';
} 
if ($name !== 'joe') {
  $jam = 'raspberry';
}

Another common pattern which this demonstrates is a condition being used to set a variable. In those cases, we can use null coalescing (or logical OR in Javascript), ternaries, or even function calls to make the assignment less likely to grow out of control later. Here’s a ternary version of the above.

$jam = ($name === 'joe') ? 'marmalade' : 'raspberry';

In this version it’s immediately clear that the whole line is about assigning $jam based on a condition, and the condition is the very next thing. In the first version it would take reading five lines and then drawing some conclusions to be able to be sure that’s all that’s happening, and just imagine how hard it would be when twenty other statements get added to those blocks!

If there’s a lot of computation or multiple decisions needed before the variable assignment, put all of that into its own small function and you end up with this, which is even more clear:

$jam = getJamFlavor($name);

There’s a lot of other ways we can avoid else in our code, and even if you’re skeptical, I’d like to recommend you give it a try! There’s no one best option, and it might just change how you think about coding entirely.

(Photo by Jens Lelie on Unsplash)

Categories
Uncategorized

How do we deal with dependencies in PHP

Generally what we want to do when we’re coding something is to call a function.

do_the_thing();

While some of the functions we call could stand alone and some could be methods on an object, they’re all just function calls.

So why does most PHP code have so many classes? I don’t think code organization is a good answer, since we have access to namespaces which can do that just as well. I believe the only valid reason apart from value objects is because of trying to manage dependencies.

Let’s say you are writing code which wants to refund a purchase through a billing system. You call something like refund_purchase($purchase) and you expect to get back a receipt id for the refund. From most caller’s perspectives, that’s all we should need to worry about.

However, let’s say the refund function needs to do the following things:

  • Contact the payment processor to initiate the refund.
  • Create and save a new receipt to the database.
  • Queue webhooks for the refund.

In order to do any of those things, it requires references to other systems. What if we want to switch out those other systems? Perhaps we want to test our function without a database, or we want to change the payment processor. This is where dependency injection comes into play.

Dependencies as Arguments

In order to use dependency injection in the simplest sense, we need to pass interfaces for each of the dependent systems into our function as arguments.

function refund_purchase($purchase, $processor, $database, $webhooks) {}

This is not scalable, though, because as the number of dependencies increases, so does the number of arguments, making it very confusing to call the function in the first place. In addition, when a developer wants to add a new refund call, they also have to figure out how to get each one of the dependencies, each of which may have their own dependencies!

Wrappers

The alternative to passing dependencies as arguments is to use a wrapper function to pre-inject our dependencies before they are needed. In PHP (and other Object-Oriented languages) the most accepted way to do this is using a class constructor as that wrapper. (In some other languages we might use a Higher Order Function, but this is much more challenging in PHP).

class Refunds {
  public function __construct($processor, $database, $webhooks) {
    $this->processor = $processor;
    $this->database = $database;
    $this->webhooks = $webhooks;
  }

  public function refund_purchase($purchase) {}
}

But now we are stuck with another problem: were do we call the wrapper? Since we might want to ask for a refund in any number of places, generally we must force the calling code to call the wrapper as well.

$wrapper = new Refunds($processor, $database, $webhooks);
$wrapper->refund_purchase($purchase);

That’s not a whole lot better than putting the dependencies into the function arguments, though; it’s confusing and annoying for the caller. Also, any time we want to add new dependencies we would have to change every single caller.

Factories

A solution to the issue of where to call the wrapper is to put it into a stand-alone function.

function create_refund() {
  global $wpdb;
  $processor = new DefaultProcessor();
  $database = $wpdb;
  $webhooks = new Webhooks();
  return new Refunds($processor, $database, $webhooks);
}

The calling code must still create the wrapper before it uses the function it wants, but at least that can be broken down to a one-liner.

create_refund()->refund_purchase($purchase);

In practice, these wrapper calls are often put into static functions, and can even be functions attached to the classes they are meant to wrap, which puts them close to the function call we want and so makes them easier to discover.

Forwarding Functions

It’s also possible to wrap the wrapper call into its own forwarding function to remove any need for the caller to know about our wrappers.

function refund_purchase_with_defaults($purchase) {
  return create_refund()->refund_purchase($purchase);
}

This may seem like an extreme level of redirection, but it provides the best of both worlds: as long as the forwarding function does not do anything apart from call the wrapper and its method, most callers can ignore dependency injection entirely. This only works if we make sure never to use the forwarding functions inside other wrappers, though.

Containers

Eventually we end up with tons of wrappers, as there are many functions that have dependencies and they can’t always be grouped easily. This presents two new problems: it can be hard to locate the wrapper you need, and in order to keep everything as simple as possible for the caller, we end up creating and re-creating the wrapper class many times within a single execution of the code. This can immensely bloat our memory usage since each one of those instances keeps its own state and references.

Therefore many projects end up with a Dependency Injection Container which puts all the wrapper functions into one place and also caches the instances it creates, making it very easy for callers to keep their one-liners without having to worry about memory issues.

class Container {
  public function create_refund() {
    global $wpdb;
    if (! $this->refunds) {
      $processor = new DefaultProcessor();
      $database = $wpdb;
      $webhooks = new Webhooks();
      $this->refunds = new Refunds($processor, $database, $webhooks);
    }
    return $this->refunds;
  }
}

Having these wrappers creates a new problem for our functions themselves, though, which I mentioned above: it’s now so easy to call other wrapped functions that it’s tempting to use the wrappers to make calls to other systems rather than using injected references. Doing this defeats the purpose of our wrappers entirely and we end up right back where we started. Therefore we must have some mechanism, usually just code review, which prevents anyone from using wrappers within a wrapped function.

Do we need all this?

All of this work is built on the premise that dependency injection is good and a necessary thing to have. The counter-argument is that without that premise, our code is so much simpler. So let’s examine the concept a bit more.

In any given project there will probably be functions for which the dependencies are relatively trivial. Let’s say a function needs to log events for analysis, but we don’t want to log anything while running automated tests. Rather than injecting a logger, we could just have the logger function itself detect if we’re running inside a test and if so, do nothing. If more special cases are needed, those can be added directly to the function as well.

function log_message($message) {
  if (! defined('ARE_TESTS_RUNNING')) {
    write_log_message($message);
  }
}

This technique has the advantage that it’s more foolproof than relying on callers to substitute dependencies, so we might be doing it anyway to protect our data. Even though it relies on global variables and constants, a PHP process is bounded in both time and scope so in many projects this reliance on globals creates fewer problems than it solves.

It’s also likely that in any given project there are at least a few functions which have very complex dependencies, and therefore are hard to test without running them in isolation. These functions probably benefit from using a wrapper approach as described above.

So when exactly do we need dependency injection and when is it just needless work? Even though to me the elegance of separating dependencies is a reason in itself, to be honest I can only think of two real needs. I think that we must ask these questions of each dependency we find.

Questions to ask

  1. Is a dependency large or complex enough to require mocking during testing, or do we need to test the interaction with a dependency itself?
  2. Is it likely that a dependency will need to be changed now or in the future because it represents some system that has more than one implementation?

As soon as any dependency in a project answers “yes” to one of the above questions, I think that it’s worth including a Dependency Injection system of some kind. Even if the answers are “no”, it might be a good idea if the project is a library which itself is used by other systems, because it’s often hard to predict how a library might be used. It can be a pain to be forced by your libraries to adopt specific other libraries.

And if you do have a Dependency Injection system, if any given dependency does not answer “yes” to the above questions, then perhaps you can ignore injecting it. The purist in me rebels against the idea of a single function including both injected dependencies and non-injected dependencies, but given the large amount of code needed to use a wrapper in PHP, it may just be the better choice.

(Photo by John Carlisle on Unsplash)

Categories
Uncategorized

Safely coding with constants

In PHP there is a tendency to assume that the code we are working on is the only code that is running. The global and transactional nature of the language’s past has made this easy to do.

This tendency naturally leads us to use global state, and while there does seem to be a resistance to using global variables, I’ve only seen increased usage of constants created using define(). I would like to suggest that constants created in this way are actually just global variables by another name. Worse, perhaps, they are global variables whose values cannot be changed during runtime. If that last bit seems like an advantage rather than a disadvantage, read on.

First let’s briefly cover the risk of global state. As I’ve seen in my experience over and over again, the nature of code is to proliferate over time and when code starts to have complexity, global state becomes a liability. Anything the programmer cannot see directly when coding is something they have to keep in their head. And there is only so much we can keep in our heads, especially when large periods of time pass and when many different people are involved. A local variable (local state) is much easier to reason about, because it’s “right there” in front of us and less likely to be affected by things “somewhere else”.

There’s more to it than just making code more readable, though. Another way we learn about and protect code from bugs is by testing, both automated and manual. The nature of testing is to try some code, then change the state and try it again to see what the effects are. Remember how I said that constants created by define() cannot change? Let’s look at an example,

function getGroceries(): string {
  if (defined('LIKES_PEAS')) {
    return purchasePeas();
  }
  return purchaseCarrots();
}
// ...
echo "I will buy some " . getGroceries();

How would we write a test case for this function? The risk of having a condition on a defined constant is that apart from manual testing it is impossible to test it. Constants by their very nature cannot change, and defined constants are global in scope so there is nowhere they will not be. Once we define it, we cannot change it during the same runtime and most PHP test runners execute all their test cases in the same runtime.

That’s not to say that all constants are problematic. It’s more an issue of how we use them. Essentially, if we just stop writing if (defined())… in our code, the problem goes away.

But it’s not that easy, is it? The argument that I’ve heard most often to keep checking for defined constants is that they already exist in most large projects and so we have no choice. But this is not quite true.

The values stored in these constants are not themselves constant, and can be injected into the code that uses them. Here’s that previous example, written a different way,

function getGroceries(bool $likesPeas): string {
  if ($likesPeas) {
    return purchasePeas();
  }
  return purchaseCarrots();
}
// ...
echo "I will buy some " . getGroceries(defined('LIKES_PEAS'));

Suddenly it’s very easy to test the function! We just change its input. This is called dependency injection.

This technique is a time honored way to isolate dependence from logic. It may be less convenient because we must admit that when we need to check a constant we are actually adding a new dependency. However, noticing this gives us information, not penance. It may force us to find other ways of structuring our logic to avoid those dependencies altogether.

There is no doubt in my mind that the isolation of dependencies is critical to write code that is easy to reason about, and that doing so will reduce bugs, stress, and confusion as the code grows. When we recognize that constants created using define() are just another dependency, we can use them sparingly and isolate their usage in the same way we would for any other global or remote state which we cannot control.

(Photo by Simon Matzinger on Unsplash)

Categories
Uncategorized

Creating Sniffs for a PHPCS Standard

As a follow-up to my summary of using phpcs to lint your PHP code, this post will explain how to create, modify, and test a phpcs standard.

As a quick refresher, phpcs organizes its linting messages (warnings and errors) into “sniffs”, then into “categories”, then into a “standard”.

The official tutorial for creating a standard is here, but it doesn’t go into much detail and some parts of it are out-of-date. There’s additional useful information in the 3.0 upgrade guide but I’m going to try to do a better job here of explaining how it all works.

Categories
Uncategorized

Linting PHP with phpcs sniffs

Linting is the process of using an automated tool to scan your code for problems before you commit and deploy. It is a practice widely used in the development workflow of many languages, but hasn’t much been used in the PHP of WordPress developers.

The most commonly used linter in PHP right now is called “PHP Code Sniffer”, or phpcs for short. In this post I’ll show you how to install it in a project folder, configure it, and set up your editor to display its messages.

Sniff Codes

To phpcs, an “error” or a “warning” is generated by a function that checks PHP code for certain conditions.

phpcs calls a set of related errors or warnings a “sniff”.

Sniffs can be grouped together into a “category”.

phpcs calls a collection of categories a “standard”.

To make multiple sniff standards available in a single project, install the composer package dealerdirect/phpcodesniffer-composer-installer. That will automatically use all installed standards in the current project with the composer type phpcodesniffer-standard when you run phpcs.

To install that package in a PHP project, run the following (assuming you already have composer installed).

composer require --dev dealerdirect/phpcodesniffer-composer-installer

Each error or warning in a sniff has a “sniff code” which identifies it explicitly. A full sniff code includes the standard, the category, the sniff, and the error or warning, all separated by periods.

Standard.Category.Sniff.Error

For example:

MyStandard.MagicMethods.DisallowMagicGet.MagicGet

It’s possible to refer to all the errors in a sniff by omitting the error name. It’s further possible to refer to all the sniffs in a category, or the categories in a standard, but omitting those fields as well.

Therefore, these are all valid codes:

MyStandard.MagicMethods.DisallowMagicGet.MagicGet
MyStandard.MagicMethods.DisallowMagicGet
MyStandard.MagicMethods
MyStandard

Built-in Standards

phpcs comes with several standards built-in. You can see them on the command-line by running phpcs -i. These include PEAR, PSR1, PSR2, Squiz and Zend.

WordPress has its own standard which you can install with composer as well.

composer require --dev wp-coding-standards/wpcs

I highly recommend the VariableAnalysis standard which looks for undefined and unused variables.

composer require --dev sirbrillig/phpcs-variable-analysis

Configuring Standards

When installing sniff standards in a project, you edit a phpcs.xml file with the rule tag inside the ruleset tag. The ref attribute of that tag should specify a standard, category, sniff, or error code to enable. It’s also possible to use these tags to disable or modify certain rules. The official annotated file explains how to do this.

For example, this would enable all the sniffs in a standard called MyStandard:



  My PHP project.
  

The following configuration will enable all the sniffs in MyStandard, VariableAnalysis, and WordPress.



 My PHP project.
 
 
 

Configuring An Editor

To run phpcs on a file in your project, just use the command-line as follows (the -s causes the sniff code to be shown, which is very important for learning about an error).

vendor/bin/phpcs -s src/MyProject/MyClass.php

When it runs, phpcs will report any errors or warnings it finds. That’s pretty handy, but I find that it’s even more valuable to have linters run directly in your editor so that you can see problems as you code. Pretty much all editors support this, but the method of installation depends on your editor.

For vim, I recommend you install the ALE plugin. This has phpcs support already built-in and will use the standards you have configured in your project.

For phpstorm, the linting support is built-in, but you have to configure the app to know where to look. This guide shows how to set the path and enable the inspections. The guide neglects to mention that the path setting is under Preferences > Languages & Frameworks > PHP > Code Sniffer, and that the path should be set to your local installation of vendor/bin/phpcs. It also neglects to say that the inspections are under Preferences > Editor > Inspections and that you have to set the “Coding standard” to your phpcs.xml file manually (you may also want to disable some of the existing Inspections).

For vscode, I suggest installing the vscode-phpcs extension. It will automatically detect and use your local phpcs and configuration for your project. However, note that before version 1.0, the sniff code could not be shown.

Ignoring Rules

Sometimes you’ll need to ignore or disable one of the sniffs in a standard. The official documentation explains how to do this. You can surround or prefix your code with special comments to instruct phpcs to ignore it. Note that prior to version 3.2, you could not disable specific sniffs in this way; phpcs would ignore all of them for the marked lines.

$xmlPackage = new XMLPackage;
// @codingStandardsIgnoreStart
$xmlPackage['error_code'] = get_default_error_code_value();
$xmlPackage->send();
// @codingStandardsIgnoreEnd
$xmlPackage = new XMLPackage;
// @codingStandardsIgnoreLine
$xmlPackage['error_code'] = get_default_error_code_value();
$xmlPackage->send();

In the recent version 3.2 of phpcs, this feature was greatly improved. The above special comments were replaced with // phpcs:disable/// phpcs:enable and // phpcs:ignore respectively. You can also use // phpcs:ignore on the same line as a statement.

Even better, you are able to disable specific rules with either form by listing the sniff codes (comma-delimited) at the end of the special comment. For example, this following is possible.

doSomethingBad(); // phpcs:ignore MyStandard.Functions.PreventBadThings

You can even include comments in the disabling line!

doSomethingBad(); // phpcs:ignore MyStandard.Functions.PreventBadThings -- We are ok with bad things here

If there are specific sniffs you wish to ignore for your entire project, you can configure them in the phpcs.xml file and set their severity to 0. Here’s an example of disabling the WordPress function comment rule (standards can contain other standards, so some WordPress rules are actually part of the Squiz standard).



 My PHP project.
 
 
 
     0
 

In a later post, I explain how a phpcs standard is built and how to create and modify your own. Until then, see if you can find some standards that you like and use them! Here is the phpcs.xml file I typically use right now for non-WordPress projects.


<?xml version="1.0"?>
<ruleset name="PaytonsStandard">
<description>
Originally from https://gist.github.com/Ovsyanka/e2ab2ff76e7c0d7e75a1e4213a03ff95
PSR2 with changes:
– tabs instead of spaces (https://gist.github.com/gsherwood/9d22f634c57f990a7c64)
– bracers on end of line instead new line
– unused/undefined variable detection (https://github.com/sirbrillig/phpcs-variable-analysis)
</description>
<arg name="tab-width" value="4"/>
<rule ref="PSR2">
<exclude name="Squiz.Functions.MultiLineFunctionDeclaration.BraceOnSameLine" />
<exclude name="PSR2.Classes.ClassDeclaration.OpenBraceNewLine" />
<exclude name="Generic.WhiteSpace.DisallowTabIndent"/>
</rule>
<rule ref="Generic.WhiteSpace.DisallowSpaceIndent"/>
<rule ref="Generic.WhiteSpace.ScopeIndent">
<properties>
<property name="indent" value="4"/>
<property name="tabIndent" value="true"/>
</properties>
</rule>
<rule ref="Generic.Functions.OpeningFunctionBraceKernighanRitchie" />
<rule ref="Generic.Classes.OpeningBraceSameLine"/>
<rule ref="VariableAnalysis.CodeAnalysis.VariableAnalysis"/>
</ruleset>

view raw

phpcs.xml

hosted with ❤ by GitHub

(Photo by Noel Lopez on Unsplash)

Categories
Uncategorized

Maybe returning errors in PHP

A common pattern I see in WordPress PHP code (and probably other PHP code) is a function which does some operation on data and returns a value. For example, a function which makes a database query and then returns the resulting row.

In order to make more robust code and prevent bugs, I’ve been systematically trying to use more return type declarations in my functions. This is a problem for things like the database function described above because a database query (and therefore the function itself) can sometimes fail. In those cases, the function might return the intended data, or it might return an instance of WP_Error instead. PHP does not currently (as of 7.1) support the idea of declaring return types when multiple types are possible.

Of course, such a function could simply throw an Exception if it fails, but this might be not as safe for critical code. An unhandled Exception, even a trivial one, could kill the whole program. Returning an error object instead could keep the program running (depending on how it is used). Even if we do throw an Exception, it’s not always clear when reading the function that it can throw.

So there are two problems now. First, since the function can return multiple things, we cannot specify any return type at all.

What I would need to solve that is the ability to specify multiple return types like phpdoc, but even if I could, it doesn’t guarantee that the user of this function will notice and handle potential errors.

This leaves us with the second problem, which is the same as if the function could throw an Exception; it can be easy for someone using it to overlook that the function can fail.

Is there a solution to both these problems? Maybe. (That’s a poor joke, as you’ll soon see.)

Let’s take the following function as an example. This function accepts an array of user data and returns the color property.

function getColorFromUser(array $user): string {
  return $user['color'];
}

It’s contrived, but it’s a reasonable simplification of the database query problem, since the array property might not exist, and therefore it can fail.

So now let’s modify the function to return an error if it fails.

function getColorFromUser(array $user) {
  if (! isset($user['color']) {
    return new WP_Error('NO_COLOR', 'Color does not exist');
  }
  return $user['color'];
}

As you can see, we’ve lost both the return type and also made it possible to introduce bugs in code that calls this function if they expect a string and instead get an object.

$myColor = getColorFromUser($user);
echo 'My favorite color is ' . $myColor; // This could cause a problem if $myColor is not a string

The standard way to handle this situation is similar to using a try/catch for an Exception; if we are aware that the function can return an error, we can check for one before continuing.

$myColor = getColorFromUser($user);
if (! is_wp_error($myColor)) {
  echo 'My favorite color is ' . $myColor;
}

It’s just that in my experience it’s common to miss the fact that a function can return an error. It’s even worse if a function once returned only a value but is subsequently changed to be able to return errors. Since the normal (successful) flow of the code will not change, no change is actually required anywhere the function is used. This makes type errors not only possible, but likely.

But what if we instead returned a wrapper object around the data? Let’s call it a Maybe. (See? There’s the joke.)

function getColorFromUser(array $user): Maybe {
  if (! isset($user['color']) {
    return Maybe::fromError(new WP_Error('NO_COLOR', 'Color does not exist'));
  }
  return Maybe::fromValue($user['color']);
}

Now we can have a return type, which can help prevent bugs inside the function. It’s not perfect as it doesn’t explain what types of values the object can contain, but it’s better than nothing. (To be fair, if we wished, we could create specific versions of Maybe for different values, like MaybeString, but I’m just going to keep it simple for now.)

Your first reaction might be that it also creates an inconvenience for the user of this function, because in order to extract the value of this data, someone has to do it explicitly.

But that small inconvenience means the developer must know about the possible error.

The example might then look something like the following. The condition required is similar to using is_wp_error(), but it’s more probable that the developer will remember to add the condition because they are forced to use getValue() on success.

$myColor = getColorFromUser($user);
if (! $myColor->isError()) {
  echo 'My favorite color is ' . $myColor->getValue();
}

This is not a perfect solution, and I’m certain that many developers will balk at having to add a pattern like this to their workflows when it’s not a standard part of the language. Personally I think it would be worth the discomfort to adopt this sort of construct when the benefit would be code that’s significantly less likely to have type errors. It probably depends on how careful you need your code to be. What do you think?

PS: It’s perhaps relevant to mention that there are functional programming concepts that are similar to this implementation, but which take it much further. My suggestion here is a very minimal interpretation that might be easier for some developers to grok. A more accurate name for this concept, matching existing patterns, is a Result.

Here’s how the Maybe/Result could be written:


<?php
class Maybe {
private $value;
private $error;
private function __construct($value, $error) {
$this->value = $value;
$this->error = $error;
}
public static function fromError($error): Maybe {
return new Maybe(null, $error);
}
public static function fromValue($value): Maybe {
return new Maybe($value, null);
}
public function isError(): bool {
return $this->error === null;
}
public function getError() {
return $this->error;
}
public function getValue() {
return $this->value;
}
}

view raw

Maybe.php

hosted with ❤ by GitHub

(Photo by Caleb Jones on Unsplash)