Automattic developers have always held the WordPress PHP coding guidelines as the basic advice for writing WordPress.com code. However, when I started in 2013 most of the code I actually encountered in the codebase did not follow those guidelines.
When I transitioned to my current team around 2016, the codebase had no automated linting at all. Having seen the way that standards in our JavaScript codebase had reduced bugs and improved productivity, I wanted to bring this feature to our backend. There are various linting tools available for PHP but possibly the most well-known is PHP_CodeSniffer, or phpcs. My team and I spent a while going through all of the code we owned and cleaning it up to conform to our standards, one directory at a time.
This worked pretty well, and soon other teams wanted to have enforced coding guidelines for their directories. But we had a problem because we couldn’t do a cleanup like the one done for our team’s directories to all of the WordPress.com backend. There were roughly sixty thousand files that contained many thousands of linter errors, most of which would require slow manual investigation.
So how could we move forward?
I realized that we didn’t actually need to see all the errors on a day-to-day basis in order to start enforcing a coding standard. To start with we mainly wanted to prevent new errors. What I wanted was a way to only report new linting errors and ignore any errors that already existed. There didn’t seem to be any way to do that using phpcs itself, so I thought to myself, “surely it can’t be that difficult”, and in November of 2018 I set out to make a version myself (based on an idea from one of my co-workers).
Making a comparison engine
The basic concept is pretty simple. You run the linting twice for each file: once on the file before any changes are made, and once again after the changes. Then you display only the linting errors present in the second run that were not present in the first.
For example, if the file contained this error before any local changes had been made:
-----------------------------------------------------------------------------------------------
76 | WARNING | Variable $foobar is undefined.
-----------------------------------------------------------------------------------------------
Then after the changes it contained these errors:
-----------------------------------------------------------------------------------------------
76 | WARNING | Variable $foobar is undefined.
79 | WARNING | Variable $helloworld is undefined.
-----------------------------------------------------------------------------------------------
What we wanted to see was only the error added by the changes:
-----------------------------------------------------------------------------------------------
79 | WARNING | Variable $helloworld is undefined.
-----------------------------------------------------------------------------------------------
I called the tool phpcs-changed because it was designed to have phpcs report only on the changes from one version of the file to another.
When I originally wrote the tool, we were using subversion (svn) for version control of our files, so it was pretty straightforward to ask svn to use the code as it existed before changes to get the first linter output. Then we could run the linter on the current version of the file and compare the results.
The first version of the script actually required you to manually collect that data and pass it in:
svn cat file.php | phpcs --report=json -q > file.php.orig.phpcs
cat file.php | phpcs --report=json -q > file.php.phpcs
phpcs-changed --phpcs-unmodified file.php.orig.phpcs --phpcs-modified file.php.phpcs
The main difficulty was that linting errors are specific to a line, and the lines of code may have changed between the two files.
If a file and its phpcs output previously looked like this:
<?php
function sayHello() {
return $hello; // Error: undefined variable
}
-----------------------------------------------------------------------------------------------
3 | ERROR | Variable $hello is undefined.
----------------------------------------------------------------------------------------------
And then the file changed to this:
<?php
function sayHello() {
error_log("sayHello called");
return $hello; // Error: undefined variable
}
-----------------------------------------------------------------------------------------------
4 | ERROR | Variable $hello is undefined.
----------------------------------------------------------------------------------------------
You can see that the error was previously on line 3 and now is on line 4, and therefore simply comparing the phpcs output of both files is not enough to tell if the error is new or not. The error message is the same, but is it a new error or the same one?
To tell if an error was the same between two runs required me to be clever.
I created a class called DiffLineMap which was able to parse diff hunks (unified) to map between the old line numbers and the new line numbers. Using this and a diff between the old and new version of the file (with special cases for added or removed files), it was possible to determine if an error was new or already existed.
So we needed to also provide a diff of the two files to phpcs-changed:
svn diff file.php > file.php.diff
svn cat file.php | phpcs --report=json -q > file.php.orig.phpcs
cat file.php | phpcs --report=json -q > file.php.phpcs
phpcs-changed --diff file.php.diff --phpcs-unmodified file.php.orig.phpcs --phpcs-modified file.php.phpcs
For convenience, I coded phpcs-changed to run svn commands itself to generate the diff map and compare the linting output. It used DiffLineMap to make sure to report errors only present in the new version of the file. All you needed to provide was the --svn option and the work would be done automatically.
phpcs-changed --svn file.php
Writing a library
I actually ended up writing all this logic as a standalone PHP library, usable by anything, that exports the function PhpcsChanged\getNewPhpcsMessages(). It takes three arguments:
- (string) The full unified diff of a single file.
- (
PhpcsMessages) The messages resulting from running phpcs on the file before your recent changes. - (
PhpcsMessages) The messages resulting from running phpcs on the file after your recent changes.
PhpcsMessages represents the output of running phpcs. You can create an instance of PhpcsMessages from phpcs JSON output by using PhpcsMessages::fromPhpcsJson().
The function will return an instance of PhpcsMessages which is a filtered list of the third argument where every line that was present in the second argument has been removed.
use function PhpcsChanged\getNewPhpcsMessages;
use PhpcsChanged\PhpcsMessages;
$changedMessages = getNewPhpcsMessages(
$unifiedDiff,
PhpcsMessages::fromPhpcsJson($oldFilePhpcsOutput),
PhpcsMessages::fromPhpcsJson($newFilePhpcsOutput)
);
echo $changedMessages->toPhpcsJson();
This library made the features available to any PHP code that needed them. Then I used that library in an executable called phpcs-changed. For convenience I kept many of the same options and the same output formats of phpcs itself.
Git presents a wrinkle
A few years later, we began to transition from Subversion to Git as our version control system and I had a new challenge: the concept of “the previous version of a file” is a little more nuanced in git than svn.
For uncommitted code, the previous version of a file is the same as it was for svn. When running phpcs-changed as a CI job on a branch, however, the current version of the file was actually the most recent commit and the previous version was the commit prior to that one. Finally, sometimes we actually want to compare the current branch with a different one entirely.
I had to implement three modes for git to support each of these cases:
--git-staged compares the currently staged changes (as the modified version of the files) to the current HEAD (as the unmodified version of the files).
--git-unstaged compares the current (unstaged) working copy changes (as the modified version of the files) to the either the currently staged changes, or if there are none, the current HEAD (as the unmodified version of the files).
--git-base, followed by a git object, compares the current HEAD (as the modified version of the files) to the specified git object (as the unmodified version of the file) which can be a branch name, a commit, or some other valid git object.
By abstracting the process of collecting the pieces of information we needed into an interface called ShellOperator, I was able to add Git support without too much trouble and make way for the future support of different commands or shells to collect the data.
Still going
It worked! With this tool, we had linting that would ignore the thousands of existing errors and only report when something new appeared. That allowed us to integrate the phpcs WordPress Coding Standard (WPCS) into our automated testing system for WordPress.com, giving Automattic its first globally-enforced PHP coding standard! It continues to prevent new linting errors today and gives us time to slowly fix the ones that exist.
I should note that many phpcs standards support auto-fixing (see the phpcbf tool that comes with phpcs), which makes it pretty easy to resolve a large number of linting errors without any trouble at all, but there’s always going to be errors that need a human to figure out, and those are the ones that I wrote this script to ignore. If you can auto-fix all your errors, then you probably don’t need it.
If you work in a big, messy PHP codebase and have always wanted to add new coding standards without having to immediately update every file, give phpcs-changed a try and let me know how it goes!
Photo by Free Nomad on Unsplash

Leave a comment