Finding & patching a bug in php-src

How to find and patch a bug in PHP source :: PHP internals

Oct 10, 2017

Fair warning: take all this with a grain of salt; I'm still a PHP-internals newbie, but I wanted to show you a behind-the-scenes look behind bug #75237.

Find the bug

It all started whilst looking for lines of code to cover for a Chicago PHP UG meetup which was participating in PHP TestFest 2017.

I found an uncovered line using the gcov website for ext/json/json_encoder.c. I was trying to write some PHP that would hit this line.

json_encoder.c

I was trying various things to try and hit that line and nothing was working and then suddenly I inadvertently triggered a segfault with the following code.

<?php
class Foo implements JsonSerializable {
  public function jsonSerialize() {
    return new self;
  }
}

# Segfault!
var_dump(json_encode(new Foo));

Segfaults should never happen in PHP; the PHP core code should throw an exception and deliver a nice error message to userland before a segfault is able to occur.

Just to see which versions of PHP were affected by this bug, I created an 3v4l snippet and saw that all actively supported versions of PHP were segfaulting.

This segfault only happens when jsonSerialize() returns a new instance of itself. If I return the same instance with $this, the code works as expected.

<?php
class Foo implements JsonSerializable {
  public function jsonSerialize() {
    return $this;
  }
}

# Works fine
var_dump(json_encode(new Foo));

Submit a bug report

All bugs in PHP core should be reported via bugs.php.net. Even though I was going to attempt to patch the bug myself, I still needed to create a bug report to reference in my patch later on.

So I filled out the report a bug form and bug #75237 was born.

Make a patch

Now that hard part. How do we begin even patching this thing?

Update php-src and recompile

First, I pulled in any upstream changes from the main git repo to make sure I was working with the most up-to-date version of php-src. I have a remote called upstream that points to the main php-src repo on GitHub.

$ git fetch upstream
$ git checkout master
$ git rebase upstream/master master

Then I created a new branch off of master for the patch and named it after the bug ID.

$ git checkout -b bug75237

Then I needed to recompile from source. I won't go into too much detail on this, but I do have another post that describes in more detail how to compile PHP from source.

Whenever I recompile PHP from source, I run a bash script called config that I keep outside of the main php-src folder. It deletes all the existing compiled binaries and other untracked files, builds the configure script, and runs configure with all the default flags I like.

Here's my config script if you want to make something similar for yourself.

#!/bin/sh

make distclean
./vcsclean
./buildconf
./configure --enable-maintainer-zts \
	--enable-debug \
	--enable-cli \
	--with-zlib \
	--enable-mbstring \
	--with-openssl=/usr/local/opt/openssl \
	--with-libxml-dir=/usr/local/opt/libxml2 \
	--with-sodium \
	"$@"

Why the weird paths? I'm on a Mac with some dependencies installed with Homebrew so that's why some of the paths are pointed to /usr/local/opt for some of the extensions.

So I just ran my custom config script & then compiled everything with make.

$ ../config && make -j9

This should create a compiled binary at sapi/cli/php so I checked the version of PHP and ran the code we created above that causes a segfault to make sure the latest code from the master branch of php-src still had the bug.

$ sapi/cli/php --version
$ sapi/cli/php json_encode.php

Yep! Still segfaults. Now we need to use a debugger to step through the C code that's running our PHP script line by line.

Run GDB

I used GDB to step debug through the code. GDB is designed for compiled languages like C and does not work on plain-old PHP files. But PHP is written in C, so you can make GDB run the compiled binary with your PHP script using gdb --args.

$ gdb --args sapi/cli/php json_encode.php

Once in GDB, I set up a breakpoint to pause the execution of the program and then step through each line of the C code. I go into more detail in the video above, but TL;DR: I found that the main function of interest was php_json_encode_serializable_object() so I set a breakpoint there and then ran the program with run.

> break php_json_encode_serializable_object
> run

Once GDB pauses the code execution at a breakpoint, we can type n to jump over the next line and s to step into the scope/frame of the next line. We can also run c to run until we hit the next breakpoit.

After typing a whole lot of n's and s's, I realized that the code was going into infinite recursion between php_json_encode_zval() and php_json_encode_serializable_object(). This lead to a stack overflow and caused a segfault because the program ran out of memory.

The main code of interest was this if statement inside of php_json_encode_serializable_object().

if ((Z_TYPE(retval) == IS_OBJECT) &&
	(Z_OBJ(retval) == Z_OBJ_P(val))) {
	//
} else {
	//
}

There was a case to handle the return value of jsonSerialize() if it was the same instance of itself, but nothing was catching if it returned a new instance of itself.

Again, I go into more detail on this in the video above, but here's the final bit of code I added to throw an exception when jsonSerialize() returns a new instance of itself.

if ((Z_TYPE(retval) == IS_OBJECT) &&
	// Make sure the objects are not the same instance
	(Z_OBJ(retval) != Z_OBJ_P(val)) &&
	// Check that the class names of the objects are the same
	zend_string_equals(ce->name, Z_OBJCE(retval)->name)) {
	// Throw an exception
	zend_throw_exception_ex(NULL, 0, "%s::jsonSerialize() cannot return a new instance of itself", ZSTR_VAL(ce->name));

	// Garbage collection
	zval_ptr_dtor(&retval);
	zval_ptr_dtor(&fname);
	PHP_JSON_HASH_APPLY_PROTECTION_DEC(myht);

	// Return fail
	return FAILURE;
} else if ((Z_TYPE(retval) == IS_OBJECT) &&
	(Z_OBJ(retval) == Z_OBJ_P(val))) {
	//
} else {
	//
}

I did a lot of trial and error to get to that point, but phpinternalsbook.com was a really great reference, especially for dealing with zvals.

Update: Long-time PHP core developer & one of the release managers for PHP 7.2 Sara Golemon gave some excellent feedback on the patch: "If you're just trying to match the class, you can compare the ce only. This will be more correct than comparing names and (nominally) more performant."

So with Sara's suggestion, we can change the zend_string_equals() line in our if statement:

if ((Z_TYPE(retval) == IS_OBJECT) &&
	(Z_OBJ(retval) != Z_OBJ_P(val)) &&
	// This line changes
	ce == Z_OBJCE(retval)) {
	// Same as above
} else if ((Z_TYPE(retval) == IS_OBJECT) &&
	(Z_OBJ(retval) == Z_OBJ_P(val))) {
	//
} else {
	//
}

Make a test

Before we can submit the patch back to internals, we need to write a test to confirm that the bug is indeed fixed.

I won't get into too much detail on this, but I created a 6-part screencast on writing tests for PHP source if you would like more information.

$ vi ext/json/tests/bug75237.phpt
--TEST--
Bug #75237 (jsonSerialize() - Returning new instance of self causes segfault)
--SKIPIF--
<?php if (!extension_loaded("json")) die("skip ext/json required"); ?>
--FILE--
<?php
class Foo implements JsonSerializable {
  public function jsonSerialize() {
    return new self;
  }
}

try {
  var_dump(json_encode(new Foo));
} catch (Exception $e) {
  echo $e->getMessage();
}
?>
--EXPECT--
Foo::jsonSerialize() cannot return a new instance of itself

I ran the test to confirm it passed, then I ran all the tests in ext/json to make sure I didn't break anything else.

$ make test TESTS=ext/json/tests/bug75237.phpt
$ make test TESTS=ext/json/tests/

All the tests passed, so I was ready to submit the patch back to PHP internals.

Submit a PR and update the bug

From there, I just pushed my patch up to my fork on GitHub...

$ git add ext/json/json_encoder.c ext/json/tests/bug75237.phpt
$ git commit -m "Fix bug 75237 - (jsonSerialize() - Returning new instance of self causes segfault)"
$ git push origin 

...and then created a PR in the main php-src repo.

Finally, I went back to the bug report I created and updated it with a reference to the PR I just submitted.

Await feedback and make any changes

After sending the PR, it didn't take long to get feedback from several internals folks. And one helped me realize I was comparing the class names incorrectly. (I fixed that in the video series and in the snippets in this post, but left the original PR unchanged so you can still see my original attempt.)

After some back & forth, Niki P finally closed the PR because we uncovered a deeper, more global problem with PHP. PHP basically has no global stack overflow protection and therefore segfaults can be created in lots and lots of other contexts.

So basically, we just had one of these moments.

Digging Deeper

If you want to learn more about why this specific bug exists in PHP & why the fix is non-trivial, I picked Sara Golemon's mind about it in episode 068 of the PHP Roundtable: PHP's Dirty Little Segfault Secret.

Good luck!

Although my patch got closed without merging, I still consider it a success because not only did I learn a lot in the process, but the main stack overflow issue was brought to the attention of the internals devs once again adding some "squeak to the wheel," which might help the issue get addressed sooner than later.

I hope you've enjoyed this bug-squashing journey with me. I wish you the best of luck when hunting and patching your own bugs in PHP source. Until next time!

If you found this guide helpful, say, "Hi" on twitter! I'd love to hear from you. :)