The Essentials of a Contributor-friendly Open-source Project
Don’t Drive Your Contributors Away
You have an amazing open-source project with a good number of users. Many of them are trying to contribute code to help you improve your project. However, they soon give up because your project’s setup is not welcoming. Don’t let this happen to your project!
As a software engineer for Keras, I have been working on improving Keras’s open-source contributing experience since Keras moved into its standalone GitHub repository in 2021. In this article, I will introduce you to the essential setups that lead to a perfect open-source contributing experience.
Why Do I Want People to Contribute?
Open-source contributions lead to product excellence. There are tons of small improvements you can make to your software, and you cannot do everything on your own. Open-source contributors may help you fix bugs and create new features. They have the incentive to do this since they will have a better experience using your software.
Moreover, a strong community builds trust in the product. Even when a contribution is small, your responsiveness gives contributors confidence in your product, because they see that it is evolving and that their issues can potentially be resolved in the future.
Based on this trust, they may build new things upon your software. An entirely new ecosystem may be built up this way, leading to your software’s real prosperity.
The Million-Dollar Metric
I estimate that the engineering resources we are willing to spend on optimizing this metric are worth one million dollars. Like Polaris, it points us in the right direction for improving the contributing experience. The metric is the average life span of pull requests, that is, the time from when a pull request is opened to when it is merged.
The metric is pretty intuitive. The faster your pull request gets merged, the better the experience is. In a good experience, the developers’ attention during the pull request stays on the intellectual parts of the change. Let’s compare a good and a bad contributing experience, and see whether the metric works.
What does a bad contributing experience look like? The author makes some changes that don’t follow the code style or that break a bunch of unit tests. The owner spends several rounds of code review just to get these basic things right.
What does a good contributing experience look like? Everything just works. It is very easy for the contributor to set up the development environment and run the unit tests and code style checks. After the pull request is created, the continuous integration (CI) does not take long to finish. Therefore, when the contributor requests a review, everyone can focus on the important things instead of the minutiae.
The proposed metric can differentiate the two scenarios. The bad experience’s pull request took much longer than the good experience’s pull request because of the extra rounds of code reviews.
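To make the metric concrete, here is a minimal sketch of how it could be computed. The function name `average_life_span` and the sample timestamps are assumptions made for illustration; in practice the opening and merging times would come from your repository's pull request history.

```python
from datetime import datetime, timedelta


def average_life_span(pull_requests):
    """Return the mean time between opening and merging, as a timedelta.

    `pull_requests` is a list of (opened_at, merged_at) datetime pairs.
    Pull requests that are not merged yet (merged_at is None) are skipped.
    Returns None when no pull request has been merged.
    """
    spans = [merged - opened for opened, merged in pull_requests if merged is not None]
    if not spans:
        return None
    return sum(spans, timedelta()) / len(spans)


# Hypothetical data: one smooth pull request and one that needed extra review rounds.
good = (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 15, 0))  # merged after 6 hours
bad = (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 8, 9, 0))    # merged after 7 days
still_open = (datetime(2022, 3, 2, 9, 0), None)                   # ignored by the metric

print(average_life_span([good]))
print(average_life_span([good, bad, still_open]))
```

A single slow pull request pulls the average up sharply, which is why the extra review rounds in the bad scenario show up clearly in this number.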
Another important observation we can draw from above is that as long as you do the setups and have clear guidance, the contributing experience should be good.
How do you do the right setups for your project? The following is a checklist of the things you need to optimize for this metric. Just by doing everything on the checklist, your project’s contributing experience should be good enough.
Development Environment Setup
Most contributors are one-time contributors. For them, it is not worth spending a lot of time setting up a development environment. To optimize the contributing experience for these users, you have to make the process of setting up your development environment as simple as possible.
Our trick is to support GitHub Codespaces, which provides a web-based Visual Studio Code IDE. The best thing is you can specify a Dockerfile with all the required dependency software installed. With one click on the repo’s webpage, your contributors are ready to code. Here is our setup for your reference.
For those who are willing to spend time customizing their development environment, we also provide very clear instructions on how to set up the environment locally.
Running the Tests
There are two ways that a contributor would like to run the tests: running a specific test or running all the tests.
When contributing code, the contributor would usually need to write some unit tests. They would want to easily run a specific test many times during development to debug their code.
Before creating the pull request, some contributors would also like to run all the tests locally to make sure everything works in their pull request. The settings for this run should be as similar to the CI run as possible, since the goal is just to pre-test the CI.
Therefore, it is really important to support these two cases of running tests well. Clear and detailed instructions should be provided for running a specific test and for running all the tests.
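As an illustration of the two modes, here is a minimal sketch using Python's standard `unittest` module. The test case `MathTest` and the helpers `run_one` and `run_all` are hypothetical stand-ins; your project's actual commands depend on the test runner it uses.

```python
import unittest


class MathTest(unittest.TestCase):
    """Hypothetical test case standing in for a project's real tests."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_multiplication(self):
        self.assertEqual(2 * 3, 6)


def run_one(test_case, method_name):
    """Run a single test method, as a contributor would while debugging."""
    suite = unittest.TestSuite([test_case(method_name)])
    return unittest.TextTestRunner(verbosity=0).run(suite)


def run_all(test_case):
    """Run every test in the test case, as a pre-check before a pull request."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    return unittest.TextTestRunner(verbosity=0).run(suite)


single = run_one(MathTest, "test_addition")
everything = run_all(MathTest)
print(single.testsRun, everything.testsRun)
```

Whatever runner you use, the contributing instructions should spell out the exact command for each of these two cases.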
Continuous Integration (CI) Time
Ideally, the CI should finish in minutes or at most a couple of hours. The sooner the contributors get the CI results, the sooner they can resume work on the pull request if the run failed. Imagine that your CI takes days to finish: the contributors may already have forgotten what the pull request was about by the time they receive the results.
The first way to shorten your CI is to make your tests run faster. For example, you can try to run the tests in parallel, or just rewrite some of your slowest test cases to run faster.
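The effect of running tests in parallel can be sketched with a toy example. The function `slow_test` and the test names are hypothetical, and the tests here only wait, which is why threads are enough; CPU-bound tests would typically need separate processes instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_test(name, seconds=0.2):
    """Hypothetical test that spends its time waiting (for example on I/O)."""
    time.sleep(seconds)
    return name, True


test_names = ["test_a", "test_b", "test_c", "test_d"]

# Sequential run: total time is roughly the sum of all test durations.
start = time.perf_counter()
sequential = [slow_test(name) for name in test_names]
sequential_time = time.perf_counter() - start

# Parallel run: independent tests overlap, so total time approaches the slowest test.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(slow_test, test_names))
parallel_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

This only works when the tests are independent of each other, so shared state between tests is worth removing before parallelizing.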
Another way to shorten your CI is to run fewer tests by splitting the repo into multiple ones, which is exactly what we are doing with TensorFlow. The CI usually takes longer for larger projects, because compiling a large codebase takes longer and there are more tests to run. Therefore, you may consider splitting your repo into multiple ones if the code can be decoupled.
This is where we spent our resources to optimize the metric we mentioned above. In 2021, we successfully finished the split of Keras and TensorFlow on GitHub. It was non-trivial work, as you can imagine, involving a lot of code decoupling and refactoring. After the split, the CI takes less than 20 minutes to run, which is much shorter than before.
Code Style Check
The code style check is important for a project to ensure its code quality. If your code follows certain style guidelines, it is important to make it clear to the contributors.
The worst case looks like this. The contributors do not know how to run the automated code style check, so the only way to check the style is through code review. The code reviewer catches the code style issues and reports them back to the contributor. It takes several rounds of code review just to get the code style right, which is extremely inefficient.
The code style checks should be automated whenever possible. In the CI, we use tools like Flake8 to catch style violations. We also use Black, a code auto-formatter, to format the code. The contributor can just use the auto-formatter instead of fixing all the style issues by themselves.
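One way to give contributors a single entry point for these checks is a small wrapper script. The sketch below is an assumption about how such a script might look, not the actual Keras setup; the helper names `build_commands` and `run_checks` are hypothetical, and it relies only on the `black --check` option and the plain `flake8` command.

```python
import subprocess


def build_commands(paths, fix=False):
    """Return the formatter and linter commands for the given paths.

    With fix=False, Black only reports files it would reformat (--check);
    with fix=True, it rewrites them in place.
    """
    black = ["black"] + ([] if fix else ["--check"]) + list(paths)
    flake8 = ["flake8"] + list(paths)
    return [black, flake8]


def run_checks(paths, fix=False, runner=subprocess.run):
    """Run each command and return the names of the tools that reported problems."""
    failed = []
    for command in build_commands(paths, fix=fix):
        result = runner(command)
        # Both tools signal problems through a non-zero exit code.
        if result.returncode != 0:
            failed.append(command[0])
    return failed


# Example usage (requires Black and Flake8 to be installed):
#     failed = run_checks(["."])
#     print("style check failed:", failed)
```

Running the same script locally and in the CI keeps the two in sync, so a contributor who passes the check locally should not be surprised by the CI result.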
For the parts of the code that are hard to auto-format, like the Python docstrings, a clear style guide should be provided. The contributors can follow the guide so that these issues are not left to the code reviews.
Besides the above, we took it one step further for Keras. We integrated the code style checks into the GitHub Codespaces, which marks all the style violations in the editor and autoformats the code on save.
The Contributing Guide
Finally, everything a contributor should know should be summarized into a contributing guide, including almost everything we mentioned above and more.
It should first be clear about what types of contributions are welcome. Refer to a list of contribution-welcome issues if you have one. You may also mention the special cases. For example, any change to the APIs would need to go through the Request for Comments (RFC) process first.
Then, it should describe the general process of contributing code, for example, creating a fork of the repo, creating a pull request, waiting for the CI to pass, and so on.
It should include clear instructions for each of the aspects we discussed above, including how to set up the environment, how to run the tests, and how to check your code style.
Here is our contributing guide as an example.
Conclusion
In this article, we first introduced the importance of having open-source contributions for your software. You need open-source contributions to reach prosperity.
Then, we introduced the million-dollar metric for improving your open-source software’s contributing experience, which is the average life span of the pull requests. The faster a pull request gets merged, the better the contributing experience is. You only need to get the project setups right to optimize for the metric.
Finally, we provided a list of actionable items to optimize the metric.
I hope this article helps you improve your open-source software and attract more contributors!