A network filesystem is a vital component of almost any infrastructure. As always, there are many options to choose from. Each network filesystem comes with its own pros and cons, and choosing the most appropriate one can make the difference between smooth operations and endless nightmares.
Recently, we had an opportunity to take a fresh look at modern network filesystems and evaluate them for some new scenarios. And we gladly did.
Without diving into too much detail on the requirements, we had to choose an option that would be acceptable for hosting PHP projects.
Modern PHP is a great language for web development, but it brings its own fair share of challenges. Since the introduction of namespaces, autoloaders, and Composer, many PHP projects have exploded into thousands and thousands of tiny files. To serve a single web request, PHP often needs to read and process hundreds of these files before it can generate a response. Storing PHP files on a network filesystem is not the greatest idea, but sometimes it is still the most viable option due to other constraints.
Hosting PHP code on a network filesystem means that we care more about the performance of read operations than write operations, and more about small files (a few bytes to a few kilobytes) than larger files like images and videos.
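To see why hundreds of small files per request hurt so much on a network filesystem, a back-of-envelope model helps: per-request I/O overhead is roughly files × filesystem calls per file × latency per call. The sketch below is only an illustration; the file count, calls per file, and both latencies are hypothetical placeholders, not numbers from our benchmarks, and it ignores any caching in front of the filesystem.

```python
"""Back-of-envelope model of filesystem overhead for one PHP request.

All numbers are hypothetical placeholders, not measurements.
"""


def io_overhead_ms(files_per_request: int, ops_per_file: int, latency_ms: float) -> float:
    """Time spent waiting on filesystem calls for one request, in milliseconds."""
    return files_per_request * ops_per_file * latency_ms


# Hypothetical request: 300 files, ~3 filesystem calls each (e.g. stat, open, read).
FILES = 300
OPS = 3
LOCAL_MS = 0.005  # assumed: local disk, data already in the page cache
NFS_MS = 0.5      # assumed: one LAN round trip to the NFS server per call

local = io_overhead_ms(FILES, OPS, LOCAL_MS)
remote = io_overhead_ms(FILES, OPS, NFS_MS)
print(f"local: {local:.1f} ms, network: {remote:.1f} ms per request")
# prints: local: 4.5 ms, network: 450.0 ms per request
```

Under these assumptions the latency per call, not the size of the files, dominates, which is why caches that avoid round trips matter more here than raw throughput.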
First, we looked at some of the distributed network filesystems. While there was no strict requirement for our choice to be a distributed filesystem (DFS), we thought it would be nice to have a fault-tolerant, decentralized solution. There are many options to choose from, and we considered a few:
Some of these were quite complex to set up. Some weren’t mature enough. Some didn’t offer the level of performance that we required. And some displayed spectacular stability issues, with freezes, split-brains, and occasional data loss.
Of all of them, Lustre was the closest to what we needed. It also had one additional benefit compared to the others: Amazon AWS offers Lustre as a managed service.
We tried both the Amazon-managed Lustre and our own setup. It was easy, fast enough, and somewhat stable. Yet we did experience occasional freezes, and we had better options, performance-wise.
Once we gave up on distributed network filesystems for this particular experiment, we looked at a few other options that involved synchronizing files and directories between several servers. Not a very elegant approach, but performance-wise, a local filesystem beats a network filesystem any day of the week. An example of such a setup would be rsync + inotify. We found about a million other options, but eventually it was a dead end. With more and more servers, multiple sources of truth, and variations between projects, this approach is just too error-prone and maintenance-heavy.
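For illustration, here is roughly what the rsync + inotify approach looks like: watch a source tree, wait until a burst of changes settles, then push the tree to each peer. This is a minimal sketch, not the setup we tested. It assumes `inotifywait` (from inotify-tools) and `rsync` are installed and that SSH access to the peers is already configured; `SRC_DIR`, `PEERS`, and `QUIET_SECONDS` are hypothetical values.

```python
"""Sketch of rsync + inotify mirroring (hypothetical paths and peers)."""
import queue
import subprocess
import threading

SRC_DIR = "/srv/www/project"                                # hypothetical source tree
PEERS = ["web2:/srv/www/project", "web3:/srv/www/project"]  # hypothetical targets
QUIET_SECONDS = 2.0  # hypothetical: wait this long without events before syncing


def rsync_command(src: str, dest: str) -> list[str]:
    # A trailing slash on src copies the directory's contents, not the directory;
    # --delete removes files on the peer that no longer exist locally.
    return ["rsync", "-a", "--delete", src.rstrip("/") + "/", dest]


def watch_command(src: str) -> list[str]:
    # -m: keep monitoring, -r: recursive, -q: quiet; prints one path per event.
    return ["inotifywait", "-m", "-r", "-q",
            "-e", "close_write,create,delete,move",
            "--format", "%w%f", src]


def run(src: str = SRC_DIR, peers: list[str] = PEERS) -> None:
    """Block forever, re-syncing every peer after each burst of changes."""
    events = queue.Queue()
    watcher = subprocess.Popen(watch_command(src), stdout=subprocess.PIPE, text=True)

    def pump() -> None:
        # Forward each changed path reported by inotifywait into the queue.
        for line in watcher.stdout:
            events.put(line.rstrip("\n"))

    threading.Thread(target=pump, daemon=True).start()
    while True:
        events.get()  # wait for the first change
        while True:   # drain events until the tree has been quiet for a while
            try:
                events.get(timeout=QUIET_SECONDS)
            except queue.Empty:
                break
        for dest in peers:
            subprocess.run(rsync_command(src, dest), check=False)
```

Even this small sketch hints at the maintenance burden: it does no initial sync, assumes a single source of truth, and silently ignores failed pushes, and each of those gaps grows with every server and project added.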
After a few weeks of trying, testing, and benchmarking, we arrived at good old NFS. There were quite a few things to experiment with here as well. We looked at NFS + DRBD, pNFS, Amazon EFS, and a few others.
In the end, our own NFS setup did just fine. It was simple, familiar, straightforward, and offered acceptable performance. It does seem strange that we spent so much time and effort looking at all the different options, only to arrive at what is probably the most common choice. But we are glad we did it anyway. We’ve learned a lot about modern-day challenges and solutions in network filesystems, performance optimization, networking, and more. The trip was well worth it.
Once we had settled on NFS, a few additional tweaks helped us improve performance:
- The cachefilesd daemon, which provides persistent local caching for NFS. Read this article for more details.
- Tuning PHP’s realpath cache configuration. Read this blog post for more.
- Tuning the PHP OpCache configuration. Read this blog post for more details.
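After applying these tweaks, a quick sanity check can confirm they are actually in effect. The sketch below does two things. First, it scans `/proc/mounts` content for NFS mounts that lack the `fsc` option, since the Linux NFS client only uses FS-Cache (and therefore cachefilesd) for mounts with `fsc`. Second, it flags PHP settings that make requests hit the filesystem more often. It works on a plain dict of directive values (for example, collected with `ini_get()`), and the `MIN_REVALIDATE_FREQ` and `MIN_REALPATH_TTL` thresholds are hypothetical values, not recommendations from our tests.

```python
"""Sanity checks for an NFS-hosted PHP setup (thresholds are hypothetical)."""

MIN_REVALIDATE_FREQ = 60  # hypothetical: seconds between OpCache timestamp checks
MIN_REALPATH_TTL = 600    # hypothetical: seconds to keep resolved paths cached


def nfs_mounts_without_fsc(mounts_text: str) -> list[str]:
    """Return mount points of NFS mounts (from /proc/mounts text) lacking `fsc`."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type in ("nfs", "nfs4") and "fsc" not in options.split(","):
            missing.append(mount_point)
    return missing


def _is_on(value: str) -> bool:
    # ini_get() returns "1" or "" for booleans; php.ini files may say On/Off.
    return value.strip().lower() in ("1", "on", "true", "yes")


def php_ini_warnings(ini: dict[str, str]) -> list[str]:
    """Flag settings that make PHP touch the filesystem more often than needed.

    Missing keys fall back to PHP's defaults, except opcache.enable, which is
    treated as off because a missing key suggests OpCache is not loaded.
    """
    warnings = []
    if not _is_on(ini.get("opcache.enable", "0")):
        warnings.append("OpCache is disabled")
    if (_is_on(ini.get("opcache.validate_timestamps", "1"))
            and int(ini.get("opcache.revalidate_freq", "2")) < MIN_REVALIDATE_FREQ):
        warnings.append("OpCache re-checks file timestamps often; raise opcache.revalidate_freq")
    if int(ini.get("realpath_cache_ttl", "120")) < MIN_REALPATH_TTL:
        warnings.append("realpath cache entries expire quickly; raise realpath_cache_ttl")
    return warnings
```

On a host, `nfs_mounts_without_fsc(open("/proc/mounts").read())` lists the mounts that cachefilesd is not helping. Keep in mind the trade-off on the PHP side: with `opcache.validate_timestamps=0`, OpCache does not notice changed files until it is reset (for example, by reloading PHP-FPM), so deployments need that extra step.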
Also, if you are setting up NFS as a network filesystem for an autoscaling group, make sure that each host in the group is provisioned with a unique hostname. This will help a lot with NFS caching and avoid polluting the logs with unnecessary messages.
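One simple way to guarantee unique hostnames across an autoscaling group is to derive them from something that is already unique per host. The sketch below assumes the hosts run on EC2 and uses the instance ID; the `web-<id>` naming scheme and the `prefix` parameter are hypothetical choices, and the function also checks that the result is a valid single DNS label.

```python
"""Derive a unique hostname per host from its EC2 instance ID (a sketch)."""
import re

# EC2 instance IDs: "i-" followed by 8 (older format) or 17 hex characters.
_INSTANCE_ID = re.compile(r"i-[0-9a-f]{8}(?:[0-9a-f]{9})?")
# One DNS label: 1-63 chars of [a-z0-9-], not starting or ending with "-".
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def hostname_for_instance(instance_id: str, prefix: str = "web") -> str:
    """Build a hostname like "web-0abc..." (hypothetical naming scheme)."""
    if not _INSTANCE_ID.fullmatch(instance_id):
        raise ValueError(f"unexpected instance ID: {instance_id!r}")
    name = f"{prefix}-{instance_id[2:]}"
    if not _LABEL.fullmatch(name):
        raise ValueError(f"not a valid hostname label: {name!r}")
    return name
```

At boot, the instance ID can be read from the EC2 instance metadata service and the result applied with `hostnamectl set-hostname`, before the NFS share is mounted, so the NFS client never runs under a shared, cloned hostname.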
While we’ve made our choice for this particular experiment, your mileage may vary. Here are a few resources that we found helpful along the way:
- Jeff Geerling: Getting the best performance out of Amazon EFS
- Stack Overflow: Distributed File Systems: GridFS vs. GlusterFS vs Ceph vs HekaFS Benchmarks
- Server Fault: NAS Performance: NFS vs Samba vs GlusterFS
- Gentoo Forums: Cluster filesystem for HPC
- Modern PHP applications up to 110x slower on network storage
- Gluster-users mailing list: GlusterFS tweaks and PHP OpCache
- High Performance Drupal: Chapter 10. File Storage for Multiple Web Servers
- Experience with GlusterFS
We are always interested in learning more, so please do let us know which network filesystem you run and how well it works for you.