On December 16, Facebook announced the release of open-source code the company says significantly sped up its internal artificial intelligence and machine learning projects.
Its release is part of an open-source push that's seen Facebook share hundreds of projects on code repository GitHub, from database software like the Presto big data SQL system to programming language tools such as HHVM, for speedily running PHP code, to web front-end tools like the React JavaScript library.
"The reasons are numerous—some of them are sort of ideological—we obviously built Facebook from the start on other people's open-source software," says James Pearce, the company's open-source lead. "So on some level we have an obligation to share back."
But sharing software isn't just an altruistic move, he readily acknowledges.
Publishing useful and impressive code helps boost the company's reputation in the developer community, making it easier to recruit and retain talented engineers—something that's no doubt critical in Facebook's ongoing battle with rivals like Google, Apple, and Microsoft to build the platforms that will engage web and mobile users for a generation.
"It turns out that large percentages of our engineers will have known about our open-source projects before they will have joined and they will say that it contributed positively to their decision to join the company," Pearce says. "It's a great window, I think, into the world of the sorts of problems that we solve, and of course we're hoping there are world-class engineers around the world who would relish those kinds of opportunities and when they see the problems we're solving will feel the urge to take a look."
Open, Open, Open
While there's no one standard way to measure the magnitude of a company's open-source contributions, Adobe VP and industry commentator Matt Asay has argued in a series of articles that Facebook has outstripped other open-source-friendly Internet giants like Google, Twitter, and Netflix and even software distributors like Red Hat to become the biggest industry contributor to open source.
In 2014 alone, Facebook launched 107 open-source projects, Pearce wrote in a year-end blog post.
The December 16 code release provides efficiency boosts and other enhancements for several features of Torch, a number-crunching and machine-learning library commonly used in industry and academic research for tasks like training computers to classify images and analyze written and spoken language. A blog post by Facebook researcher and engineer Soumith Chintala announcing the release went into far more technical detail than any press release, quickly delving into areas of mathematics that even many developers wouldn't have worked with outside of an upper-level college class.
"The sequence of operations involves taking an FFT of the input and kernel, multiplying them point-wise, and then taking an inverse Fourier transform," reads one section of Chintala's post.
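For readers who have not met the technique since that upper-level class, the three steps Chintala describes can be sketched in a few lines of Python. This is a one-dimensional, textbook illustration of the convolution theorem, not Facebook's GPU code or Torch's API; the function names `fft`, `fft_convolve`, and `direct_convolve` are made up for this example.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT. len(values) must be a power of two.

    With inverse=True the result is left unnormalized; the caller divides by n.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, kernel):
    """Convolve two real sequences by way of the frequency domain."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:
        size *= 2
    # Zero-pad both inputs so the circular convolution equals the linear one.
    padded_signal = [complex(x) for x in signal] + [0j] * (size - len(signal))
    padded_kernel = [complex(x) for x in kernel] + [0j] * (size - len(kernel))
    # Step 1: take an FFT of the input and of the kernel.
    signal_freq = fft(padded_signal)
    kernel_freq = fft(padded_kernel)
    # Step 2: multiply them point-wise.
    product = [a * b for a, b in zip(signal_freq, kernel_freq)]
    # Step 3: take the inverse transform and normalize by the length.
    result = fft(product, inverse=True)
    return [(v / size).real for v in result[:out_len]]


def direct_convolve(signal, kernel):
    """Straightforward sliding-sum convolution, used here as a cross-check."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Both functions should agree to within floating-point error on the same inputs. The appeal of the FFT route is that, for large kernels, the transforms and one point-wise multiply can cost less than sliding the kernel across every position of the input.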
Yet despite, or perhaps because of, all its technical detail, the post quickly rose to the top of Hacker News, the Reddit-style industry forum hosted by the renowned startup incubator Y Combinator, where readers discussed everything from the code's features to the benefits of open source for hiring programmers.
"I see no issue with open-sourcing tools to recruit developers," wrote one commenter. "Honestly, that's one of the best methods I can think of."
That echoes a 2011 blog post by GitHub cofounder Tom Preston-Werner, titled "Open Source (Almost) Everything," which argued that making software public helps attract talented programmers and keeps them satisfied as they build a public portfolio they can take pride in.
It also helps create better software, since developers take extra care in making their code well organized and reusable when they know it's going to be used outside their workplace, Preston-Werner wrote.
That's been the experience at Facebook too, where many projects are now designed from the start to be open source and promoted through blog posts, news releases, and the company's GitHub page, says Pearce.
"The projects that we open source, or the ones that we know we're going to open source in the future, we write better code," he says.
Making It Easy
Late last year, Facebook announced the release of osquery, a cross-platform, open-source tool that presents operating system data as standardized SQL database tables.
"One of the things I wanted to solve was making it really easy for everyone to do these kind of operating systems analytics regardless of their technical know-how," says Facebook engineer Mike Arpaia. "It's supposed to feel as if you have a database on your system that has this wealth of information, and you're just gaining access to it like a normal database."
The company built the tool to be open source and extensible, letting users inside and outside Facebook add their own code to generate additional tables of system data. Since its release, Facebook engineers have been active on osquery's GitHub page, merging contributions from third-party programmers and soliciting comments on future additions to the tool.
"The public reception of osquery is better than I could have ever imagined," Arpaia says. "We've gotten a lot of amazing open-source contributions so far, and a lot of them are very complex."
Developers generally make better decisions simply knowing that others outside the company will be looking to understand—and judge—how their code is laid out line by line and section by section, Pearce says. Building projects like osquery or Presto with a public release in mind leads to more flexible, modular code, since programmers know outsiders will be looking to swap in plug-ins based on their own needs.
"If [Presto] had been developed as an internal project, it probably would have had all sorts of implicit dependencies on parts of our infrastructure," he says. "I think that's a canonical example almost, where doing it as an open-source project means it's a much, much better piece of software."
In the case of osquery, engineers working on the tool saw the number of built-in data tables double during an internal Facebook hackathon, then double again after the tool was publicly released, says Arpaia.
"We didn't want to create a divide between public developers and internal developers," he says. "We go through it with the exact same rigor and make sure that all code that gets into master is of a similar level of quality."
A Thousand Extra Developers
Overall, more than 1,000 outside developers contributed code to Facebook's open-source projects in 2014, accounting for 17% of the source code commits to the various projects, according to Pearce's year-end blog post.
With HHVM, Facebook's open-source virtual machine for running PHP code, the company's made a point of pursuing compatibility with major PHP frameworks and codebases deployed outside the company, hoping to spur innovation in the PHP language community at large, says Pearce.
"We just wouldn't be able to do that if we had fragmented support for PHP in the first place," he says.
A Nov. 19 blog post by an engineer at Box said the cloud storage company began migrating to HHVM from the slower standard PHP interpreter after seeing Facebook's efforts at improving its compatibility with existing code, something that had previously been a deterrent to making the switch.
"But just under a year ago, the HHVM team's growing focus on parity with the standard PHP runtime caught our attention," wrote Joseph Marrama, the Box engineer. "We decided to seriously re-evaluate HHVM, and, after a couple months spent working through GitHub issues and runtime differences, often with Facebook's HHVM team directly, it was clear that we were on to something big."
Facebook is also a founding member of the Todo Group, an industry consortium formed in 2014 so that companies learning to share their code can also share their methods for running open-source efforts.
"We felt we had to reinvent that wheel, and it turns out other companies had to reinvent that wheel, too," says Pearce. "We have a calendar laid out of the topics that we're going to be addressing and over the coming weeks we're going to be sharing our thoughts on each of those and engaging the community more actively."