Open Source Copyright Infringement: Are DMCA Takedowns on GitHub Increasing?

While browsing /r/linux I saw this post: Rate of DMCA takedown demands is increasing on GitHub.

That sounds pretty scary, for a couple of reasons. On one hand, it could mean that companies are really dropping the hammer and coming after folks more often and potentially in error. On the other, it could mean the open source community is becoming more lax or lazy, using others’ code without respecting licenses or copyright.

But could there be a third alternative? Is the rate of DMCA takedowns really increasing in a meaningful way?

Fortunately, GitHub is open about the DMCA notices it receives. Since January 27, 2011, GitHub has posted each request received and each counter-notice issued. All 317 of them.

Here they are by month.

Are DMCAs issued to GitHub on the rise?

The graph above shows the number of DMCA takedown requests issued each month, according to GitHub’s own DMCA repo. Another redditor made a similar graph (though I suspect from the R code posted that they didn’t filter out counter-notices).
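For anyone who wants to reproduce the monthly tally, here is a minimal sketch of the counting step with counter-notices filtered out. It assumes notice files are named with a leading `YYYY-MM-DD-` date and that counter-notices have "counter" somewhere in the filename. Both are assumptions about the repo layout, and the sample filenames are made up for illustration.

```python
import re
from collections import Counter

# Assumed filename shape: "YYYY-MM-DD-<label>.md" (or ".markdown").
FILENAME_PATTERN = re.compile(r"^(\d{4})-(\d{2})-\d{2}-(.+)\.(?:md|markdown)$")


def monthly_takedown_counts(filenames):
    """Count takedown notices per month, skipping counter-notices."""
    counts = Counter()
    for name in filenames:
        match = FILENAME_PATTERN.match(name)
        if match is None:
            continue  # not a dated notice file (e.g. a README)
        year, month, label = match.groups()
        if "counter" in label.lower():
            continue  # assumed marker for counter-notices
        counts[f"{year}-{month}"] += 1
    return dict(sorted(counts.items()))


# Hypothetical filenames, not real entries from the repo.
sample = [
    "2014-05-01-ExampleCo.md",
    "2014-05-09-ExampleCo-counternotice.md",
    "2014-05-20-AnotherCo.md",
    "2014-06-02-ExampleCo.md",
    "README.md",
]
print(monthly_takedown_counts(sample))  # {'2014-05': 2, '2014-06': 1}
```

Leaving out the "counter" check is the easiest way to inflate the monthly numbers, since every counter-notice would be counted as a second takedown.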

According to this data there is clearly an upward trend in DMCA takedowns, with a peak of 41 in May 2014.

Who is Issuing the Takedowns?

Most companies or individuals who issue DMCA requests to GitHub do so only once. Such requests account for 68% of all DMCAs received by GitHub. The remaining 32% of takedowns come from just 30 sources. If requests really are on the rise, looking at the number of takedowns per originating party could help us identify who is cracking down on copyright the hardest.

Alternatively, it could reveal who developers like to infringe upon the most.
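The split between one-time and repeat senders can be computed roughly like this. It is only a sketch: it assumes you have already extracted one sender name per takedown notice, and the sender names below are placeholders rather than real data.

```python
from collections import Counter


def sender_breakdown(senders, top_n=2):
    """Summarize how takedowns divide between one-time and repeat senders.

    `senders` holds one entry per takedown notice (the issuing party).
    """
    per_sender = Counter(senders)
    total = sum(per_sender.values())
    if total == 0:
        raise ValueError("no takedown notices to summarize")
    # A one-time sender contributes exactly one notice.
    one_time_notices = sum(1 for n in per_sender.values() if n == 1)
    return {
        "one_time_share": one_time_notices / total,
        "repeat_share": (total - one_time_notices) / total,
        "repeat_sender_count": sum(1 for n in per_sender.values() if n > 1),
        "top_senders": per_sender.most_common(top_n),
    }


# Placeholder sender names, one per notice.
sample_senders = ["A", "A", "A", "B", "C", "D", "E", "E", "F", "G"]
summary = sender_breakdown(sample_senders)
print(summary["one_time_share"])  # 0.5
print(summary["top_senders"])     # [('A', 3), ('E', 2)]
```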

Who is issuing the most DMCA takedowns to GitHub?

To my surprise, Codility leads the pack - and by quite a bit. I expected Sony, Amazon, or others in the music industry to be the frontrunners.

The number of takedowns from Codility becomes less surprising in the context of what Codility is: a service designed to help corporate recruiters test candidates for development positions. By having developers solve programming challenges, companies can quickly identify who has the desired skills and who does not. That saves hiring managers a lot of time and money.

It also gives shady candidates an incentive to share questions, code snippets, and solutions to these challenges.

A quick perusal through Codility’s DMCA requests shows that’s precisely the case.

…but are DMCA Takedowns Really on the Rise?

It’s hard to say what is driving DMCA takedowns on GitHub. Are companies being more aggressive about their copyrighted code? Or are developers playing faster and looser with licenses, or outright stealing? One thing is clear: the total number of DMCA requests to GitHub is higher than ever.

What do More Frequent DMCA Requests Really Mean?

Just as DMCA requests are going up, the total number of repositories being created on GitHub is also increasing. Thus looking at DMCA takedowns is meaningless without considering the increase in projects over the same time.

Since its launch in 2008, GitHub has seen tremendous growth. Starting with zero users and no repositories, GitHub now boasts over 14 million repositories. That number represents more than a doubling in repositories over the previous year. In the words of Brian Doll at GitHub:

The first million repositories were created in just under 4 years; 3 years, 8 months and 15 days to be exact. This last million took just 48 days. In fact, over 5.5M repositories — more than half of the repositories on the site — were created this year alone.

That’s a lot of new projects. With so many projects being created, it stands to reason that more DMCA requests will follow. More code, more potential for infringement.

To truly reflect a more aggressive campaign against infringement, or developers playing fast and loose with copyright, the rate of DMCA takedowns would have to grow more quickly than the rate of projects being created. So let’s take a look at the rate of DMCA requests per 1,000,000 repositories over the last four years.
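The normalization itself is just a ratio scaled to one million repositories. A small sketch follows; the two periods and their numbers are invented purely to show how raw counts and rates can move in opposite directions, and are not GitHub's figures.

```python
def takedowns_per_million(takedowns, repositories):
    """Return DMCA takedowns per 1,000,000 repositories."""
    if repositories <= 0:
        raise ValueError("repositories must be positive")
    return takedowns * 1_000_000 / repositories


# Invented numbers for illustration only: (takedowns, repositories).
hypothetical_periods = {
    "period 1": (20, 1_000_000),
    "period 2": (60, 6_000_000),
}

for label, (takedowns, repositories) in hypothetical_periods.items():
    rate = takedowns_per_million(takedowns, repositories)
    print(f"{label}: {takedowns} takedowns, {rate:.1f} per million repositories")
# period 1: 20 takedowns, 20.0 per million repositories
# period 2: 60 takedowns, 10.0 per million repositories
```

In this made-up example the raw count triples while the rate per million repositories is cut in half, which is exactly the kind of effect raw counts hide.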

Well then...

Suddenly a different picture emerges. And one that should be a bit more uplifting.

The frequency of DMCA requests to GitHub may be on the rise, but so far it is not out-pacing the rate of new projects. In fact, the proportion of repositories being issued DMCA notices is at an all-time low. 2014 is not over yet and a lot could happen in the coming months. But it would take an unprecedented deluge of DMCAs to upset the current trend.

In addition to encouraging readers to put away the pitchforks, I hope this also prompts a closer look at DMCA requests and the growth of GitHub. With the GitHub API it becomes possible to do tons of really interesting analyses of code (others already have done some fantastic work with it).

Whatever you decide to do with the API, just don’t violate copyright - especially not under the watchful eye of Codility. And don’t rely on raw counts without taking a look at the bigger picture.