Hi! I’m Harris Lapiroff. I work for Freedom of the Press Foundation, a non-profit that defends journalists and whistleblowers through technology, advocacy, and digital security. I’ve been working in-house at FPF for three years, and I worked for them as a consultant for some three years before that, so I’ve been around for the entire lifespan so far of the project I’m talking to you about today.
Quick overview of what I’m going to talk about today
Going to tell you about the U.S. Press Freedom Tracker and its history and purpose
Going to introduce you to our API
And I’m going to give you a case study of how we used our data to draw insights about press freedom violations during Black Lives Matter protests in 2020
In 2017, in partnership with a couple dozen other press freedom organizations, we launched the U.S. Press Freedom Tracker. Its purpose is to comprehensively and systematically document aggressions against press freedom in the United States.
While stories of journalists being attacked and arrested have of course been covered in the past, there wasn’t a central repository of them that we could use to answer questions like: “How many journalists were arrested in 2020? Is it more or fewer than in 2019?” or “How many journalists have been subpoenaed this year?”
And I think in general, we’ve been quite successful at that goal. Our reporting is commonly cited, among other places, in news stories and amicus briefs in court cases. One recent example from this year: a journalist, Andrea Sahouri of the Des Moines Register, went to trial after being arrested while covering a protest. A lot of reporters turned to us and our data to find out just how common it was for journalists to actually face a criminal trial (the answer: quite uncommon—cases are usually dropped before then).
The tracker organizes incidents into 11 categories. Some of them are easier to be comprehensive about than others: for example, it’s pretty obvious what qualifies as a leak prosecution or a journalist being arrested, but chilling statements is both a muddier and a more expansive category, and we couldn’t possibly be comprehensive there. And as you can see, we also have this catchall “other” category for incidents that we think deserve coverage but don’t fit neatly anywhere else.
Each incident is thoroughly reported out by our staff in a narrative form.
But we also think of the tracker as a database and for every incident we cover, we record a bunch of structured data about it. We knew early on that we wanted to be rigorous in our reporting and provide people a reliable dataset to identify trends and put particular incidents in context.
So, from the beginning, we’ve had an API, through which people could use the information on our site.
As you likely know, an API is a way to access data from a system, in our case the U.S. Press Freedom Tracker. Our API serves two purposes: you can use it to get an export of the complete contents of the website, or to execute a query for a more specific subset of our data. Either way, it can be easily accessed through a web browser or by any automated script.
So, after four years, we actually just launched a new API quite recently—basically yesterday, in fact. So if there’s any sloppiness in my presentation today, know that it’s because I was waiting to see whether or not we would get the new API across the finish line before delivering it.
The API can currently be accessed at the URL up top. You can even visit this in a web browser right now if you want! Our website is powered by Django and Django Rest Framework, which provides a nice interface for perusing the API in a web browser.
If you do visit, you’ll notice that’s really a holding page for listing different endpoints. Currently we only have an endpoint that supports fetching data about incidents, but we do have plans to add a categories endpoint as well for fetching information about different categories.
If you just want to download all the incidents right now, use https://pressfreedomtracker.us/api/edge/incidents/
The default response is JSON. If you want them as a CSV (bonus, smaller file size) you want https://pressfreedomtracker.us/api/edge/incidents/?format=csv
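To make those two URLs concrete, here is a minimal Python sketch of downloading everything in either format. The two URLs are the ones above; the function names are just for illustration, and I’m assuming the JSON body is a plain list of incidents (if it turns out to be wrapped in something else, such as a paginated envelope, you’d unwrap it first).

```python
import csv
import io
import json
from urllib.request import urlopen

BASE_URL = "https://pressfreedomtracker.us/api/edge/incidents/"


def incidents_url(fmt=None):
    """Return the incidents URL, optionally asking for a format such as "csv"."""
    return f"{BASE_URL}?format={fmt}" if fmt else BASE_URL


def parse_incidents(body, fmt=None):
    """Parse a response body (text) into Python data.

    CSV becomes a list of dicts, one per row. JSON is returned as decoded;
    that it is a plain list of incidents is an assumption, not confirmed.
    """
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(body)))
    return json.loads(body)


def fetch_incidents(fmt=None):
    """Download every incident. Calling this makes a real network request."""
    with urlopen(incidents_url(fmt)) as response:
        return parse_incidents(response.read().decode("utf-8"), fmt)
```

Calling `fetch_incidents("csv")` would give you one dictionary per incident, keyed by whatever column names the CSV export actually uses.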
Those requests are all appropriate if you just want to download the data and do all the processing offline. We also provide functionality for developing projects around our API that routinely query for new data, filtered as appropriate for your project. This is documented on our website at https://pressfreedomtracker.us/data/. I’m going to avoid getting too technical today, so I won’t go into the details of those queries, but I’m showing a couple of examples here, and you can feel free to read the documentation at that URL or ask me more questions later, either during Q&A or privately afterward.
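For a rough idea of what a filtered request looks like from a script, here is a small Python sketch that adds filters to the incidents URL as query-string parameters. The filter names in the example (`city`, `date_after`) are placeholders I made up for illustration, not confirmed parameters; the real names are in the documentation at https://pressfreedomtracker.us/data/.

```python
from urllib.parse import urlencode

BASE_URL = "https://pressfreedomtracker.us/api/edge/incidents/"


def build_query_url(filters, fmt="csv"):
    """Append filters to the incidents URL as query-string parameters.

    The valid filter names are whatever the API documentation defines.
    This function does not check them; it only encodes what it is given.
    """
    params = dict(filters)
    if fmt:
        params["format"] = fmt
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL


# Placeholder filter names, for illustration only (not confirmed API parameters).
url = build_query_url({"date_after": "2020-05-01", "city": "Portland"})
```

A project that checks for new data on a schedule could build its URL once this way and request it on each run.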
As I’m sure everyone remembers, in May of last year George Floyd, a Black man, was murdered on video by Minneapolis police officer Derek Chauvin, setting off a month of protests across the country.
These protests were clearly a reckoning on race for the U.S., but they were also a flashpoint for press freedom. In 2020 we documented a total of 517 incidents specifically at Black Lives Matter protests.
For scale, each of the previous three years saw fewer than 160 incidents. Our staff worked around the clock to meticulously report and document every incident. Again, each of these incidents is published on our website and you can browse the website for those individual reports, but we can also put these incidents into context by charting the aggregate data.
So here, digging into just 2020, you can see how incidents spike in May when protests begin and slowly taper off, but do not return to baseline levels, as protests continue over the course of the rest of the year.
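The aggregation behind a chart like this is simple. Here is a minimal Python sketch that counts incidents per month for one year, assuming each incident is a dictionary with an ISO-formatted `date` field. That field name is an assumption for illustration, and the sample records in the usage note are made up, not real tracker data.

```python
from collections import Counter
from datetime import date


def incidents_per_month(incidents, year):
    """Return a 12-item list of incident counts, January through December."""
    counts = Counter()
    for incident in incidents:
        # Assumed field: "date" holds an ISO string such as "2020-05-30".
        day = date.fromisoformat(incident["date"][:10])
        if day.year == year:
            counts[day.month] += 1
    # A Counter returns 0 for months with no incidents.
    return [counts[month] for month in range(1, 13)]
```

With made-up records dated May 30, May 31, and June 1 of 2020, `incidents_per_month(records, 2020)` has 2 in the May slot, 1 in the June slot, and 0 everywhere else.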
This was a timeline of incidents I created for our website and it actually uses our API to provide quite a bit of interactivity. You can hover over each of the incidents to get details about the specific incident and you can use the highlight menu at the top to call out incidents by city or even by who the aggressor was in each situation. I encourage you to visit the chart and interact with it yourself, but I just want to show off one particular view of it.
If you ask the dropdown to highlight incidents where the assailant was law enforcement, you can see that the majority of physical attacks on journalists during the protests were perpetrated by law enforcement officers.
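If you wanted to check a claim like that against the raw data rather than the chart, the calculation is a filter and a ratio. This is a sketch only: the field name `assailant` and the value `"law enforcement"` in the usage note are hypothetical stand-ins, so check the real export for the actual names.

```python
def share_matching(incidents, field, value):
    """Return the fraction of incidents whose `field` equals `value`.

    Returns 0.0 for an empty list so callers never divide by zero.
    """
    if not incidents:
        return 0.0
    matches = sum(1 for incident in incidents if incident.get(field) == value)
    return matches / len(incidents)
```

You would first narrow the list down to physical attacks at protests, then call something like `share_matching(attacks, "assailant", "law enforcement")`; a result above 0.5 means a majority.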
Here’s a map of incidents at protests across the country that was actually put together using our data by a data journalist at our partner organization, the Committee to Protect Journalists.
Here’s one other chart I particularly like, where we took the top 10 cities by number of incidents and charted them cumulatively over time. You can see how the protests start in Minneapolis in late May and then quickly spread to other cities, but I think Portland is a particularly interesting one to follow here, because you can see how a pretty steady stream of aggressions against journalists covering the protests quite immediately followed the deployment of Homeland Security agents to Portland.
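Here is roughly how the data for a chart like that can be prepared, as a Python sketch rather than the code behind the actual chart. It assumes each incident is a dictionary with `city` and `date` fields, where `date` is an ISO string; both field names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import accumulate


def cumulative_by_city(incidents, top_n=10):
    """Return {city: [(date, running_total), ...]} for the top_n cities."""
    # Rank cities by total incident count and keep the busiest ones.
    totals = Counter(incident["city"] for incident in incidents)
    top_cities = [city for city, _ in totals.most_common(top_n)]

    # Count incidents per day within each of those cities.
    daily = defaultdict(Counter)
    for incident in incidents:
        if incident["city"] in top_cities:
            daily[incident["city"]][incident["date"]] += 1

    # Turn the daily counts into a running total per city.
    series = {}
    for city in top_cities:
        days = sorted(daily[city])  # ISO date strings sort chronologically
        running = accumulate(daily[city][day] for day in days)
        series[city] = list(zip(days, running))
    return series
```

Each city’s list can be drawn as one line on the chart: a steep stretch means many incidents in a short period, and a flat stretch means none.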
And finally, I put together this heat map of the four years of Trump’s presidency, starting on the day of his inauguration and ending on the day of Biden’s inauguration. This one isn’t specifically related to BLM protests, but you can definitely see a couple of them show up on the heat map, particularly, again, the protests starting in May of 2020. You can also see that January 6, when protesters stormed the U.S. Capitol, shows up quite dark on this heat map as a day of quite a few aggressions against journalists.
All the charts I showed here are available online—I made them using Observable Notebook. If you’re a code-oriented person, I highly recommend checking it out, if you haven’t already. It’s a really remarkable system.
The U.S. Press Freedom Tracker is at pressfreedomtracker.us. If you want to use our data or if you’re a journalist who has an incident to report, please, get in touch. We’re very responsive and happy to talk. If you come up with anything using our data or API or want some guidance in doing so, please don’t hesitate to talk to us. We want this information to be widely available, studied, and reported, and we’re more than happy to talk through specific quirks in how it’s recorded, offer advice for analyzing it, and just generally see what people do with it.
And finally, feel free to find me personally on Twitter. My name is Harris Lapiroff and so is my Twitter handle.
And now I’m happy to answer questions people have: about our technical setup, our reporting, our data, any of my visualizations, whatever.