Here’s a fun story that involves Tyler Cowen. He’s a professor of economics, which is not too interesting. What’s interesting is that he’s also the main force behind one of the smartest, most interesting websites on the internet, Marginal Revolution. Back when I was scraping the internet every night to compile a corpus of blogs and news that I felt might have predictive power, Marginal Revolution stood out (from thousands) as a thoughtful blog that I actually found interesting to read. The fact that it has persisted for well over a decade (during which I’ve been a faithful reader) is quite unusual. That uniqueness is compounded by the prolific content. Normal influential bloggers might put out something interesting a couple of times a week if they’re very active, but Cowen publishes half a dozen things pretty much every day. He is also a prolific writer for Bloomberg; for example, here’s one today. To summarize: Tyler Cowen is kind of a titan among public intellectuals. I’m a fan.

I was reading his stuff like I do, and I saw him link to a story about some COVID-19 vaccine snafu involving the FDA and, of all things, data formats. Well, I don’t usually know anything Tyler Cowen doesn’t already know, but in this case I’m kind of a minor expert on this exact weird topic. I sent Professor Cowen an email pointing out that I’ve had some experience with messes from the FDA and that I’m not surprised to hear about a problem as banal as this. To demonstrate my ever so slightly elevated experience with this topic, I sent him a link to my old FAERSFix GitHub repo. Obviously he (like all other humans) had no idea this obscure project existed, so it was intended to be just another data point for him on the topic.

Yet the next day I’m making my normal rounds of the internet and I find he has posted this.

And it features my README from that project and a link to its GitHub repo.

This GitHub repository is a backup of my FAERSFix scripts.

The FDA Adverse Event Reporting System is a horrifically dysfunctional quagmire of shockingly bad data. The data is not just bad for severe epistemological reasons; it is also poorly organized and riddled with flagrant, absurd errors.

These scripts smooth over the very messy process of acquiring the data and doing some basic debugging. At the end of the process, a user can arrive at a local repository of the FAERS data that is sane enough to begin thinking about some kind of sensible analysis. To understand the disastrous state of the original source data, see the source code of the scripts, which is designed to be a readable, self-documenting manual demonstrating how to correct this mess.

Since the FDA’s gremlins never rest, these scripts will become obsolete. If you would like to contribute updates or fixes, feel free to send me a patch or a pull request. Good luck!


That’s kind of cool, but a bit more public than I’m used to. However, that’s actually why I uncharacteristically put it on GitHub: I was hoping to make it easy for someone else to use. I actually think the topic is pretty important.

Let’s back up and answer the question: what exactly is this data, and why should we care?

Let’s back up even more. You’ve heard about "clinical trials". They test some drugs and if they seem OK, they’re "approved", right? But when those drugs get deployed, they get prescribed to orders of magnitude more people than were in the trials. That shouldn’t be surprising. What is shocking to me is that after literally billions of people eat these magic beans, the FDA does not really have any intellectual curiosity about what actually happened in real-world use at that scale. Were the drugs seriously flawed? Well, if the trials were too, then we’ll never know!

Checking up on how drugs are really working out is a hobby called pharmacovigilance. The fact that it is kind of a newish idea is horrifying to me.

And this brings us back to FAERS. When you, your doctor, your lawyer, or, well, anyone really, wants to report an "adverse event" related to some therapeutic substance, they can submit it to FAERS. Is it required? Something that doctors are obliged to do? Heavens no! (Drug manufacturers do have mandatory reporting obligations, but for doctors and patients reporting is voluntary.) So let me repeat: anyone can submit anything any time. Not off to a great start.

And that’s where I come in. Let’s say you want to do some analysis on this ugly data. (Incidence Rate Ratios are probably your best hope, by the way.) It turns out that the data is not just of questionable provenance; its actual format and technical arrangement are also terrible. It is so terrible that I had to write software that simply acquires the data from its hiding places and patches up the most egregious errors. It’s been a long time since I played with this, but with my software I could start with an internet connection and wind up with a reasonably formatted collection of questionable data. Doesn’t sound like much, but it beats unreasonably formatted!
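To make "patches up the most egregious errors" a little more concrete, here is a minimal Python sketch of one kind of repair. It is not the FAERSFix code. It assumes the quarterly ASCII files are `$`-delimited with a header row, and the defect it handles (a record broken across two physical lines by a stray newline inside a field) is a hypothetical example; `repair_rows` is an illustrative name.

```python
from typing import Iterable, Iterator, List


def repair_rows(lines: Iterable[str], delimiter: str = "$") -> Iterator[List[str]]:
    """Yield the header, then one list of fields per logical record.

    Hypothetical repair: when a physical line has fewer fields than the
    header, assume a stray newline split the record and glue the next
    line onto it (the newline becomes a space).
    """
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return  # empty input: nothing to yield
    header = first.rstrip("\r\n").split(delimiter)
    n_fields = len(header)
    yield header

    pending = ""
    for lineno, raw in enumerate(it, start=2):
        line = raw.rstrip("\r\n")
        # Append this physical line to any incomplete record being held.
        pending = f"{pending} {line}" if pending else line
        fields = pending.split(delimiter)
        if len(fields) < n_fields:
            continue  # record looks incomplete; keep accumulating
        if len(fields) > n_fields:
            raise ValueError(f"line {lineno}: {len(fields)} fields, expected {n_fields}")
        yield fields
        pending = ""

    if pending:
        raise ValueError(f"input ended inside an incomplete record: {pending!r}")


rows = list(repair_rows([
    "primaryid$drugname$route\n",
    "1$ASPIRIN$ORAL\n",
    "2$IBU\n",          # record split by a stray newline...
    "PROFEN$ORAL\n",    # ...and rejoined here
    "3$X$IV\n",
]))
```

On a real file you would pass an open file object (opened with whatever encoding the files actually use) instead of a list. The heuristic has an obvious blind spot: a newline inside the last field leaves the first fragment with a full set of fields, so the leftover fragment gets glued onto the next record instead of being caught.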

One final fun thing to point out. Because Professor Cowen’s post got quite a lot of comments, I got a lot more feedback on this project than ever before. One commenter misunderstood some important points but definitely caught an error I found very interesting.

A small thing, but this criticism of the FDA software would carry a little more credibility if the author used the correct name - it is the FDA Adverse EVENT Reporting System.

They are referring to the fact that I incorrectly referred to the source data as the "FDA Adverse Effect Reporting System". I never noticed that was incorrect. (And I’m not 100% sure it was incorrect 4 years ago, but anyway…) Either way, it is definitely incorrect; it should be the "FDA Adverse Event Reporting System". So I stand corrected! Stupid me. But I wondered why I would think that. I didn’t have to dig far. Most of the source data URLs (77 of them when I worked on it) looked like this:

www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/UCM364757.zip
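For the acquisition half, a minimal Python sketch of fetching one of these archives and pulling out its text files might look like the following. The FDA has since reorganized its site, so URLs like the one above may no longer resolve. The function names are hypothetical, the Latin-1 decoding is an assumption, and the filename normalization (drop the directory, lowercase the name) is there only to guard against inconsistent member names, not because any particular archive is known to need it.

```python
from __future__ import annotations

import io
import urllib.request
import zipfile


def fetch_archive(url: str) -> bytes:
    """Download a zip archive and return its raw bytes (network access required)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def extract_text_members(zip_bytes: bytes) -> dict[str, str]:
    """Return {normalized_name: text} for every .txt file in the archive."""
    members: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Keep only the base name, lowercased, so that hypothetical variants
            # such as "ASCII/DEMO12Q4.TXT" and "ascii/demo12q4.txt" share a key.
            # If two members normalize to the same name, the later one wins.
            name = info.filename.replace("\\", "/").rsplit("/", 1)[-1].lower()
            if not name.endswith(".txt"):
                continue
            # Latin-1 is an assumption: it never raises, but it will misread
            # characters if the files actually use a different encoding.
            members[name] = zf.read(info).decode("latin-1")
    return members


# Usage (requires network access, and the legacy URL may be gone):
# files = extract_text_members(fetch_archive(url))
```

Keeping the download and the extraction separate means the extraction step can be exercised on an in-memory archive without touching the network.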

It’s been a few years since I tracked this data, and the bigger irony is that, looking at the FDA’s website today, I see they’ve cleaned some things up quite a bit. They at least put all the data in one coherent place, which is good. Who knows, maybe they used my code and fixed the other problems. I hope so. I’ve at least done everything I can do to help them.