Dataset Needed

January 23, 2020 - People make biased claims all the time. A decent response used to be "citation needed". But we should demand more. Anytime someone makes a claim that seems biased, call them out with: Dataset needed.

Whether it's an academic paper, news article, blog post, tweet, comment or ad, linking to analyses is not enough. If someone stops at that, demand a link to a clean dataset supporting the author's position. If they can't deliver, they should retract.

Of course, most sources don't currently publish their datasets. But you cannot trust claims from any person or organization without an easily accessible dataset. In fact, it's probably safe to assume that someone who shares a conclusion without the accompanying dataset is distorting reality for their own benefit.

Be a broken record: "Dataset needed. Dataset needed. Dataset needed."

Encourage authors to link to and/or publish their datasets. You can't say "dataset needed" enough. It is valuable, constructive feedback.

Authors: support your arguments with open data

Link to the dataset. If you want to include a conclusion, provide a deep link to the relevant query over the dataset. Do not repeat conclusions that don't have an accompanying dataset. If people can't verify what you say, don't say it.

Software teams: make it easy for users to share deep links to queries over public datasets

Many teams are already building tools that make it easy to deep link to queries over open datasets, including Observable, Our World in Data, Google BigQuery, Wolfram Data Repository, Tableau Public, IDL, Jupyter, Awesome Public Datasets, USAFacts, Google Dataset Search, and many more.
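To make "deep link to a query" concrete: one way a tool could make a query shareable is to encode both the dataset and the query into the URL itself, so that anyone who opens the link can rerun exactly the same query. The sketch below assumes a hypothetical explorer at `data.example.com` that reads `dataset` and `q` URL parameters; that host, those parameter names, and the helper functions are illustrative only and are not the API of any tool listed above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical data explorer; not a real service.
BASE_URL = "https://data.example.com/explore"


def make_deep_link(dataset: str, query: str) -> str:
    """Encode a dataset name and a query into a single shareable URL."""
    # urlencode percent-encodes the values, so spaces, commas and '>' are URL-safe.
    return f"{BASE_URL}?{urlencode({'dataset': dataset, 'q': query})}"


def read_deep_link(url: str) -> tuple[str, str]:
    """Recover the dataset name and query from a deep link."""
    params = parse_qs(urlparse(url).query)
    return params["dataset"][0], params["q"][0]


# Hypothetical dataset name and query, for illustration only.
link = make_deep_link("example-dataset", "SELECT state, population FROM people WHERE population > 1000000")
print(link)
print(read_deep_link(link))
```

Because the whole query travels inside the link, a reader who clicks it sees the same result the author saw and can edit the query to check the conclusion for themselves.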

Students: Learn to build and publish datasets

I remember being a high school student and getting graded on the dataset notebooks we kept in the lab. Writing clean data should be widely taught in school; students are an army of potential workers who could help us create more public, deep-linkable datasets.

Notes

Thanks to DL for helping me refine my thinking from this earlier post.
