Breck's Blog - Data Posts


May 21, 2024

All tabular knowledge can be stored in a single long plain text file.

The only syntax characters needed are spaces and newlines.

This has many advantages over existing binary storage formats.

Using the method below, a very long scroll could be made containing all tabular scientific knowledge in a computable form.

Continue reading...

May 11, 2024 - That charts work at all is amazing.

Forty years.

One-billion heart beats.

Four-quadrillion cells.

Eight-hundred-eighty-octillion ATP molecules.

Compressed to two marks on a surface.

Continue reading...

Newton, Darwin, and a modern Scientist go to heaven.

Continue reading...

Bad models of the world can be dangerous.

We stood at the edge of the lake.

Everyone was in a wetsuit.

Except for me.

Wetsuits: hundreds of people.

Boardshorts: one person.

Continue reading...

Datasets are automated tests for world models

April 23, 2024 - I wrapped my fingers around the white ceramic mug in the cold air. I felt the warmth on my hands. The caramel-colored surface released snakes of steam. I brought the cup to my lips and took a slow sip of the coffee-bean-flavored water inside.

Happiness is a hot cup of coffee in a ceramic mug on a cold day.

Continue reading...

Menu Instructions

Congrats on landing a job at Big O's Kitchen!

Our menu has 7 dishes.

Below are the instructions for making each dish.

Continue reading...

The girl lost the race.

"I want to be fast", she said.

"You are fast", said the man.

"No. I want to be the fastest."

Continue reading...

Pricking my finger then measuring the level of ketones in my blood.

April 16, 2024 - I pricked my finger and moved a disposable ketone measuring stick into the newly formed drop of blood. I was checking my "blood ketone" levels. If the result came back higher than 0.8 mmol/L, I would be in a state of "ketosis".

The meter showed "1.0 mmol/L". Success! I was still in ketosis.

Continue reading...

April 5, 2024 - Have you ever examined the correlation between your writing behavior and sleep?

I've written some things in my life that make me cringe. I might cringe because I see some past writing was naive, mistaken, locked-in, overconfident, unkind, insensitive, aggressive, or grandiose.

I now have a pretty big dataset to identify my secret trick to write more cringe: less sleep.

For this post I combined 2,500 nights of sleep data with 58 blog posts: a seven-year experiment to see how sleep affects my writing.


Continue reading...

S = side length of box. P = pattern. t = time. V = voxel side length.

March 30, 2024 - Given a box with side S, over a certain timespan t, with minimum voxel resolution V, how many unique concepts C are needed to describe all the patterns (repeated phenomena) P that occur in the box?

Continue reading...

February 21, 2024 - Everyone wants Optimal Answers to their Questions. What is an Optimal Answer? An Optimal Answer is an Answer that uses all relevant Cells in a Knowledge Base. Once you have the relevant Cells there are reductions, transformations, and visualizations to do, but the difficulty in generating Optimal Answers is dominated by the challenge of assembling data into a Knowledge Base and making relevant Cells easily findable.

Activated Cells in a Knowledge Base.

Continue reading...

January 23, 2024 - I started a ketogenic diet as a treatment for bipolar disorder 97 days ago, on October 19th, 2023, after learning about it on YouTube from MetabolicMind and Bipolarcast. So far, it seems promising.

But I was perplexed: after 20 years of reading about bipolar disorder and seeing eight health care providers, how had I not heard of keto as a treatment option before? Had I missed it in all the materials I had read?

Continue reading...

January 4, 2024 - You can easily imagine inventions that humans have never built before. How does one filter which of these inventions are practical?

Continue reading...

September 1, 2022 - There's a trend where people are publishing real data first, and then insights. Here is my data from angel investing:

Sigh. I am sharing my data as a PNG. We need a beautiful plain text spreadsheet language.

Continue reading...

A Small Open Source Success Story

Adding 3 missing characters made code run 20x faster.

Map chart slowdown

June 9, 2022 - "Your maps are slow".

In the fall of 2020, users started reporting that our map charts were now slow. A lot of people used our maps, so this was a problem we wanted to fix.

Suddenly these charts were taking a long time to render.

k-means was the culprit

To color our maps, an engineer on our team used a very effective technique called k-means clustering, which would identify optimal clusters and assign a color to each. But recently our charts were using record amounts of data, and k-means was getting slow.

Using Chrome DevTools I was able to quickly determine the k-means function was causing the slowdown.
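For readers unfamiliar with k-means, here is a minimal one-dimensional sketch of Lloyd's algorithm, the standard way k-means is computed. It is not the code from our charts: the name `kmeans_1d`, the quantile-based starting centroids, the iteration cap, and the two-color palette are all assumptions for illustration. Each pass does roughly n × k distance checks, so run time grows with the amount of data being colored.

```python
from typing import List, Optional, Tuple


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm on one-dimensional data; returns (centroids, labels)."""
    if k <= 0 or len(values) < k:
        raise ValueError("need k > 0 and at least k values")
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: start centroids at evenly spaced quantiles (deterministic, no randomness).
    centroids = [float(ordered[(2 * i + 1) * n // (2 * k)]) for i in range(k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # no value changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Example: bucket a handful of values into two color classes.
palette = ["#fde0c5", "#c0392b"]  # hypothetical colors, one per cluster label
centroids, labels = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
colors = [palette[lab] for lab in labels]
```

Each region on the map would then be filled with the color of its cluster label.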

Continue reading...

October 15, 2021 - I constantly seek ways to improve my writing.

I want my writing to be meaningful, clear, memorable, and short.

And I want to write faster.

This takes practice and there aren't a lot of shortcuts.

But I did find one shortcut this year:

Set a thin column width in your editor

Mine is 36 characters (your ideal width may be different).

Beyond that my editor wraps lines.

This simple mechanic has perhaps doubled my writing speed and quality.

Continue reading...

May 6, 2021 - I am aware of two dialects for advice. I will call them FortuneCookie and Wisdom. Below are two examples of advice written in FortuneCookie.

🥠 Reading is to the mind what exercise is to the body.
🥠 Talking to users is the most important thing a startup can do.

Here are two similar pieces of advice written in Wisdom:

🔬 In my whole life, I have known no wise people (over a broad subject matter area) who didn't read all the time – none, zero. Charlie Munger
🔬 I don't know of a single case of a startup that felt they spent too much time talking to users. Jessica Livingston
Continue reading...

April 26, 2021 - I invented a new word: Logeracy[1]. I define it roughly as the ability to think in logarithms. It mirrors the word literacy.

Someone literate is fluent with reading and writing. Someone logerate is fluent with orders of magnitude and the ubiquitous mathematical functions that dominate our universe.

Someone literate can take an idea and break it down into the correct symbols and words; someone logerate can take an idea and break it down into the correct classes and orders of magnitude.

Someone literate is fluent with terms like verb and noun and adjective. Someone logerate is fluent with terms like exponent and power law and base and factorial and black swan.

Continue reading...

March 2, 2020 - A paradigm change is coming to medical records. In this post I do some back-of-the-envelope math to explore the changes ahead, both qualitative and quantitative. I also attempt to answer the question no one is asking: in the future will someone's medical record stretch to the moon?

Continue reading...

How Old Are These Keys? Five Eras of Human Progress

My keyboard, if you removed the symbols from the typewriter and computer eras. Try it yourself.

February 25, 2020 - One of the questions I often come back to is this: how much of our collective wealth is inherited by our generation versus created by our generation?

I realized that the keys on the keyboard in front of me might make a good dataset to attack that problem. So I built a small interactive experiment to explore the history of the keys on my keyboard.

Continue reading...

January 29, 2020 - In this long post I'm going to do a stupid thing and see what happens. Specifically I'm going to create 6.5 million files in a single folder and try to use Git and Sublime and other tools with that folder. All to explore this new thing I'm working on.

TreeBase is a new system I am working on for long-term, strongly typed collaborative knowledge bases. The design of TreeBase is dumb. It's just a folder with a bunch of files encoded with Tree Notation. A file in TreeBase is roughly equivalent to a row in a normal SQL table, and the filenames serve as IDs. Instead of an optimized binary storage format, it uses plain text (UTF-8). Field names are stored alongside the values in every file. Instead of starting with a schema, you can just start adding files and evolve your schema and types as you go.
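A minimal Python sketch of the folder-as-table idea, assuming each file holds one record as flat `field value` lines (the first space separates the field name from its value) and that the filename minus its extension is the ID. Real Tree Notation also supports indented, nested blocks, which this sketch simply skips; `load_folder` is a hypothetical name, not part of TreeBase.

```python
from pathlib import Path
from typing import Dict


def load_folder(folder: str) -> Dict[str, Dict[str, str]]:
    """Read each file in `folder` as one record, keyed by its filename (extension dropped)."""
    rows: Dict[str, Dict[str, str]] = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        record: Dict[str, str] = {}
        for line in path.read_text(encoding="utf-8").splitlines():
            # Flat sketch: skip blank lines and indented (nested) child lines.
            if not line.strip() or line[0] in " \t":
                continue
            # The first space separates the field name from its value.
            field, _, value = line.partition(" ")
            record[field] = value
        rows[path.stem] = record
    return rows
```

Because every file carries its own field names, two records can have different fields without any schema change: a file that gains a `github` line simply gains that column.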

Continue reading...

January 23, 2020 - People make biased claims all the time. A decent response used to be "citation needed". But we should demand more. Anytime someone makes a claim that seems biased, call them out with: Dataset needed.

Whether it's an academic paper, news article, blog post, tweet, comment or ad, linking to analyses is not enough. If someone stops at that, demand a link to a clean dataset supporting the author's position. If they can't deliver, they should retract.

Continue reading...

January 16, 2020 - I often rail against narratives. I think stories always oversimplify things, have hindsight bias, and often mislead. I spend a lot of time trying to invent tools for making data-derived thinking as effortless as narrative thinking (so far, mostly in vain). And yet, as much as I rail against stories, I have to admit stories work.

I read an article that put it more succinctly:

Why storytelling? Simple: nothing else works.
Continue reading...

January 3, 2020 - Speling errors and errors grammar are nearly extinct in published content. Data errors, however, are prolific.

Continue reading...

The Attempt to Capture Truth

August 19, 2019 - Back in the 2000s, Nassim Taleb's books set me on a new path in search of truth. One truth I became convinced of is that most stories are false due to oversimplification. I largely stopped writing over the years because I didn't want to contribute more false stories, and instead I've been searching for and building new forms of communication and ways of representing data that hopefully can get us closer to truth.

Continue reading...

July 18, 2019 - In 2013 I sent a brief email to 25 programmers whose programs I admired.

"Would you be willing to share the # of hours you have spent practicing programming? Back of the envelope numbers are fine!"

Some emails bounced back.

Some went unanswered.

But five coders wrote back.

This turned out to be a tiny study, but given the great code these folks have written, I think the results are interesting--and a testament to practice!

Name             GitHubId  Hours  YearOfEstimate  BornIn
Donald Knuth     -         56000  2013            1938
Rob Pike         robpike   30000  2013            1956
Peter Norvig     norvig    30000  2013            1956
Stephen Wolfram  -         50000  2013            1959
Lars Bak         larsbak   30000  2013            1965
Continue reading...

January 13, 2018 - This is a story about how my FitBit logged a manic episode.

Continue reading...

June 23, 2017 - I just pushed a project I've been working on called Ohayo.

You can also view it on GitHub.

I wanted to try and make a fast, visual app for doing data science. I can't quite recommend it yet, but I think it might get there. If you are interested, you can try it now.

Continue reading...

A Suggestion for a Simple Notation

September 24, 2013 - What if instead of talking about Big Data, we talked about 12 Data, 13 Data, 14 Data, 15 Data, et cetera? The # refers to the number of zeroes we are dealing with.

You can then easily differentiate problems. Some companies are dealing with 12 Data, some companies are dealing with 15 Data. No company is yet dealing with 19 Data. Big Data starts at 12 Data, and maybe over time you could say Big Data starts at 13 Data, et cetera.
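The notation is just the base-10 exponent of a dataset's size. A small sketch, assuming sizes are counted in bytes (the post does not specify a unit) and using a hypothetical helper name, `n_data`:

```python
def n_data(size: int) -> int:
    """Return N in "N Data": the number of zeroes in the power of ten at or below `size`."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    # For a positive int, len(str(size)) - 1 equals floor(log10(size)),
    # without the floating-point rounding that math.log10 can introduce near powers of ten.
    return len(str(size)) - 1


size = 2_500_000_000_000  # a hypothetical 2.5 TB dataset
print(n_data(size))  # → 12
is_big_data = n_data(size) >= 12  # "Big Data starts at 12 Data"
```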

Continue reading...

March 30, 2013 - Why does it take 10,000 hours to become a master of something, and not 1,000 hours or 100,000 hours?

The answer is simple. Once you've spent 10,000 hours practicing something, no one can crush you like a bug.

Continue reading...

December 23, 2012 - If you are poor, your money could be safer under the mattress than in the bank:

The Great Bank Robbery dwarfs all normal burglaries by almost 10x. In the Great Bank Robbery, the banks are slowly, silently, automatically taking from the poor.

One simple law could change this:

What if it were illegal for banks to automatically deduct money from someone's account?

If a bank wants to charge someone a fee, that's fine, just require they send that someone a bill first.

What would happen to the statistic above, if instead of silently and automatically taking money from people's accounts, banks had to work for it?

Continue reading...

August 11, 2010 - I've had some free time the past two weeks to work on a few random ideas I've had.

They all largely involve probability/statistics and have no practical or monetary purpose. If I were a painter and not a programmer, you might call them "art projects".

Continue reading...

June 15, 2010 - I think it's interesting to ponder the value of information over its lifetime.

Different types of data become outdated at different rates. A street map is probably mostly relevant 10 years later, while a 10 year old weather forecast is much less valuable.

Phone numbers probably last about 5 years nowadays. Email addresses could end up lasting decades. News is often largely irrelevant after a day. For a coupon site I worked on, the average life of a coupon seemed to be about 2 weeks.

If your data has a long half life, then you have time to build it up. Wikipedia articles are still valuable years later.

What information holds value the longest? What are the "twinkies" of the data world?

Books, it seems. We don't regularly read old weather forecasts, census rolls, or newspapers, but we definitely still read great books, from Aristotle to Shakespeare to Mill.

Facts and numbers have a high churn rate, but stories and knowledge last a lot longer.
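If information value decays exponentially (an assumption; the post uses "half life" informally), the fraction of value left after a given age is 0.5 raised to the power age / half-life. A minimal sketch with a hypothetical helper name and hypothetical half-lives:

```python
def remaining_value(age: float, half_life: float) -> float:
    """Fraction of original value left after `age`, assuming exponential decay."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    # Every elapsed half-life halves the value; age and half_life must use the same unit.
    return 0.5 ** (age / half_life)


# Same age (10 years), two hypothetical half-lives.
slow = remaining_value(10, half_life=20)    # slow decay, like a street map
fast = remaining_value(10, half_life=0.01)  # fast decay, like a weather forecast
```

Under this model a long half-life leaves most of the value intact after a decade, while a short one leaves essentially nothing.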

Continue reading...

March 16, 2010 - I wrote a simple PHP program called phpcodestat that computes some basic statistics for any given directory.
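The post does not list which statistics phpcodestat reports, so the following is a hypothetical Python stand-in, not the original PHP program: it walks a directory tree and counts files and lines per file extension. The helper name `code_stats` and the choice of statistics are assumptions.

```python
import os
from collections import Counter
from typing import Dict


def code_stats(directory: str) -> Dict[str, object]:
    """Count files and lines under `directory`, grouped by file extension."""
    files_by_ext: Counter = Counter()
    lines_by_ext: Counter = Counter()
    for root, _dirs, names in os.walk(directory):
        for name in names:
            ext = os.path.splitext(name)[1] or "(none)"
            # Binary mode avoids decode errors; a last line without a newline still counts.
            with open(os.path.join(root, name), "rb") as fh:
                line_count = sum(1 for _ in fh)
            files_by_ext[ext] += 1
            lines_by_ext[ext] += line_count
    return {
        "files": sum(files_by_ext.values()),
        "lines": sum(lines_by_ext.values()),
        "files_by_ext": dict(files_by_ext),
        "lines_by_ext": dict(lines_by_ext),
    }
```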

Continue reading...

March 8, 2010 - If a post on HackerNews gets more points, it gets more visits.

But how much more? That's what Murkin wanted to know.

I've submitted over 10 articles from this site to HackerNews and I pulled the data from my top 5 posts (in terms of visits referred by HackerNews) from Google Analytics.

Here's how it looks if you plot visits by karma score:

Continue reading...

July 7, 2008 - After months of deliberation, I’ve decided to quit my day job and work on my blog full time.

I am joking.

But these bloggers were not:

Continue reading...

May 14, 2008 - The other day I wrote a post on How much Gas Americans use per day. The answer is 400 million gallons. A reader wanted to know how much gas the whole world consumes in a day. The answer is about 83 million bbl (barrels). One bbl = 42 gallons, so the world consumes about 3.5 billion gallons of gas per day. That means the United States consumes about 11% of the total gas consumed per day.
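The unit conversion above can be checked in a few lines, using only the figures given in the post:

```python
GALLONS_PER_BARREL = 42
world_barrels_per_day = 83_000_000
us_gallons_per_day = 400_000_000

# Convert the world figure from barrels to gallons, then take the US share.
world_gallons_per_day = world_barrels_per_day * GALLONS_PER_BARREL
us_share = us_gallons_per_day / world_gallons_per_day
print(f"World: {world_gallons_per_day:,} gallons/day; US share: {us_share:.1%}")
# → World: 3,486,000,000 gallons/day; US share: 11.5%
```

Roughly 3.5 billion gallons and a share of about 11 to 12 percent, consistent with the rounded numbers in the text.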

Continue reading...

May 8, 2008 - xirium posted a tarball of all the individual profile pages for HackerNews readers (minus lurkers and those who joined after 05/07/2008). I was curious what insights, if any, could be gleaned from analyzing the data. My findings are below. I could have figured out more interesting things if I had also included posts in my data, but I was looking for something simple to work on. BTW, to get the data into a table I wrote a simple Python script to parse the HTML files. The source code is at the bottom. Or you can download the resulting dataset as an Excel file.

Continue reading...
