Below is an email I sent to someone close to me about how I discovered Tree Languages. They knew that in addition to working full time on a personal data visualization app over the past year, I also had some side research project. But they thought the data viz app was my top work priority. Which was true, until recently. So they asked “how are TN and the visualization app related?” This is my (slightly edited) response.


As to how the data viz app and the paper are related, it’s a long story.

I remember the very moment when I first “discovered” Tree Notation. I was walking from [our house in San Francisco] to our old NudgePad office [which was on 12th and Folsom]. [NudgePad was] building a visual HTML authoring tool, and I was saving the files people created in a language called HAML. But we were low on money and time, so I needed to cut corners wherever I could, and I had been dropping features from HAML to save development costs. I kept dropping features from the language and realizing everything could still work, which surprised me.

By that day, I had just 3 syntax characters left: “\n”, “ ”, and “:”. As I was walking to work I was thinking to myself, “wow, I’ve removed everything from HAML and it still works…wait! no! I could still drop the ‘:’!” And it struck me immediately that that was an interesting idea. I remember immediately going and coding it up, and to my delight, things still worked. At that point I thought it was pretty profound that a simple structured language could be so useful. I had only the vaguest of hints, though, that this hadn’t been known yet. So for a couple of years I refined it while simultaneously searching for whether something like it already existed. Writing this, it just dawned on me that at one point years ago I tried to get some advisors for my “research project” but had no luck. For example:

(Note: I’m including the screenshot above as a way of saying thanks to Doug. He was one of a few people who replied at all to my cold emails, and he wished me good luck! That was awesome, even if I didn’t snag him as an advisor.)

At first I called the notation “Tree”, then I called it “Note”, then I called it “Space”, before eventually settling last year on “Tree Notation”.

So anyway, I pitched and pushed the “research project” for many years with little success. People just didn’t care. This was fully my fault, as I had been attacking the wrong problem, but I didn’t know that at the time. I just had a hunch that it was useful (it was too simple and beautiful not to be!); I just hadn’t proved for what yet.

Many times I thought of just dropping the project, but in many ways it felt like “my baby”, and I felt that I needed to help it into the world. Sorry if that sounds weird, I don’t know how else to put it. But it was stuck.

And meanwhile I wanted to build the data viz program, because you know me and how I love data and “quantified self” stuff and wanted a tool that made it faster for me to be more data driven. So I left Microsoft last year to build the personal data viz app, and I also expected to continue to work part time on the research on TN.

To solve the data viz problems in a visual programming way, I needed to use TN to store the documents that people would create. I knew this already since I’ve been in the space for so many years and knew that without TN, tools and humans could not interoperate cleanly on the same code. But at first I was just using TN as a replacement for JSON, in that it was just a pretty simple document encoding.
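To make the "replacement for JSON" idea concrete, here is a minimal sketch of reading a document whose only syntax is the newline and the space. It is an illustration, not the app's actual code: the function names `parse` and `to_dict`, the sample document, and the convention that the first word of a line acts as a key are all assumptions made for this example.

```python
import json


def parse(text):
    """Parse space-indented text into nested nodes (one node per line)."""
    root = {"line": None, "children": []}
    stack = [(-1, root)]  # pairs of (indent level, node)
    for raw in text.split("\n"):
        if not raw.strip():
            continue  # this sketch simply skips blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        node = {"line": raw[indent:], "children": []}
        # Pop back up until the top of the stack is this line's parent.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((indent, node))
    return root["children"]


def to_dict(nodes):
    """Assumed convention: first word is the key, the rest is the value."""
    result = {}
    for node in nodes:
        key, _, value = node["line"].partition(" ")
        result[key] = to_dict(node["children"]) if node["children"] else value
    return result


# Hypothetical document: no colons, braces, or quotes, only "\n" and " ".
doc = "title Sleep log\naxis\n x date\n y hours"
tree = to_dict(parse(doc))
print(json.dumps(tree))
```

The same data in JSON needs braces, quotes, colons, and commas; here the nesting is carried entirely by indentation.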

I believe it was over Christmas break, maybe a little before, when I started to experiment with adding more power to the data language format, realizing that what would really work well for the data viz tool would be a “Dataflow language”. And so I started moving “BoardScript” (now called “Flow”), the TN language I was using to encode the data viz boards, in the direction of a more powerful dataflow programming language. IIRC, I sent you a text at one moment like “this is going to be amazing!” Or something hyperbolic along those lines. That was when it dawned on me that I could still use TN to get the visual editing but now my TN language would be a lot more powerful than just a static document language.

Then, about 2 months ago(?), I did a big refactor of my app architecture to more closely model Facebook’s ReactJS. That worked great and made development much easier. But there is a big, glaring problem with FB’s React that everyone has so far tried unsuccessfully to fix: how to handle state. I was thinking to myself that maybe I could use TN for that. But I didn’t want to risk it, because I was so late on shipping already and didn’t want to do anything that might delay launch. But then one day, after mounting frustration trying all the other ways Facebook recommends to handle state, I decided to give it a go with TN. I made my base component extend TN and then had a nice, clean, plaintext tree to store and manage state. And it worked brilliantly.
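As a rough illustration of a component that extends a tree node and keeps its state as plain lines, here is a small sketch. It is written in Python rather than the app's JavaScript, and everything in it is hypothetical: the `TreeNode` and `ChartComponent` classes, the `set`/`get`/`to_string` methods, and the `type` and `zoom` keys are invented for the example and are not the real component code.

```python
class TreeNode:
    """A node holding one line of text plus a list of child nodes."""

    def __init__(self, line=""):
        self.line = line
        self.children = []

    def set(self, key, value):
        """Store `key value` as a child line, replacing an existing key."""
        for child in self.children:
            if child.line.split(" ", 1)[0] == key:
                child.line = f"{key} {value}"
                return child
        child = TreeNode(f"{key} {value}")
        self.children.append(child)
        return child

    def get(self, key):
        """Return the text after `key` on the first matching child line."""
        for child in self.children:
            head, _, rest = child.line.partition(" ")
            if head == key:
                return rest
        return None

    def to_string(self, indent=0):
        """Serialize the subtree, one space of indentation per level."""
        lines = [" " * indent + self.line]
        for child in self.children:
            lines.append(child.to_string(indent + 1))
        return "\n".join(lines)


class ChartComponent(TreeNode):
    """Hypothetical component: its state is simply its own subtree."""

    def __init__(self):
        super().__init__("chart")
        self.set("type", "line")
        self.set("zoom", "1")

    def zoom_in(self):
        self.set("zoom", str(int(self.get("zoom")) * 2))


chart = ChartComponent()
chart.zoom_in()
print(chart.to_string())
```

Because the state is just text in a tree, the whole component can be printed, diffed, or saved at any point without a separate serialization step.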

Then, a couple weeks ago, remember how I was struggling getting test coverage up? I was at like 11% or something even worse. I had been doing the bare minimum in unit testing as I was building the data viz app and now with a complex app I was paying for it big time. And I was realizing it was going to take me forever and thousands more lines of code just for the tests. And the tests themselves would also have to be maintained, etc. It was really scaring me. I knew I could use a Lisp to greatly reduce the code size but I didn’t want to go down that path because I’ve always believed that, long term, Lisp is not scalable on big projects. Then as I’m staring at my test code, I’m thinking, “you know, TN might work for this”. And I took a stab at it. And then, within a few days, I had another new “Tree Language” for tests and it worked brilliantly and I was able to rocket to 60-70% test coverage and ended up with fewer lines of code than I had for the 11% coverage!
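To give a feel for what a small test language in this style might look like, here is a toy sketch. It is not the actual test language described above: the `test`, `call`, and `expect` keywords, the `FUNCTIONS` table, and the helper names are all assumptions made up for this example, and it only handles integer arguments.

```python
import operator

# Hypothetical functions under test, looked up by name from the spec.
FUNCTIONS = {"add": operator.add, "mul": operator.mul}

# The test suite itself: one top-level line per test, indented details.
SPEC = """\
test addition
 call add 2 3
 expect 5
test multiplication
 call mul 4 5
 expect 20
"""


def parse_tests(text):
    """Group indented `call`/`expect` lines under the preceding `test` line."""
    tests = []
    for line in text.split("\n"):
        if not line.strip():
            continue
        if line.startswith(" "):
            word, _, rest = line.strip().partition(" ")
            tests[-1][word] = rest
        else:
            tests.append({"name": line.partition(" ")[2]})
    return tests


def run_tests(text):
    """Run every test and return a mapping of test name to pass/fail."""
    results = {}
    for test in parse_tests(text):
        name, *args = test["call"].split(" ")
        actual = FUNCTIONS[name](*(int(a) for a in args))
        results[test["name"]] = actual == int(test["expect"])
    return results


results = run_tests(SPEC)
print(results)
```

Each additional test costs three short lines of spec rather than a new function with setup and assertion boilerplate, which is the kind of saving the paragraph above describes.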

And really it was just the past two or three weeks that all of these things clicked for me. I had a TN for Dataflow. I had a TN for CSS. I had a TN for my React app. I had a TN for my unit tests. I had a TN for my concurrent async integration tests. And there was the big “aha” moment where I realized that TN wasn’t just a new class of document languages, it was a new class of powerful programming languages. And I realized that 99% of the value in TN lies in the programming language use cases, not the document language use cases.

I had been focusing on the wrong application of TN. I was focusing on document languages like JSON and XML because I figured I should get a “foothold” of success in document languages before moving on to programming languages. I do think TN offers about a 10% improvement over JSON or XML. But I have struggled for years to make that a more meaningful improvement, with little luck. But with programming languages, I see that TN/Tree Languages might offer a 1,000+% improvement.

So anyway, that was a really long story, but explains how it all came together. And how the data viz app really pushed me to take TN to a new level, and really finally figure out what TN is good for. Does that make sense?

Anyway, now I believe the Tree Language discovery will dwarf the impact of the data viz app. That makes the data viz stuff much more fun, because now I look at my job as just encouraging the discovery of more Tree Languages and Tree Language tools, and the data viz app as just one application of many. Of course, that’s still my favorite application, but it’s nice to have a side project again.

6/22/2017

(Updated on 9/27/2017 to change ETNs to Tree Languages)