Is This Post True?

There’s a lot of discussion about what constitutes fake news, what impact it has, and whether blocking it is the bigger threat.

I’d like to instead talk about the perception of truth. How do we change the norms of publishing so that a mistake is detected rather than amplified? How do we broaden the context that an article lives within and dynamically update it as time passes and others expand on the original article? In what context are writers presenting their reporting and opinions?

I started reading Neil Postman’s “Amusing Ourselves to Death” the other day. It was written in 1985, but feels very relevant today. I’m not that far into it, but the second chapter delves into the interplay between truth, and how the medium we are communicating in shapes that truth.

As a culture moves from orality to writing to printing to televising, its ideas of truth move with it. … Truth, like time itself, is a product of a conversation man has with himself about and through the techniques of communication he has invented.

that there is a content called “the news of the day”— was entirely created by the telegraph (and since amplified by newer media)… The news of the day is a figment of our technological imagination. … Cultures without speed-of-light media -— let us say, cultures in which smoke signals are the most efficient space-conquering tool available -— do not have news of the day.

It is not really a deep insight to say that the news of the minute, the “trending” news is something we have created with the systems we have built. Lots has (somewhat rightly) focused on Facebook and Google, but the Open Web is much larger than that.

We’ve lowered the barrier to publishing. We’ve changed the medium through which we express truth, but we haven’t really changed the norms or means by which we enable readers to judge truth.

Let’s compare how truth is perceived in some different mediums:
1. Newspaper journalistic standards in the 1980s (for instance) relied on “balance” and “unbiased” language and had separate sections for opinion. All of this got published by some “reputable” source. Some publisher with a long history that was known to the reader. And typically there was not really a lot of competition within any particular geographic region (though this hasn’t always been true) so it was pretty easy to know the range of writers and publishers.
2. Scientific journals rest on citations and peer review. Ostensibly the data/methods are all available so someone could reproduce the experiments. But doing so is often not easy. The perceived truth is determined both by the reputation of the journal (and by proxy, who reviewed it) as well as explicitly referencing other papers that may disagree. A paper that ignores existing literature is unlikely to get past reviewers.
3. For an article on the Web we borrow a lot of our norms from (1) and (2). We have also added a social validation of truth by displaying a count of likes, the number of comments, or the number of people who have shared an article. Rarely is anything shown about the people providing the social validation other than a simple count. Sometimes it is also where the content ranks on in a search, or perhaps what the post is linking to. But the text of a link and the entire content is entirely within the writers control.

Obviously, for all of these cases I am ignoring lots of background on all the work that people do in research, investigation, validation, and writing. The point here is not about what went into publishing it, but rather what does a new reader see? Why does a reader trust it? Of course the reader can judge the written evidence for themselves, but we know of a huge list of cognitive biases that the reader must contend with. Additionally, it takes a lot of time to research the truth of an article.

Both (1) and (2) do nothing to handle the case where the author has simply made a mistake. They have mechanisms for referring to articles published before the current one, but the web is capable of dynamically updating to refer to things published afterwards. An articles does not need to be static.

How do we take the best of (1) and (2) and adapt them into a world where the reader can better judge truth in (3)?

I don’t really have an answer. I’m not sure these are even the right questions, but let’s try an idea, right here and now since this post is currently asking you to determine its validity. Don’t just judge just my words. Judge my words against the words of others in the world. You’ve read this far, let’s add some related content to this article and see how it affects your personal search for the truth. Then let’s reconvene below and discuss some more…

 

Onward…

How has your perspective of this post changed? Did you open any of the links? Did you get lost in them? Were you overwhelmed by them? Do you feel that you can more accurately judge truth? Do you find my ideas more valid or less valid? Getting back to my click-bait headline: Is this post in accordance with fact or reality (true)?

I haven’t actually read all of those posts. I looked at a few. The Stanford Web Credibility Project from 2002 was very interesting and relevant. Its top recommendation is to:

1. Make it easy to verify the accuracy of the information on your site.

What if in order for a writer’s opinion or reporting of news to be considered a part of humanity’s search for truth that writer is expected to publish it alongside others’ content? What if publishers are expected to give others traffic in order for your words to have any weight? Why should I care what you have to say if you’re afraid to algorithmically link me to other people who are providing other opinions/insights? What if readers learn to instantly dismiss any article that is not willing to automatically link to others?

Displaying related content is a fundamental part of the web now, but so far we have mostly only used it to keep users on our own sites, or to make money with advertisements. Maybe there is another use case? Yes, the user could go to a search engine, but maybe we can improve the truth seeker’s user experience beyond that.

This related content would not need to be static. As posts link to yours, they may get weighted more heavily and show up on your post. So your post does not only link backward in time, it is a living document providing a link to how others have built on your work.

Some systems already try to do this. WordPress has pingbacks where a link from some other site generates a comment on the site it is linking to. It is an attempt to keep an old post connected to the conversation, but there is no weighting of it relative to others. It doesn’t really scale for a very popular post. And a post that is two hops back in the chain is not necessarily considered.

Of course a related content system still has potential problems of bias due to who controls the algorithms. Open sourcing the algorithms would help, as would having a standard mechanism where multiple providers can provide the service in a compatible way. Building trust here would be hard. Getting publishers to trust content from competing publishers to be inserted into their page would be especially difficult.

But maybe by changing the norms through which we judge truth we can get back to seeking truth together. Or maybe there are other ideas for how we should be presenting our articles to help humanity find truth.

 

No Handlebars

Was listening to the Flobots earlier on a run. Feels very appropriate to be thinking about the impact of technology on society and how to do better.

Uniform priors now seem much less appropriate to me than they used to. Time for this blog to be less neutral also. 

Great Links to Lucene/Solr Revolution 2015

A great list of links to slides from Mani Siva’s blog. A lot of things I’m currently thinking about: overlaying graphs on Elasticsearch to do content re-ranking, improving search relevancy, what is “fairness” in ranking and how you display content.

 

One of the annual conferences that I always look forward to is Lucene/Solr Revolution. The reason is not only because of highly technical nature of the conference, but also you can get a glimpse of how future of search is evolving in the open and how Solr is being pushed to its limits to handle […]

via Lucene/Solr Revolution 2015 — Mani Siva’s blog

The Walsh Standard v Automattic Creed

I’m reading Bill Walsh’s book The Score Takes Care of Itself on his methodology for getting the San Francisco 49ers to perform at a high level in the 1980s (and win 3 Super Bowls in the process), and I found it interesting how closely his Standard of Performance matches up against the Automattic Creed. I thought I would compare the two: Walsh’s clause in black, relavant Automattic line(s) in red, and some notes from me.

Exhibit a ferocious and intelligently applied work ethic directed at continual improvement;

I will never stop learning.

…the only way to get there is by putting one foot in front of another every day.

In both cases the first line is about learning.

demonstrate respect for each person in the organization and the work he or she does;

I will never pass up an opportunity to help out a colleague, and I’ll remember the days before I knew everything.

Not really a perfect match, but to help someone with humility implies respecting them and who they are.

be deeply committed to learning and teaching, which means increasing my own expertise;

I will never stop learning.

I will never pass up an opportunity to help out a colleague, and I’ll remember the days before I knew everything.

Learning and teaching go hand in hand.

be fair;

I will build our business sustainably through passionate and loyal customers.

I think its a stretch to say that the Creed mentions fairness so openly. But this line captures the fairness we strive to apply to our customers.

demonstrate character;

Really the whole Automattic Creed is about character. Being written in the first person means it is defining the character that we are all agreeing to try and match.

honor the direct connection between details and improvement, and relentlessly seek the latter;

I will never stop learning.

I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day

Systematic learning. I like Walsh’s focus on the details mattering though.

show self-control, especially where it counts most— under pressure;

I am more motivated by impact than money, and I know that Open Source is one of the most powerful ideas of our generation.

Choosing what to work on and how to work on it is all about self control.

demonstrate and prize loyalty;

We don’t really go for loyalty oaths. We have a great retention rate though and really quite a lot of loyalty and friendship. Getting together with Automatticians feels like getting together with family.

use positive language and have a positive attitude;

I’ll remember the days before I knew everything

This is really the closest thing we have to mentioning a positive attitude in the Creed.

take pride in my effort as an entity separate from the result of that effort;

Open Source is one of the most powerful ideas of our generation

I guess that kinda works… Again not strong overlap. “Pride” is actually the first line in the Automattic Designer’s Creed.

be willing to go the extra distance for the organization;

I am in a marathon, not a sprint

I think there is a pretty important difference in philosophy here. Walsh is pretty strongly into personal sacrifice in what I would call an unsustainable way — “If you’re up at 3 A.M. every night talking into a tape recorder and writing notes on scraps of paper, have a knot in your stomach and a rash on your skin, are losing sleep and losing touch with your wife and kids, have no appetite or sense of humor, and feel that everything might turn out wrong, then you’re probably doing the job.”

To me you are not doing the job at all. You’re going to write terrible code, poorly think it through, and have a really hard time sustaining this pace. Everyone has crunchtimes, but this is not a sustainable way to live. Nor is it a sustainable way to build products. I have similar problems with the Agile methodology and its focus on “sprints”.

Ironically though, I am writing this post when I should be asleep.

deal appropriately with victory and defeat, adulation and humiliation (don’t get crazy with victory nor dysfunctional with loss);

Given time, there is no problem that’s insurmountable.

Keep working at it, don’t give up. This can be really hard when you spend months fighting to fix one scaling problem after another and it is impossible to know where the end is.

promote internal communication that is both open and substantive (especially under stress);

I will communicate as much as possible, because it’s the oxygen of a distributed company

Open communication is the type of communication that really matters.

seek poise in myself and those I lead;

No real direct comparison. To me poise is keeping your cool when you realize that some code you launched is causing a performance problem about to bring down WordPress.com and how do you diligently and swiftly solve the problem.

put the team’s welfare and priorities ahead of my own;

I don’t really want a similar line in the Creed. Yes, I sacrifice some of the things I would love to be working on in order to pursue Automattic’s goals, but my welfare and priorities are pretty well aligned with the company’s.

 

maintain an ongoing level of concentration and focus that is abnormally high;

I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day

To me the level of concentration needed is to be able to focus on what is most important, say ‘no’ to the sorta important things, and get up the next day and make that correct decision yet again. I don’t make the correct decision every day, but hopefully I will quickly recognize when I make the wrong one.

and make sacrifice and commitment the organization’s trademark.

Given time, there is no problem that’s insurmountable

Again Walsh focuses a lot on sacrifice. I think that’s the wrong thing to focus on. There is certainly some involved in any endeavor, but the way he discusses it in his Standard of Performance, and elsewhere in the book doesn’t feel very sustainable.


 

There’s some interesting differences between the two, but the focus on teaching, openness, and being systematic in making progress stand out pretty strongly. For reference, here is the entire Automattic Creed:

I will never stop learning. I won’t just work on things that are assigned to me. I know there’s no such thing as a status quo. I will build our business sustainably through passionate and loyal customers. I will never pass up an opportunity to help out a colleague, and I’ll remember the days before I knew everything. I am more motivated by impact than money, and I know that Open Source is one of the most powerful ideas of our generation. I will communicate as much as possible, because it’s the oxygen of a distributed company. I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.

 

Learning About Modern Neural Networks

I’ve been meaning to learn about modern neural networks and “deep learning” for a while now. Lots to read out there, but I finally found this great survey paper by Lipton and Berkowitz: A Critical Review of Recurrent Neural Networks for Sequence Learning[pdf]. It may be a little hard to follow if you haven’t previously learned basic feed forward neural networks and backpropagation, but for me it had just the right combination of both describing the math while also summarizing how the state of the art is applicable to various tasks (machine translation for instance).

Screen Shot 2015-12-28 at 7.41.13 AM

A two cell Long-Short-Term-Memory network unrolled across two time steps. Image from Lipton and Berkowitz.

I’ve been pretty fascinated with RNNs since reading “The Unreasonable Effectiveness of RNNs” earlier this year. A character based model that can be good enough to generate psuedo code where the parenthesis are correctly closed was really impressive. It was a very inspiring read, but still left me unable to really grok what is really different about the state of the art NNs. I finally feel like I’m starting to understand and I’ve gotten a few of the Tensorflow examples running and started to play with modifying them.

Deep learning seemed to jump onto the scene just after I finished my NLP Masters degree about 5 years ago. I hadn’t really found the time to fully understand it since then, but it feels like I’ve avoided really learning it for too long now. Given the huge investments Google, Facebook, and others are putting into building large scalable software systems or customizing hardware for processing NNs at scale, it no longer just seems like hype with clever naming.

If you’re interested in more reading, Jiwon Kim has a great list.

 

Six Use Cases of Elasticsearch in WordPress

I spoke at WordCamp US two weeks ago about six different use cases I’ve seen for Elasticsearch within the WordPress community.

I also mentioned two projects for WordPress.org that are planned for 2016 and involve Elasticsearch if anyone is looking for opportunities to learn to use Elasticsearch and contribute to WordPress:

The video is here:

 

And here are my slides:

2014 In Review

Another year at Automattic, another 3 million emails sent to users to point them at their annual reports. This is my fourth year helping to send the annual reports, and as usual it makes for an exhausting December for everyone involved. But its pretty rewarding seeing how much people like them on Twitter. This year I’m particularly excited to see tweets from non-English users because we’ve expanded our translation of the reports from 6 to 16 languages.

Our designer wrote up a summary of how he thinks of our report:

A billboard shines upon the magical city of Anyville, USA and tells the tale of one remarkable blogger’s dedication and their achievements throughout the year that was. As 2015 edges closer the citizens of Anyville stop and look up in awe at the greatness that is you.

Its so incredible working with clever designers who know how to make data speak to users.

Here’s what I took out of the 100,000 views from 58,000 visitors my blog had in 2014:

  • Multi-lingual indexing is a pain point for a lot of Elasticsearch users (ourselves included). I’ve learned so much from the comments on that post.
  • Everyone really likes that two year old post about things I did wrong with ES. I should write another one. I’ve done lots of other things wrong since then. 🙂
  • I am very inconsistent about when I post. I should try writing shorter posts. Writing a series of posts was a good idea, but a bit overwhelming.
  • About half of my traffic is from search engines.
  • I get a lot of traffic from folks posting links to me on Stack Overflow. Makes me really happy to see my posts helping answer so many questions.

Click here to see the complete report.