Liveblogging Open Science Summit

I’m here at the Computer History Museum in Mountain View for Open Science Summit. This is my third year at the conference and it’s so great to see so many familiar faces. I’ll be talking about the developments in open access over the past few years and updating this page as the day progresses.

9:00 – The day starts off with Tyler Neylon recounting the story of the Cost of Knowledge petition. He’s drawing from a historical view to project into the future of open access. Tech ingredients creating the future: Github, Stack Overflow, reddit. Tech is great, but awareness needs to be the next step: universities need to encourage researchers to publish openly, and libraries need to educate researchers about how much access costs. Researchers need to stop giving free labor and content to closed-access publishers, and authors need to know how many people are denied access to their work due to the costs. Awareness is the key to the next steps.

Tyler’s slides available here:

Richard Price is saying that open access is as expensive as closed access, which just isn’t true. Pubmed Central costs $1M per year.

Next questioner asking about the cost of publishing in open access, repeating the astroturfed line that OA publishing costs $3000/publication, all of which has to be borne by the author. PLOS ONE is $1350, and most OA journals charge no author-side fees at all.

Elizabeth Iorns from Science Exchange asking for Tyler to comment about the role of funders, given the NIH Public Access mandate. Tyler Neylon is pessimistic about funders being able to do everything.

9:45 – Juan Pablo Alperin from Open Journal Systems – This will be a different perspective on open access, because OJS journals are popular in South America and outside of the sciences. 95% of publications in Latin America are open access! 74% of what is in Scopus. Three regional initiatives – Latindex, Redalyc, SciELO – index the journals, and some provide analytics and stats as well. He cites Brazil as a hotspot for open scholarship in Latin America. Argentina has a national self-archiving mandate, Brazil has one in the works, and 15 countries are working on supporting repositories. There remains some confusion about which exact definition of open access is used in some areas. Juan suggests that the main reason OA has been so big in Latin America is that there’s been so little commercial interest from major international publishers, but that may change. OA is seen as high quality in Latin America, and the openness of open access is seen as providing a service for researchers (because institutions don’t have big subscription budgets?). Open access is for awareness – smaller institutions get the same access as bigger ones, which levels the playing field – but it’s also about strengthening the research culture, because otherwise many smaller institutions wouldn’t be participating at all. He does caution about the author-pays model.

Questioner asking about decentralization vs. distribution of content – fewer central hubs. Juan agrees that is the aim, but centralized indexes like SciELO help provide easier discovery (a common challenge of distributed systems – WG).

10:08 – Richard Price has a brief issue getting his Mac to display properly on the projection equipment (it seems like it always happens to some poor Mac user).
10:17 – Richard Price opens by saying he’s working on “credibility metrics” (deliberately not using #altmetrics as the name?). They have profile, paper, and newsfeed stats. Done with the pitch, and now moving to four things he wants to change. He’s pitching the metrics to researchers directly, suggesting researchers should just start including metrics on their CV. (I do agree with this; I think Impact Story is a great source.) Now talking about instant distribution of research via Academia as a publication platform, but they don’t have an API for fetching these stats, AFAIK. The second thing he wants for publishing is multimedia. The third thing is open access. The fourth thing he wants is better peer review. The main problems he sees are bias, laziness, and incompetence from peer reviewers. He suggests incentives for peer review would sort this out, but really wants crowdsourced peer review. (Wouldn’t this make the peer review issue worse, given self-selection of reviewers (see Wikipedia edit cliques)?)

Live Blogging will pause while I prepare for my talk & we’ll have a short break.

Q&A – Tim McCormick asking why the site has the name it does if it’s really just all about science. Is that where the market is?
Rich: Yes, that’s where the money is.
Llewelyn Cox from USC asking how peer review will break out of the tendency to just promote popular stuff.
Rich: It should all be published, just some will get more attention than others. Uses VCs as an example, “they’re mostly sheep”, but some buck the trend.
I asked about dealing with edit cliques in a crowdsourced peer review model, and he started talking about something along the lines of recursive authority. I also asked if the site engagement metrics will be available via API; he said they will be holding some metrics back, because they see them as possibly being a revenue source in the future, but they are planning to develop an API with limited functionality.

11:34 – David Jay of JournalLab. By 2020, impact will be measured directly, reputation will be clearly quantified, and discussion will be archived, organized, and freely available. David did lots of interviews at UCSF Parnassus about how systems of research discussion were organized. Great comparison between offline and online journal discussion:
Offline you point at figures, online you highlight passages of text.
Offline you pay attention to who’s in the room; online you really can’t – it’s everyone and no one.

Journallab integrates with the email alert workflow to help researchers find papers. Figure-level summaries “improve reading speed by 5x”.

Commenting on journal articles online is a big problem, but it’s not a UI problem, it’s a social problem. We need to think like a social movement. Commenting online is a political action. They’re transitioning to a .org because many researchers are suspicious about the motives of a for-profit org. (No mention of the sustainability problem of grant-funded work?)

Q & A

Rich Price – We think commenting is a UI issue, PDFs are hard to work with. What’s your UI solution?
David – Move away from the PDF!

Tyler Neylon – What are you going to do about preservation?
David – We will make our data as broadly available as possible via Open APIs so there will be multiple sources for this.

Questioner asking about gaming. (Needing to address gaming is a good problem to have.)
David – We’ll address that when we get to bigger scale.

Questioner – How do we keep the small group dynamic online?
David – We want to try to connect people alongside preexisting dimensions of interest.

12:03 – Dan Whaley of Hypothesis

How do we bring fact-based, authoritative discussions online? For example, how do we break the nonsense media spin cycle around climate change (Dan’s original motivation)? Hypothesis is building an annotation layer for the web, to enable commentary and discussion of content. He cites Vannevar Bush, which is winning him friends among Computer History Museum staff, I’m sure. Stunning to hear the quotes about the Memex.

Glad he’s bringing up Rap Genius, which just got $15M from Andreessen (builder of the first browser).

Dan’s issues with previous annotation systems:
No peer-review / reputation model
No way to link into parts of docs.
Poor cold-start strategies
Not standards-compliant
Not open source
Not non-profit

The Hypothesis framework has $520k from Sloan and will have the following features:
Rep model
Robust inter & intra document anchors
Will solve cold-start issue & be able to scale
W3C annotation draft standard

Two components: an annotation layer and a peer-review layer. The annotation layer handles things like “Blog B points at Paper A.” The peer-review layer will rank/filter annotations.

They’ve discovered that identity is a critical part of the reputation system – specifically, how expensive identities are to acquire.

Pseudonymous identity
threaded, direct-reply approach to moderation
metamoderation of moderations (à la Slashdot)
metamoderators are chosen by reputation and domain proximity (conceptual closeness)
users can annotate publicly and privately
will have follows, sharing, etc.

How to address signal/noise? Metamoderation handles volume; identity handles squelching of off-topic signals. Cold start will be handled by picking starting domains: government docs like bills, and scientific papers. Sustainability will come via agreements with the Internet Archive & (possibly) Common Crawl.
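The moderation stack described above (metamoderators chosen by reputation and domain proximity, ranking and filtering annotations) could be sketched roughly as follows. This is purely my illustration of the idea – the function names, weights, and scoring scheme are all hypothetical, not Hypothesis’s actual design:

```python
# Hypothetical sketch: rank annotations by metamoderator votes, weighted by
# each moderator's reputation scaled by conceptual closeness to the domain.
# All names and numbers here are invented for illustration.

def metamod_weight(reputation, domain_proximity):
    """Weight a moderator's vote: reputation scaled by domain proximity (0..1)."""
    return reputation * domain_proximity

def rank_annotations(annotations, votes, moderators):
    """votes: list of (annotation_id, moderator_id, +1 or -1) tuples.
    moderators: dict of moderator_id -> (reputation, domain_proximity)."""
    scores = {a: 0.0 for a in annotations}
    for ann_id, mod_id, vote in votes:
        rep, prox = moderators[mod_id]
        scores[ann_id] += vote * metamod_weight(rep, prox)
    return sorted(annotations, key=lambda a: scores[a], reverse=True)

moderators = {
    "alice": (50.0, 0.9),   # modest reputation, very close to the domain
    "bob":   (80.0, 0.1),   # high reputation, but far from the domain
}
ranking = rank_annotations(
    ["ann1", "ann2"],
    [("ann1", "alice", +1), ("ann2", "bob", +1)],
    moderators,
)
# → ["ann1", "ann2"]: alice's domain-proximate vote (45.0) outweighs bob's (8.0)
```

The point of the proximity factor is that raw reputation alone would let a famous generalist outrank a domain expert; scaling by conceptual closeness keeps moderation authority local to each field.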

1:46 – After-lunch session with Elizabeth Iorns of Science Exchange, creating a Reproducibility Initiative.
Discussing the Reproducibility Initiative, a partnership of Science Exchange, PLOS, Figshare and Mendeley.
The literature does not correct itself. There is no correlation between the number of times a result was reported and the actual reproducibility of the result, but reproducible results did seem to be more translatable.

Postdoc survey Q: “Have you ever published something that you weren’t really sure about?” 50% said yes.

Reproducibility Initiative has gotten buy in from researchers, core facilities, and publishers, assembled a great advisory board with all the thought leaders on the issue of reproducibility, and is getting buy in from the public (they’re the ones most affected by the lack of new drugs, etc).
Final missing element of the Initiative is funding – this isn’t a new technology and more senior folks at funding bodies still believe the literature corrects itself.

Questioner asks about moving reproduction to pre-publication. It could be a good way for publishers to make the case about their value – “all our work has been independently replicated.”

Rich asking about costs.
Elizabeth: So far it’s all industry which has submitted studies for replication (clearly they see the value). Replication will be cheaper because the work will be done by skilled core facilities, which can do the work at lower cost.

2:05 – Joanne Kamens of Addgene, a plasmid bank. Making it easy to share plasmids among researchers.
Addgene is a non-profit “for benefit” company. They felt that being a non-profit was essential for their initial buy-in, because they were asking researchers to share plasmids. “We’re doing this for you, not us.”
20,000 plasmids stored from 1,200 labs and 270 institutions. 250,000 plasmids shipped, 45% outside of the US. Deposition is free; getting plasmids shipped to you isn’t.

The service Addgene really provides is that they solve the high-friction Material Transfer Agreement problem. Addgene’s founder created the company because she ran into this issue herself. They have an electronic universal biological MTA – take it or leave it. Reagent availability facilitates reproduction of research, preserves and archives plasmids, and takes the admin burden of sharing away from researchers.

Addgene is collaborating with the Michael J. Fox Foundation to assemble a collection of materials useful to Parkinson’s disease researchers. Addgene has two collections (~1000 plasmids) available to be sent to industry from academia, and has a few industry depositors as well.

Addgene is sponsoring BeHEARD Rare Disease Challenge.

I asked if the data about plasmid use is available, and she said yes, and that she wants to see plasmid re-use stats used as an impact metric.

2:31 Elizabeth Bartmess and Michael Cohn – the Reproducibility Project (distinct from the initiative).
46 replication studies in progress or completed since Nov 2011. News about the project spread mostly via word of mouth.
Goals: evaluate the false positive rate, reproducibility, and the feasibility of open big projects; write a paper (of course); and assemble an archive of replication attempts. They found that community discussion created much better awareness of the problem.

Incentives – authorship on project reports, culture of mutual interest, partnership. It’s the broad community buy-in and involvement that has made this work.

Q&A – I asked why people wanted to get involved. They have done surveys and found that there was a lot of undiscovered interest, and people were happy to get involved to change things. It’s very important that replicators know their replication will be taken seriously and not dismissed on technical grounds.

Elizabeth Iorns – how did you select articles?
Elizabeth B. – We picked the top 3 journals and selected articles from within them. Replicators get in touch with the people being replicated. Individual replications will be identified in the reports, but the aim of the Project isn’t to get irreproducible work stricken from the literature, just to understand the overall false positive / false negative rates.

2:53 – Jeff Spies – Open Science Framework (infrastructure for above Project) (Github for science, though he didn’t say that.)
Overall goal – Narrow the gap between Scientific Values and Scientific Practices. We think the solution lies in openness.
The incentive structure of academia is tied to publications – this causes unexpected effects, because pressure to publish causes a focus on novelty and narrative, as opposed to reporting everything and letting it be just exploratory.

Web app for collaborating, documenting, archiving, sharing, and registering science. Jeff finds that much of the failure to share is just because sharing is so hard to do, not because people actively don’t want to. (Rings true to me; many complaints about possible scooping are actually complaints about being asked to do extra work.)

OSF also makes site engagement metrics available in the webapp. OSF has built-in version control (backend git repo).
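The built-in version control is only described as a git repo on the backend. As a rough illustration of the idea behind git-style versioning (this is a hypothetical sketch, not OSF code): every file version is stored under a hash of its content, so nothing is ever overwritten and any earlier version stays retrievable.

```python
import hashlib

# Minimal sketch of content-addressed versioning, the core idea of git's
# object store: each version is keyed by the SHA-1 of its content (using
# git's "blob <size>\0<content>" framing), and history is an ordered list
# of those keys. Illustration only, not OSF's actual implementation.
store = {}      # hash -> file contents
history = []    # ordered list of committed version hashes

def commit(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode() + b"\0"
    digest = hashlib.sha1(header + content).hexdigest()
    store[digest] = content
    history.append(digest)
    return digest

v1 = commit(b"results: 42\n")
v2 = commit(b"results: 43\n")
old_version = store[history[0]]   # earlier versions remain retrievable
```

A nice property for science: identical contents always hash to the same key, so a registered version of a dataset can be verified byte-for-byte later.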

APIs – wants to connect out to other tools.

Jose-Maria Fernandez – Financial Innovation for good.
This is promising an answer to new ways to fund research openly and sustainably: bringing investment into research through financial engineering. The basic idea is to securitize a drug portfolio such that given X compounds in Y clinical stage, you’d get a given return. He’s run some simulations which seem to show that this may work, and he has the Matlab code freely available.
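I won’t reproduce his Matlab here, but the basic pooling argument can be sketched with a made-up Monte Carlo (every probability and payoff below is an illustrative assumption, not a number from the talk): each compound succeeds with some small probability and pays a multiple of its cost, and pooling many compounds narrows the return distribution without changing its mean.

```python
import random

# Hypothetical Monte Carlo sketch of a securitized drug portfolio.
# Each compound independently succeeds with probability p_success and
# pays `payoff` per unit cost; we compare the return distribution of a
# small portfolio against a large one. Numbers are invented.
random.seed(0)

def portfolio_return(n_compounds, p_success=0.1, payoff=20.0, cost=1.0):
    wins = sum(random.random() < p_success for _ in range(n_compounds))
    return (wins * payoff) / (n_compounds * cost)  # multiple on invested capital

def simulate(n_compounds, trials=10_000):
    returns = [portfolio_return(n_compounds) for _ in range(trials)]
    mean = sum(returns) / trials
    var = sum((r - mean) ** 2 for r in returns) / trials
    return mean, var

mean_small, var_small = simulate(5)    # tiny portfolio: same expected return...
mean_big, var_big = simulate(100)      # ...but a big pool has far lower variance
```

Note that this simple version assumes the compounds are independent, which is exactly the assumption Elizabeth Iorns challenges in the Q&A below.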


Iorns: The simulations only hold true if all the compounds are addressing independent targets, which isn’t really the case very often, so the effective risk distribution would be very different.
Jose: What I showed was simplified, but more detailed models don’t assume independence and still hold up.

I asked if he could think of any data sources that could help de-risk the portfolio (obviously thinking of the Reproducibility Initiative), and he cited some clinical databases they used for their model estimates, mostly based on historical data.

Missed several questions here.

3:52 – Lindy Fishbourne – Thiel Foundation Breakout Labs – New Ways to Fund Innovation
These people are funding early-stage ideas, doing some dilutive and some non-dilutive funding, focusing on people and technology outside of traditional institutions. Royalties are capped at 3x total, and they take warrants to get some money coming back in from successful projects. 7–10 day turnaround, $250K max, 122 proposals in 9 months. 3Scan, Bell Bio, and Modern Meadow are some examples.

Alex Peake asks their interest in AI and software.
Lindy: Software is usually something we don’t fund, because software companies can run leaner than biotech and attract standard funding.

Alex: Do you fund research projects?
Lindy: We fund companies and not projects, but send in a proposal.

Questioner asks if BOL is “doing DARPA’s job”.
Lindy: We’re doing philanthropy in the most libertarian way possible, giving directly to companies. Gov’t can come in and fund after we get a project started, researchers can get preliminary data for NIH applications, too.

4:17 – Microryza – Cindy Wu (biomedical research), Denny Luan (economics)
JOBS Act, CROWDFUND Act – entities can raise up to $1M without disclosure – has stimulated interest in this space.
Crowdfunding is great because it allows you to share the risk and return with a long tail of interested people, through a mechanism that traditional sources won’t touch. Major research funding doesn’t fund below $50k; Microryza wants to fit that space. Basically bringing patronage back by aggregating lots of little donations.
Microryza is different because… “content is what matters to people” (sounds like marketing speak – didn’t get it). They do help researchers learn how to market their own work so their projects will get funded. (Since they take a cut, I wonder if they push people to raise more?) They’re also building a licensing platform for shared data.

Example of crowdfunding – getting a paleontology museum director funds necessary to go on a dig.


Michael Cohn: UCSF signed a sweetheart deal with Indiegogo to take less indirect costs. Can you do the same?
Denny: We’ll try.

Elizabeth Bartmess: How do you change the reporting culture to get researchers to update more frequently?
Denny: We’re trying to educate them on how to communicate and market themselves so they get it out there.

5:07 – OK, that’s all I can cover for today. Hope this was useful for everyone. Here’s the livestream.