<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Research Remix</title>
	<atom:link href="http://researchremix.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://researchremix.wordpress.com</link>
	<description>Blogging about the science, engineering, and human factors of biomedical research data reuse</description>
	<lastBuildDate>Sat, 28 Jan 2012 05:42:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='researchremix.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Research Remix</title>
		<link>http://researchremix.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://researchremix.wordpress.com/osd.xml" title="Research Remix" />
	<atom:link rel='hub' href='http://researchremix.wordpress.com/?pushpress=hub'/>
		<item>
		<title>A view of the rights and responsibilities of the NSF wrt data</title>
		<link>http://researchremix.wordpress.com/2012/01/11/nsf-data-vision/</link>
		<comments>http://researchremix.wordpress.com/2012/01/11/nsf-data-vision/#comments</comments>
		<pubDate>Thu, 12 Jan 2012 04:04:17 +0000</pubDate>
		<dc:creator>Heather Piwowar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://researchremix.wordpress.com/?p=920</guid>
		<description><![CDATA[The US government has asked for our thoughts on how the NSF and other federal agencies should disseminate research results. I come to the question with a passion for the effective use of research results, a commitment to practical solutions, and naivete about how the NSF actually works.  From this perspective I offer my current [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>The US government has <a href="https://plus.google.com/109377556796183035206/posts/irKDFRPxdqx">asked for our thoughts</a> on how the NSF and other federal agencies should disseminate research results.</p>
<p>I come to the question with a passion for the effective use of research results, a commitment to practical solutions, and naivete about how the NSF actually works.  From this perspective I offer my current rough thoughts on <strong>what the NSF &#8212; and all public science funders &#8212; should do with regard to dissemination of digital research results</strong> (= datasets, code, and publications, as we know them and in their future incarnations).  I wrote them down to help me think through my responses to the RFIs.</p>
<p>So far the number of <a href="http://friendfeed.com/researchremix/5548a8e4/thread-to-keep-track-of-publicly-available">online responses to the RFI for data</a> has been very low.  Answering open-ended questions is difficult.  Critiquing a straw man is easier, a lot more fun, and often just as revealing.  Where does the vision below differ from yours?  It is interesting, for example, to compare it to the vision articulated in <a href="https://docs.google.com/document/d/1QA1eGBynqh-yN0bo3_nYzD3d26nEhvuVPMUR2ffi17o/edit?hl=en_US">this response</a> &#8212; also from proponents of open research data.</p>
<p>Do you have strong opinions?  You can <a href="http://www.federalregister.gov/articles/2011/11/04/2011-28621/request-for-information-public-access-to-digital-data-resulting-from-federally-funded-scientific">respond to the White House till Jan 12th</a> and the National Science Board till Jan 18th.  Feel free to reuse text and ideas directly from here for agreement or critique.</p>
<p><strong>A vision for NSF infrastructure for digital research results</strong><br />
(considering digital research results to include datasets, code, and publications &#8212; as we know them and in their future incarnations)</p>
</div>
<div>
<p><strong>Principles</strong></p>
</div>
<div>
<ol>
<li>A public science funder has both a right and a responsibility to communicate its findings in the most generative form it can.  Projects funded with public money must be conducted under this premise.</li>
<li>Effective communication of research results requires strong statements of principles, enforceable policies, and useful infrastructure.</li>
<li>Individual disciplines and communities can opt out of funder-wide approaches if they make a strong public case that the principles and goals are not applicable to their area, or that they plan to achieve the same goals in a different but equally effective way.</li>
<li>Costs for disseminating research results in accordance with a funder’s requirements should be included and funded as part of the cost of doing a research project.  This money should be used by investigators to pay for publication services &#8212; including editing, registering, reviewing, certifying, hosting, publicising, and preserving &#8212; in a competitive marketplace.</li>
<li>Anticipated benefits from disseminating a project’s research results should be included in proposal evaluation.  Disseminating research results is the responsibility of the PI.  Research results that have not been disseminated in accordance with policy will not be acknowledged as output of the grant for the purposes of evaluation.  Measured impact of research results, interpreted broadly, will be used for evaluation of the project and its investigators.</li>
<li>The intellectual property rights of researchers and institutions must be respected, but cannot infringe on the rights of the public and the scientific community to replicate and build on funded research findings in a timely manner.</li>
<li>A science funder and its infrastructure for reporting research results must be nimble to stay abreast of changing norms, needs, and technologies.</li>
<li>The effects of adopted policies and infrastructure tools must be systematically monitored and adjusted accordingly.  Decisions should be informed by evidence whenever possible.  A funder should fund collection of the relevant, actionable evidence it needs to make decisions on policy and tools: research for more effective research.</li>
<li>Infrastructure should use existing commodity software when it meets the needs at a competitive price, with a preference for open source software.  When funders develop their own infrastructure software it should be open source to allow outside contributions, customizations, and tailored solutions.</li>
<li>Science communication infrastructure should be open to findings from all funders whenever incremental costs can be recovered.</li>
</ol>
<p><strong>Policies </strong>(to be enacted after 2 years notice)</p>
<ol>
<li>Immediate online open access to the published article-of-record and the data and code that support its findings.  Embargoes or exceptions may be granted at the discretion of the program officer, especially for sensitive information such as human subject data or the location of endangered species.</li>
<li>The openly available article-of-record, data, and code must be registered with the funder to be considered research findings of the grant.</li>
<li>Research results must be made openly available to anyone, for any purpose, with the sole condition of appropriate attribution.</li>
<li>Articles, data, and code must be available for 50 (?) years after completion of the research project.</li>
<li>Publicly funded websites and software must report use of research results through funder-compatible impact-tracking infrastructure.</li>
</ol>
<p><strong>Infrastructure</strong><br />
This is infrastructure that funders ought to obtain and directly and continuously fund.</p>
<ol>
<li><strong>Unique identifiers</strong>.  Unique IDs for each investigator, grant, institution, and licence variety.  Also unique IDs for each research result (see below).  Some initiatives are already underway.  There should be inter-funder and international coordination.</li>
<li><strong>An open registry of research results</strong>.  Investigators (or their publishers) must register research results here immediately upon article publication.  Data and code that support the article findings must be registered at the same time (or data and code may also be registered without an associated article).<br />
At a minimum, a record includes fields for: research object type (article, data, code), research object ID, license, grant ID, investigator IDs, institution IDs, publisher ID, an abstract, keywords about the topic and methods, metadata about embargoes or exemptions, and links to other directly associated research object IDs.  The research object ID must resolve:</p>
<ol>
<li>for articles, to the full text of the article-of-record (in XML format or similar)</li>
<li>for data, to the full dataset and associated metadata</li>
<li>for code, to the snapshot version of the code</li>
</ol>
<p>The registry must have a read/write API, and would ideally be extensible through third-party client-based add-ons to support discipline-based customization.</p>
</li>
<li>A mechanism to gather and report <strong>raw data on the impact of research findings</strong>.  This could be done by adopting and hosting an open-source web analytics solution like Piwik (<a href="http://piwik.org">http://piwik.org</a>) and requiring that all funded initiatives report usage statistics for activity (reading, commenting, remixing, etc) involving research results.<br />
Impact metrics would also be populated by machine extraction of citations from the article text linked to through the open registry of research results, thereby providing open access to citation data that the commercial sphere has not facilitated.  Funder customizations to the source should remain open source to facilitate additional contributions and reuse.  The raw impact data should be made widely available to fuel innovative products in discovery and filtering of research reports, next generation bibliometrics assessment, and policy evaluation.</li>
</ol>
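<p>As a rough illustration of the minimum registry record described above, here is a sketch in Python.  The class name, field names, and example identifiers are all hypothetical; the post only lists which fields a record should carry, not how they would be represented.</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Research object types named in the post; anything else is rejected.
OBJECT_TYPES = ("article", "data", "code")


@dataclass
class RegistryRecord:
    """Hypothetical minimum record for the open registry of research results."""
    object_type: str              # "article", "data", or "code"
    object_id: str                # should resolve to full text, dataset, or code snapshot
    license: str
    grant_id: str
    investigator_ids: List[str]
    institution_ids: List[str]
    publisher_id: str
    abstract: str
    keywords: List[str] = field(default_factory=list)   # topic and methods
    embargo_note: Optional[str] = None                   # embargo or exemption metadata
    related_object_ids: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Keep the registry limited to the three object types listed in the post.
        if self.object_type not in OBJECT_TYPES:
            raise ValueError("unknown research object type: " + self.object_type)


# Illustrative values only; none of these identifiers are real.
article = RegistryRecord(
    object_type="article",
    object_id="example-article-1",
    license="CC-BY",
    grant_id="example-grant-1",
    investigator_ids=["example-investigator-1"],
    institution_ids=["example-institution-1"],
    publisher_id="example-publisher-1",
    abstract="Placeholder abstract.",
    keywords=["placeholder topic", "placeholder method"],
    related_object_ids=["example-dataset-1"],
)
print(article.object_type, article.related_object_ids)
```

<p>The link from the article to its dataset through <code>related_object_ids</code> is one possible way to express "links to other directly associated research object IDs"; a real registry might well model this differently.</p>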
<p>Also:</p>
<ul>
<li>Dissemination and preservation of research results. Although successful discipline-wide models already exist for papers (journals and conferences) and some types of data, many disciplines lack appropriate data repositories and few disciplines have determined appropriate hosting and archiving solutions for code.  Research funders should provide seed funding for such infrastructure until it can survive as a sustainable service like those discussed below.</li>
</ul>
<p>It is assumed that scholarly societies, non-profits, and the commercial sector will continue to offer value-add services such as editing, mark-up, layout, organizing pre-publication peer-review, certification, dissemination, and archiving.  These are envisioned as services that would be paid for with line-item funds from the project research budget for publications, data, and code.  It is unlikely that funders will need to fund infrastructure in this services sector, other than occasional seed funding for innovative approaches.</p>
<p><strong>Money</strong><br />
The NSF should compute the subscription fees paid by institutional libraries to gain access to NSF-funded research today (for example, total subscription fees &#215; (number of articles reporting on NSF-funded research projects / total number of articles)).</p>
<p>Starting two years from now (synchronized with the requirement that all final research products be openly accessible, and the expectation that all research projects include budget items for publication services) it should decrease the indirect costs by this amount.</p>
<p><strong>Not included in the articulation of this vision, mostly due to time:</strong></p>
<ul>
<li>Is it worth it to archive Very Large Data?  Dryad can archive up to 10 GB per paper&#8230; this covers the data that supports findings for most investigator-driven research, but not all.  I don&#8217;t know what to do with the really big stuff.  GigaScience model?</li>
<li>How this ought to sync up with availability of research proposals.</li>
<li>Training.</li>
<li>Advocating for challenges and prizes.</li>
</ul>
<p><strong>Other notes:</strong></p>
<ul>
<li>goal isn’t availability, it is use</li>
<li>focus on both replication *and* reuse.  Neither is sufficient alone.</li>
<li>focus here on data associated with final research findings.  Much to suggest that other datasets should be made available for reuse as well!</li>
<li>lots more that needs to be done about annotating the <em>kind</em> of contribution a citation makes to a research result.</li>
<li>is 2 years the right amount of time before this goes into effect?  50 for preservation?  I don’t know, but they are the right order of magnitude.</li>
<li>embargoes should be used to permit exclusive use by investigators when appropriate during this transition period (particularly for longitudinal datasets that take years to collect), active IP explorations, etc.  The length of the embargo should be as short as is reasonable and must be specified at the time of research reporting.</li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://researchremix.wordpress.com/2012/01/11/nsf-data-vision/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ccf6fc7425a6f5e941fc043a2d069b86?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Heather</media:title>
		</media:content>
	</item>
		<item>
		<title>Research Works Act attacks data dissemination too</title>
		<link>http://researchremix.wordpress.com/2012/01/07/rwa-data/</link>
		<comments>http://researchremix.wordpress.com/2012/01/07/rwa-data/#comments</comments>
		<pubDate>Sun, 08 Jan 2012 06:23:42 +0000</pubDate>
		<dc:creator>Heather Piwowar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://researchremix.wordpress.com/?p=910</guid>
		<description><![CDATA[Sponsors and supporters of the Research Works Act keep claiming that it doesn&#8217;t cover &#8220;the raw data generated by government-funded research&#8221; [1] or &#8220;raw data outputs&#8221; [2]. That&#8217;s not what I get from a direct reading of the bill.   The relevant sentence reads: &#8220;Such term does not include progress reports or raw data outputs [...]]]></description>
			<content:encoded><![CDATA[<p>Sponsors and supporters of the Research Works Act keep claiming that it doesn&#8217;t cover &#8220;the raw data generated by government-funded research&#8221; [<a href="http://www.michaeleisen.org/blog/?p=807#comment-52048">1</a>] or &#8220;raw data outputs&#8221; [<a href="https://twitter.com/#!/DarrellIssa/status/155023586040627200">2</a>]. That&#8217;s not what I get from a <a href="http://www.gpo.gov/fdsys/pkg/BILLS-112hr3699ih/pdf/BILLS-112hr3699ih.pdf">direct reading of the bill</a>.   The relevant sentence reads:</p>
<p style="padding-left:30px;">&#8220;Such term does not include progress reports or raw data outputs routinely required to be created for and submitted directly to a funding agency in the course of research.&#8221;</p>
<p>This doesn&#8217;t say it excludes raw data.  It says some complicated thing about it excluding raw data that is routinely required to be submitted directly to the funding agency. There aren&#8217;t many NIH or NSF program officers I know who want me to routinely send them all my raw data, and many data repositories are not hosted at funding agencies.</p>
<p>This means that practically all &#8220;published&#8221; research datasets (including those in tables, supplementary information, and presumably non-federal data archives) are subject to the Research Works Act.  On purpose or because of poor sentence structure?  I don&#8217;t know, but the effect is the same.</p>
<p>Read the bill again with data in mind <em>[comments added]:</em></p>
<p style="padding-left:30px;">No Federal agency may adopt, implement, maintain, continue, or otherwise engage in any policy, program, or other activity that–</p>
<p style="padding-left:30px;">(1) causes, permits, or authorizes network dissemination of any private-sector research work <em>[government-funded investigator dataset]</em> without the prior consent of the publisher of such work <em>[who would be considered publishers? publishers of the articles that describe the data collection? data archives too?]</em>; or</p>
<p style="padding-left:30px;">(2) requires that any actual or prospective author, or the employer of such an actual or prospective author, assent to network dissemination<em> ["distributing, making available, or otherwise offering or disseminating a private-sector research work through the Internet or by a closed, limited, or other digital or electronic network or arrangement"]</em> of a private-sector research work <em>[government-funded investigator dataset].</em></p>
<p>Insane.  The government can&#8217;t *permit* (let alone require) the *making available* of data *on a closed digital network* (let alone the internet) without the publisher&#8217;s approval?  Data they funded collecting?</p>
<p>ok I think writing that down has disgusted me so much that I&#8217;m done thinking about the Research Works Act for now.  It is just insulting.</p>
<p>Stand up against the AAP and everyone who supports this.</p>
<p>[1] <a href="http://www.michaeleisen.org/blog/?p=807#comment-52048">http://www.michaeleisen.org/blog/?p=807#comment-52048</a></p>
<p>[2] <a href="https://twitter.com/#!/DarrellIssa/status/155023586040627200">https://twitter.com/#!/DarrellIssa/status/155023586040627200</a></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://researchremix.wordpress.com/2012/01/07/rwa-data/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ccf6fc7425a6f5e941fc043a2d069b86?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Heather</media:title>
		</media:content>
	</item>
		<item>
		<title>What *should* the publishers lobby for?</title>
		<link>http://researchremix.wordpress.com/2012/01/07/what-should-the-publishers-lobby-for/</link>
		<comments>http://researchremix.wordpress.com/2012/01/07/what-should-the-publishers-lobby-for/#comments</comments>
		<pubDate>Sun, 08 Jan 2012 04:41:16 +0000</pubDate>
		<dc:creator>Heather Piwowar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://researchremix.wordpress.com/?p=905</guid>
		<description><![CDATA[The Research Works Act is a very poor move by traditional publishers. Publishers will come out strongest if they side with the future: the future is immediately open, reusable, all-value-added versions of research results and peer reviewed publications. Sure, they should lobby to shape government policy to their favour.  That&#8217;s their right and maybe even [...]]]></description>
			<content:encoded><![CDATA[<p>The Research Works Act is a very poor move by traditional publishers.</p>
<p>Publishers will come out strongest if they side <strong>with</strong> the future: the future is immediately open, reusable, all-value-added versions of research results and peer reviewed publications.</p>
<p>Sure, they should lobby to shape government policy to their favour.  That&#8217;s their right and maybe even their responsibility to their stakeholders.</p>
<p>What *should* traditional publishers fight for, to stay on the right side of both history and their balance sheets?</p>
<ol>
<li>Time.  They should insist that any federal mandate that requires the article-of-record be made openly, immediately available does not take effect for a year or two, to give themselves time to change their business models (to author/funder pay-on-publish or pay-on-submit or some other method, thereby saving their companies <a href="http://researchremix.wordpress.com/2012/01/07/rwa-job-losses/">and jobs</a>).</li>
<li>Access to publication funds for federally-funded authors.  Publication costs are already available to NSF and NIH awardees as budget line items in grants.  I don&#8217;t know if all other federally-funded investigators have access to author-pays grant money&#8230; if not, publishers should argue that access to these resources must be a condition of a mandate.  There must be a creative way to redirect money which paid for university library subscriptions into university OA funds or federally-disseminated research distribution reimbursement (has anyone proposed such an approach yet?)&#8230;. publishers should lobby for this.</li>
<li>Measurement of the impact of the papers they publish.  When research papers are openly distributed, redistributed, deconstructed, and mashed up it becomes much harder for publishers to understand (and therefore brag about, and then capitalize on with higher publication charges) the impact their publications have had vis-a-vis their competitors.  Publishers could insist that all federal hosting services report back usage stats (as PubMed Central does), and lobby for requiring a manner of attribution that facilitates easy and robust impact tracking (beyond just mention or citation).</li>
</ol>
<p>I&#8217;m not saying I think they should get all of these things, necessarily&#8230; but this would be a much more constructive, productive and supportable stance.</p>
<p>What do you think they should fight for?</p>
]]></content:encoded>
			<wfw:commentRss>http://researchremix.wordpress.com/2012/01/07/what-should-the-publishers-lobby-for/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ccf6fc7425a6f5e941fc043a2d069b86?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Heather</media:title>
		</media:content>
	</item>
		<item>
		<title>Threat of job loss as motivation for Research Works Act: real or fear-mongering?</title>
		<link>http://researchremix.wordpress.com/2012/01/07/rwa-job-losses/</link>
		<comments>http://researchremix.wordpress.com/2012/01/07/rwa-job-losses/#comments</comments>
		<pubDate>Sat, 07 Jan 2012 22:21:41 +0000</pubDate>
		<dc:creator>Heather Piwowar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://researchremix.wordpress.com/?p=894</guid>
		<description><![CDATA[I find it difficult to silently read the AAP endorsement of the Research Works Act.  Multiple readings later I still find myself yelling aloud at its twisted positions and statements. Most of my rebuttals take the form of expletive + &#8220;no, that isn&#8217;t true&#8221; or &#8220;give me a break&#8221; + a silent prayer that policymakers won&#8217;t take the positions seriously. But [...]]]></description>
			<content:encoded><![CDATA[<p>I find it difficult to silently read the <a href="http://www.publishers.org/press/56/">AAP endorsement of the Research Works Act</a>.  Multiple readings later I still find myself yelling aloud at its twisted positions and statements.</p>
<p>Most of my rebuttals take the form of expletive + &#8220;no, that isn&#8217;t true&#8221; or &#8220;give me a break&#8221; + a silent prayer that policymakers won&#8217;t take the positions seriously.</p>
<p>But what about the AAP threat that the industry and 30,000 jobs are in danger if the government supports, requests, or requires more open scholarship?  I suspect many politicians in the US government will take that very seriously.  Threatening job losses in the US right now is a Great Card, and publishers <a href="http://www.michaeleisen.org/blog/?p=807&amp;cpage=1#comment-52048">keep bringing it to the table</a>.</p>
<p>The Jobs Card <a href="http://researchremix.wordpress.com/2011/08/17/oa-doesnt-cost-jobs-it-creates-them-and-saves-lives/">makes me furious</a> because *if* scholarship can be done better it *should* be done better, full stop.  But with the government really considering these issues and soliciting feedback, it is time to get beyond that and see if, you know, it happens to be true that 30,000 US jobs are in jeopardy.  If so, we need to figure out how to address that in our discussions, because we can be darn sure the government will balance that against the benefits of openness.</p>
<p>First, what is at risk?  Scholarly publishing as an industry isn&#8217;t going away any time soon: physicists have arXiv and they still publish in journals.  What is at risk is the traditional subscription-based business model.  If the scholarly article-of-record is made immediately available with no restrictions, there is no reason for anyone to pay subscription fees.  Luckily, BMC and PLoS have already demonstrated that an author/author&#8217;s funder-pays model can work (at least for fields funded by federal money, with access to publication budget line items&#8230; the same fields that would be subject to federal mandates on openness).  Traditional publishers could move to this model if they wanted to.  So what is at risk is the business model, not the industry.</p>
<p>Second, what about the jobs?  Well, the publishing industry would still need to employ people, even if it changes to a PLoS business model. Maybe fewer people, though, since PLoS has been designed from the ground up to be a streamlined operation?</p>
<p><em>Cue the <a href="https://twitter.com/#!/researchremix/status/155713220848525312">Saturday-morning kitchen-table</a> back-of-envelope analysis (ok, post-its).</em><br />
<em>*** btw if others have more refined or robust estimates of the job situation, please share!</em></p>
<p>In 2010, the first year PLoS reported covering its costs with revenue, PLoS published <a href="http://www.ncbi.nlm.nih.gov/pubmed?term=(%22PloS%20one%22%5BJournal%5D%20OR%20%22plos%20biology%22%5Bjournal%5D%20OR%20%22plos%20medicine%22%5Bjournal%5D%20OR%20%22plos%20computational%20biology%22%5Bjournal%5D%20OR%20%22plos%20pathogens%22%5BAll%20Fields%5D%20OR%20%22plos%20genetics%22%5Bjournal%5D%20or%20%22plos%20neglected%20diseases%22%5Bjournal%5D)%20AND%20(%222010%22%5BPDAT%5D%20%3A%20%222010%22%5BPDAT%5D)">8662</a> papers.  PLoS has about <a href="http://blog.ynada.com/tag/plos">120 employees</a>.  That&#8217;s <strong>72 papers/employee</strong>.</p>
<p>Let&#8217;s compare that to traditional publishing models.  Elsevier publishes <a href="http://www.elsevier.com/wps/find/authorsview.authors/landing_main">250,000</a> papers per year.  Elsevier has <a href="http://www.isdn-conference.elsevier.com/about-elsevier.html">7000</a> employees worldwide.  That&#8217;s 35 papers/employee.  However, Elsevier also works on <a href="http://www.isdn-conference.elsevier.com/about-elsevier.html">non-article products and services</a>, like book publishing, SciVerse software, Gray&#8217;s Anatomy, MD Consult, and conference organizing.  I have no idea what proportion of their employees work on these things.  If 20%, the number of employees working on articles is 5600.  So 250,000/5600 = <strong>45 papers/employee.</strong></p>
<p>What does this mean?</p>
<p>At 72 papers/employee, publishing 250,000 papers at a PLoS level of efficiency would take 250,000/72=~3500 employees.  We estimated that Elsevier has 5600 employees worldwide working on research articles now, so 2100 more than would be necessary at PLoS levels of efficiency.  Would these people have to be let go?  Heck no!  Publishers often argue that they offer a better product than the bare-bones PLoS offering &#8212; those 2100 people are doing something useful &#8212; so they could charge more money for their product than PLoS!  Compete!  Tout your high impactness and careful copyediting!  <strong>$500 more per article would do it</strong> (250,000*$500/2100=~$60k per job).  Totally reasonable, and <a href="http://www.lib.berkeley.edu/scholarlycommunication/oa_fees.html">much less than many existing OA fees</a>.</p>
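<p>For anyone who wants to rerun the post-it arithmetic, here is a minimal Python sketch of it.  The inputs are the figures quoted above; the 20% non-article share is my guess, not a measured value.  The script does not round intermediate steps, so its results land a little off the rounded numbers in the text (roughly 3460 rather than ~3500 employees, for example).</p>

```python
# Figures quoted in the post (2010 estimates).
PLOS_PAPERS = 8662
PLOS_EMPLOYEES = 120
ELSEVIER_PAPERS = 250_000
ELSEVIER_EMPLOYEES = 7000
NON_ARTICLE_SHARE = 0.20  # assumption from the post, not a measured value
EXTRA_FEE = 500           # extra dollars charged per article


def papers_per_employee(papers, employees):
    """Crude efficiency measure: papers published per employee per year."""
    return papers / employees


plos_rate = papers_per_employee(PLOS_PAPERS, PLOS_EMPLOYEES)

# Elsevier staff assumed to work on research articles.
article_staff = ELSEVIER_EMPLOYEES * (1 - NON_ARTICLE_SHARE)
elsevier_rate = papers_per_employee(ELSEVIER_PAPERS, article_staff)

# Staff needed to publish Elsevier's volume at the PLoS rate.
staff_at_plos_rate = ELSEVIER_PAPERS / plos_rate
extra_staff = article_staff - staff_at_plos_rate

# Revenue available per "extra" job if every article paid EXTRA_FEE more.
revenue_per_extra_job = ELSEVIER_PAPERS * EXTRA_FEE / extra_staff

print(f"PLoS: {plos_rate:.0f} papers/employee")
print(f"Elsevier: {elsevier_rate:.0f} papers/employee")
print(f"Staff needed at PLoS rate: {staff_at_plos_rate:.0f}")
print(f"Extra staff: {extra_staff:.0f}")
print(f"Revenue per extra job: ${revenue_per_extra_job:,.0f}")
```

<p>Change NON_ARTICLE_SHARE or EXTRA_FEE to see how sensitive the conclusion is to those guesses.</p>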
<p>Let&#8217;s think about how many jobs it really is in the USA.  <a href="http://elpub.scix.net/data/works/att/178_elpub2008.content.pdf">Björk et al</a> estimated in 2008 that about 1.35 million articles are published per year: let&#8217;s call it 1.5 million for 2010.  1.5 million papers / 72 papers per employee = about 20,800 scholarly communication jobs would be required to publish that many papers at a PLoS level of efficiency.  1.5 million papers/45=33,333 jobs would be required at an Elsevier level of efficiency.  The difference is about 13k jobs worldwide.  Worldwide.  AAP claims that 45% of worldwide scholarly publishing is by North American science publishers.  Let&#8217;s say US science+non-science publications is 50%, maybe?  50% of 13k jobs =~ 7,000 jobs.  I reiterate that these jobs do NOT have to be lost.  If these employees are indeed creating value (through more thorough copyediting or peer-review routing or paper-article-publishing or whathaveyou), then the publishers could (and should) CHARGE EXTRA and pay for these jobs that create this added value.</p>
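<p>The same estimate as a few lines of Python, in case you want to plug in your own numbers.  The 1.5 million papers and the 50% US share are rough guesses from the paragraph above, so treat the output as an order-of-magnitude figure: 1.5 million / 72 works out to roughly 20,800 jobs, and the US difference comes out between 6,000 and 7,000.</p>

```python
WORLD_PAPERS = 1_500_000  # rough 2010 figure used in the post
PLOS_RATE = 72            # papers per employee at PLoS
ELSEVIER_RATE = 45        # papers per employee at Elsevier (estimated)
US_SHARE = 0.50           # guess at the US share of scholarly publishing

# Jobs needed worldwide at each level of efficiency.
jobs_at_plos_rate = WORLD_PAPERS / PLOS_RATE
jobs_at_elsevier_rate = WORLD_PAPERS / ELSEVIER_RATE

# Jobs that exist only because of the lower efficiency.
difference_worldwide = jobs_at_elsevier_rate - jobs_at_plos_rate
difference_us = difference_worldwide * US_SHARE

print(f"Jobs at PLoS rate: {jobs_at_plos_rate:,.0f}")
print(f"Jobs at Elsevier rate: {jobs_at_elsevier_rate:,.0f}")
print(f"Difference worldwide: {difference_worldwide:,.0f}")
print(f"Difference in the US: {difference_us:,.0f}")
```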
<p>But let&#8217;s give a bit of perspective to 7,000 jobs.  Google &#8220;jobs lost USA.&#8221;  This was the top hit when I did it: <a href="http://www.msnbc.msn.com/id/45710876/ns/health/t/hmc-hospitals-close-nearly-jobs-lost/#.Twi8biNzrvM">nearly 1000 people out of work because a hospital is closing</a>.</p>
<p>Enough said.</p>
<p><strong>Publishers are fear mongering with talk of loss of jobs</strong>.   As far as I can tell, there is no big risk, if publishers are willing to move with the times and embrace new business models.</p>
<p>The US government has no business sacrificing the progress of science and taxpayer access to maintain a particular business model.</p>
<p>The opportunity cost, however, of continuing with the current pathetic dissemination of research results?  Don&#8217;t get me started.</p>
<p><em>Go set your <a href="http://www.arl.org/sparc/media/blog/12-0106.shtml">government official straight about the Research Works Act</a> and <a href="http://www.whitehouse.gov/blog/2011/11/07/request-information-public-access-digital-data-and-scientific-publications">Public Access in general (due Jan 12)</a>!   Feel free to use analysis in this post with or without attribution.  </em></p>
<p><em>If you know of other sources of evidence and analysis around this issue, on either side, please let me know.  </em></p>
]]></content:encoded>
			<wfw:commentRss>http://researchremix.wordpress.com/2012/01/07/rwa-job-losses/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ccf6fc7425a6f5e941fc043a2d069b86?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Heather</media:title>
		</media:content>
	</item>
		<item>
		<title>Proposal inviting Citizen Scientists to enrich the scientific literature</title>
		<link>http://researchremix.wordpress.com/2011/12/16/proposal-citizen-science/</link>
		<comments>http://researchremix.wordpress.com/2011/12/16/proposal-citizen-science/#comments</comments>
		<pubDate>Fri, 16 Dec 2011 14:00:25 +0000</pubDate>
		<dc:creator>Heather Piwowar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://researchremix.wordpress.com/?p=892</guid>
		<description><![CDATA[I submitted a proposal to the Citizen Science Alliance yesterday.  It is really exciting!  I&#8217;d briefly discussed the idea of  classify-citations-to-find-data-reuse with members of Zooniverse previously:  they suggested that I keep the scope small to start and focus on an area of interest to the public, like cancer.  Thanks to Todd Vision for great feedback [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=researchremix.wordpress.com&amp;blog=1015265&amp;post=892&amp;subd=researchremix&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I submitted a <a href="http://www.citizensciencealliance.org/proposals.html">proposal to the Citizen Science Alliance</a> yesterday.  It is really exciting!  I&#8217;d briefly discussed the idea of classify-citations-to-find-data-reuse with members of <a href="https://www.zooniverse.org/">Zooniverse</a> previously:  they suggested that I keep the scope small to start and focus on an area of interest to the public, like cancer.  Thanks to Todd Vision for great feedback on a last-minute draft.</p>
<p>Here are the main bits of the proposal.  Feedback welcome!</p>
<div><strong>Project title</strong><br />
Tracking the building blocks of cancer research</p>
<p><strong>Abstract</strong><br />
Funders and charities spend a lot of money on cancer research.  Are we getting as much research progress as we can from this funding?  How is cancer research used to make additional discoveries?  Are there lessons we can learn from current behaviour to improve research efficiency in the future?</p>
<p>These questions are currently hard to answer because it is difficult to identify which resources have contributed to new research.  This project will invite volunteers to enrich the scientific literature by classifying literature citations to previous work.  Researchers cite previous work for many reasons: only through examining citation context can we differentiate citations made to reference background information from citations that attribute the reuse of materials, methods, software, and datasets.</p>
<p>We propose to begin attribution tracking in a specific domain: cancer-related gene expression microarray datasets.  By categorizing citations to papers that describe such datasets we will begin to understand how often these datasets are used by others, how scientists attribute reuse of these research building blocks, and what contribution the data has made to research progress.  We hope these findings will help make similar information more easily discoverable in the future.</p>
<p>This is a great chance for citizen scientists to begin enriching the scientific literature.  Subscription and licensing restrictions make most of the scientific literature off limits to automated markup.  Human-aided text mining of freely available full text has the potential to extract a coherent and incredibly useful body of knowledge from literature that is otherwise unavailable for large-scale markup.</p></div>
<div></div>
<div><strong>Describe the tasks you envisage citizen science participants carrying out – including (a) any required specialist knowledge, (b) minimum requirements for success, (c) desired outcomes.</strong></p>
<p>Volunteers will be given a link to an article in PubMed Central and text that identifies a specific item in that article’s References section.  They will be asked to find all mentions of the Reference item in the article full text, record the section, extract the sentences surrounding the citation, and classify the citation context into one of a few broad categories (cited for background information, to attribute use of method, to attribute use of data, etc).  Finally, the volunteers will be asked to make an assessment about whether the cited resource played a significant role in the conduct of the reported research project.</p>
<p>We think questions can be phrased so as not to require specialist knowledge.</p>
<p>The minimum requirement for success would be annotation of 1000 citations, since only about 10% are likely to be in the context of data reuse. Annotation of all citations would allow a more thorough analysis of patterns.</p>
<p><strong>Describe the nature of the data that would be used in the proposed project &#8211; include (a) its format (filetype, size, number of files) (b) any restrictions (including copyright) on its use (c) its availability (is it archive data or still being collected).</strong></p>
<p>The input data is simply a list of pairs: a URL that points to a paper in PubMed Central paired with a text string that identifies (authors, title, year, journal) a reference item known to be cited within the PubMed Central paper.</p>
<p>The URLs and citation text would be derived from a dataset currently hosted on the Dryad repository:<br />
Piwowar HA (2011) Data from: Who shares? Who doesn’t? Factors associated with openly archiving raw research data. Dryad Digital Repository. <a href="http://dx.doi.org/10.5061/dryad.mf1sd">doi:10.5061/dryad.mf1sd</a></p>
<p>Specifically, PubMed IDs for cancer gene expression microarray studies published in 2005 (n=792) would be extracted from the above dataset.  An equal number of studies that made their data publicly available (n=129) and studies without publicly available datasets would be retained (total n=258, http://tinyurl.com/7mkbpyq).  PubMed Central reports hosting 3540 papers that cite these 258 studies.  The 3540 citing papers would be randomized and a subset of 1000 would be extracted to achieve our initial analysis goals.</p>
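<p>The randomize-and-subset step could look something like the Python sketch below.  The identifiers are placeholders (the real list would come from PubMed Central), and the function name and seed are hypothetical; fixing the seed just makes the draw repeatable so others can reproduce the same subset.</p>

```python
import random


def sample_citing_papers(paper_ids, k=1000, seed=2011):
    """Draw k distinct papers at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(list(paper_ids), k)


# Placeholder identifiers standing in for the 3540 citing papers in PMC.
citing = [f"citing-paper-{i}" for i in range(3540)]
subset = sample_citing_papers(citing)

print(len(subset), "papers selected for annotation")
```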
<p>Because it may be against <a href="http://www.ncbi.nlm.nih.gov/pmc/about/faq/#q2008sep24a">PubMed Central terms of use</a> to embed the PMC article in a web page, we envision Zooniverse supplying a link that opens in a new tab and the corresponding Reference-Entry-To-Classify text to each volunteer.</p>
<p><strong>What automatic processing routines exist which attempt to solve the problem being addressed? Why can’t they be used instead of humans?</strong></p>
<p>Automated processing schemes have been developed to classify citation context (e.g. Teufel et al. Automatic classification of citation function.<a href="http://www.informatik.uni-trier.de/~ley/db/conf/emnlp/emnlp2006.html%23TeufelST06"> EMNLP 2006</a>).  It is not known how accurate these algorithms are for the specific task of identifying data reuse attribution.</p>
<p>The primary hurdle to automated processing is legal: publishers rarely allow full text to be harvested or used for text mining.</p>
<p>This proposal suggests an approach to work around these access and use limitations: leverage citizen scientists and publicly available research papers to gain large scale access to the scientific literature.</p>
<p>PubMed lists 804184 publications from 2009 with links to full text.  Of these, 247421 (31%) have free full text, available for public view.  Only a small subset of these, about 67000 (8%), are open access with full text that can be systematically downloaded and used for text mining.  [<a href="http://researchremix.wordpress.com/2011/12/15/computing-availability-of-full-text-for-reuse/">http://researchremix.wordpress.com/2011/12/15/computing-availability-of-full-text-for-reuse/</a>]</p>
<p><strong>If possible, estimate the minimum number of times a task must be performed on a given element of data to be useful for science (assuming all tasks are performed by competent citizen scientists; once might be enough for exceptionally clear tasks, more times could be required for fuzzier tasks or lots may be necessary if accurate estimates of uncertainties are needed). How many total tasks must be completed before your research goals are achievable?</strong></p>
<p>We believe three replicates would be sufficient, but a bit of experimentation may be needed to understand how many classifications are needed to achieve sufficient accuracy.  A master’s student with a bachelor’s degree in forestry was able to complete the task accurately with little training.  Five replicates achieved the necessary generalizability when we asked people on Mechanical Turk to complete a more complex task based on the same sort of papers in the biomedical literature (details:  <a href="http://researchremix.wordpress.com/2008/12/29/generalizability-coefficient-for-mechanical-turk-annotations/">http://researchremix.wordpress.com/2008/12/29/generalizability-coefficient-for-mechanical-turk-annotations/</a>).</p>
<p>Assuming three replicates would be sufficient, 1000 citations would require completion of 3000 tasks.</p></div>
<div>
<div>All data from Zooniverse projects must be eventually made public.</p>
<p><strong>Are there potential extensions to the project that you have in mind?</strong></p>
<p>Yes!  We are excited about extensions to this project in at least three dimensions:<br />
1.  Data reuse estimates.  Investigating instances of data reuse for additional years, domains and datatypes, to understand how patterns differ.<br />
2.  Open citations and citation context.  The proposed project could be the first step in creating a repository of openly available citation information, ideally with semantic metadata.  This would be of broad interest.  As one concrete example, citation slices and dices could be included as measures of impact in <a href="http://total-impact.org/">http://total-impact.org</a>.<br />
3.  Toll-access literature.  Experiment with applying this approach to subscription-based literature, either by negotiating with publishers or by leveraging the subset of volunteers with university affiliations and subscriptions.</p>
<p>Even more profoundly, this mechanism for enriching the scientific literature will be of deep interest to a wide variety of researchers for all sorts of additional purposes.</p></div>
</div>
<div></div>
<div>And fun for a lot of citizen scientists too, I think!  :)</div>
<div></div>
<div></div>
<div>
<div></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://researchremix.wordpress.com/2011/12/16/proposal-citizen-science/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ccf6fc7425a6f5e941fc043a2d069b86?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Heather</media:title>
		</media:content>
	</item>
		<item>
		<title>Computing availability of full text for reuse</title>
		<link>http://researchremix.wordpress.com/2011/12/15/text-for-reuse/</link>
		<comments>http://researchremix.wordpress.com/2011/12/15/text-for-reuse/#comments</comments>
		<pubDate>Thu, 15 Dec 2011 15:10:56 +0000</pubDate>
		<dc:creator>Heather Piwowar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://researchremix.wordpress.com/2011/12/15/computing-availability-of-full-text-for-reuse/</guid>
		<description><![CDATA[Rough estimate: PubMed lists 804184 publications from 2009 with links to full text.  Of these, 247421 (31%) have free full text, available for public view.  Only a small subset, about 67000 (8% of all publications), are open access with full text that can be systematically downloaded and used for text mining. I&#8217;ll show how I [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=researchremix.wordpress.com&amp;blog=1015265&amp;post=881&amp;subd=researchremix&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Rough estimate:</p>
<p>PubMed lists 804184 publications <strong>from 2009</strong> with links to full text.  Of these, 247421 <strong>(31%) have free full text, available for public view</strong>.  Only a small subset, about 67000<strong> (8% of all publications), are open access with full text that can be systematically downloaded and used for text mining</strong>.</p>
<p>I&#8217;ll show how I got these numbers for future reference:</p>
<p>First, get all publications from 2009 with links to full text using this query in PubMed:</p>
<div style="padding-left:30px;">&quot;loattrfull text&quot;[sb] AND (&quot;2009&quot;[PDAT] : &quot;2009&quot;[PDAT])</div>
<div style="padding-left:30px;">(<a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22loattrfull%20text%22%5Bsb%5D%20AND%20%28%222009%22%5BPDAT%5D%20%3A%20%222009%22%5BPDAT%5D%29&amp;cmd=DetailsSearch">direct url</a>)  returns <strong>804184 </strong>results</div>
<div style="padding-left:30px;"></div>
<p>Next limit these to publications with links to *free* full text using this query in PubMed:</p>
<div style="padding-left:30px;">&quot;loattrfree full text&quot;[sb] (&quot;2009&quot;[PDAT] : &quot;2009&quot;[PDAT])</div>
<div style="padding-left:30px;">(<a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22loattrfree%20full%20text%22%5Bsb%5D%20%28%222009%22%5BPDAT%5D%20%3A%20%222009%22%5BPDAT%5D%29&amp;cmd=DetailsSearch">direct url</a>)  returns <strong>247421 </strong>results</div>
<div style="padding-left:30px;"></div>
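<p>If you would rather script these counts than click through the web interface, NCBI&#8217;s E-utilities ESearch endpoint can return just the number of hits for a query.  Below is a minimal Python sketch.  The helper names are mine, I have written the implicit AND in the second query explicitly, and I am assuming the two subset filters above are still recognised by PubMed, which may no longer be true.</p>

```python
import re
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The two PubMed queries from the post (filter names may have changed since).
FULL_TEXT_2009 = '"loattrfull text"[sb] AND ("2009"[PDAT] : "2009"[PDAT])'
FREE_FULL_TEXT_2009 = '"loattrfree full text"[sb] AND ("2009"[PDAT] : "2009"[PDAT])'


def esearch_count_url(term, db="pubmed"):
    """Build an ESearch URL that asks only for the hit count."""
    params = {"db": db, "term": term, "rettype": "count"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(xml_text):
    """Pull the integer out of the <Count> element of an ESearch reply."""
    match = re.search(r"<Count>(\d+)</Count>", xml_text)
    if match is None:
        raise ValueError("no <Count> element in response")
    return int(match.group(1))


def fetch_count(term, db="pubmed"):
    """Query NCBI over the network and return the number of matches."""
    with urllib.request.urlopen(esearch_count_url(term, db)) as response:
        return parse_count(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(esearch_count_url(FULL_TEXT_2009))
    # Uncomment to hit the live service (counts will differ from 2011):
    # print(fetch_count(FULL_TEXT_2009), fetch_count(FREE_FULL_TEXT_2009))
```

<p>Counts returned today will differ from the ones in this post, since PubMed keeps adding and updating records.</p>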
<p>Finally we want to identify which of these are open access.  This is a bit tricky because as far as I know this filter is not available in PubMed.  It is, however, available in PubMed Central.  So:</p>
<ol>
<li>Start with this query in PubMed as above:<br />
&quot;loattrfull text&quot;[sb] AND (&quot;2009&quot;[PDAT] : &quot;2009&quot;[PDAT])<br />
(<a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22loattrfull%20text%22%5Bsb%5D%20AND%20%28%222009%22%5BPDAT%5D%20%3A%20%222009%22%5BPDAT%5D%29&amp;cmd=DetailsSearch">direct url</a>)</li>
<li>On the right is a menu that says &#8220;Filter your results&#8221; Under that one of the options is<br />
<span style="text-decoration:underline;">Links to PMC (158729)<br />
</span>Click this.  This will show, in PubMed, all 158729 articles that have records in PMC.  (Note that quite a few papers with free full text aren&#8217;t in PubMed Central: compare this number to 247421.)</li>
<li>Now we want to see these articles within the PMC interface rather than the PubMed interface.  To do this, have a look at the &#8220;Find related data&#8221; menu a bit lower down on the right.<br />
For Database, select PMC.<br />
In Option, select Free in PMC<br />
then click Find Items.<br />
This will show the same articles but within the PMC interface.  Or rather, it shows the first 10000 of the articles.</li>
<li>Have a look at the right menu now.  It has a link that says<br />
<span style="text-decoration:underline;">Open Access (4209)<br />
</span>That is how many of the 10000 articles are available as Open Access articles, as far as PubMed Central knows.</li>
<li>To finish, we need to extrapolate 4209 back to the full set, because it only represents the first 10000 articles.  Assuming that the 158729 articles have the same breakdown of OA/non-OA (a safe assumption?  could definitely do a bit more digging to be sure), we estimate that (158729/10000)*4209=<strong>66809</strong><br />
of the articles are available as Open Access.</li>
</ol>
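<p>The extrapolation and the percentages in the summary can be checked with a few lines of Python.  All inputs are the counts reported in the steps above; the only assumption is the one already flagged, that the first 10000 PMC records have the same open access ratio as the full set.</p>

```python
TOTAL_WITH_FULL_TEXT = 804_184  # 2009 publications with links to full text
FREE_FULL_TEXT = 247_421        # of those, with free full text
IN_PMC = 158_729                # of those, with records in PubMed Central
PMC_SAMPLE = 10_000             # records shown in the PMC interface
OA_IN_SAMPLE = 4_209            # Open Access records among that sample

# Scale the sample's open access count up to all PMC records.
oa_estimate = round(IN_PMC / PMC_SAMPLE * OA_IN_SAMPLE)

# Shares of all 2009 publications with full-text links.
free_share = FREE_FULL_TEXT / TOTAL_WITH_FULL_TEXT
oa_share = oa_estimate / TOTAL_WITH_FULL_TEXT

print(f"Estimated open access articles: {oa_estimate}")
print(f"Free full text: {free_share:.0%}")
print(f"Open access: {oa_share:.0%}")
```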
<div></div>
<div>Other filters can obviously be ANDed to each step to see this ratio in a specific topic area.  (There were a bunch of such calculations done by others a few years ago but I can&#8217;t easily find them on the web now.  Anyone have related links?)</div>
<div></div>
]]></content:encoded>
			<wfw:commentRss>http://researchremix.wordpress.com/2011/12/15/text-for-reuse/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ccf6fc7425a6f5e941fc043a2d069b86?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Heather</media:title>
		</media:content>
	</item>
		<item>
		<title>a future where data attribution Counts #idcc11</title>
		<link>http://researchremix.wordpress.com/2011/12/07/attribution-that-counts/</link>
		<comments>http://researchremix.wordpress.com/2011/12/07/attribution-that-counts/#comments</comments>
		<pubDate>Wed, 07 Dec 2011 09:40:42 +0000</pubDate>
		<dc:creator>Heather Piwowar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://researchremix.wordpress.com/?p=713</guid>
		<description><![CDATA[Below is the rough text of my #idcc11 (International Data Curation Conference) talk.  Slides at slideshare, now updated to include (very tiny) text in the speaker notes on each slide.  [Anyone know how to increase the font size of presenter notes and/or extract them into a text document from Keynote?] A future where data attribution Counts. [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=researchremix.wordpress.com&amp;blog=1015265&amp;post=713&amp;subd=researchremix&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Below is the rough text of my #<a href="http://www.dcc.ac.uk/events/idcc11">idcc11</a> (International Data Curation Conference) talk.  <a href="http://www.slideshare.net/hpiwowar/a-future-where-data-citation-counts">Slides at slideshare</a>, now updated to include (very tiny) text in the speaker notes on each slide.  [Anyone know how to increase the font size of presenter notes and/or extract them into a text document from Keynote?]</p>
<p><strong>A future where data attribution Counts.</strong></p>
<p><a href="http://researchremix.wordpress.com/2010/08/05/sharing-data-makes-our-shoulders-broader/">Sharing data makes our shoulders broader</a>.</p>
<p>This is a great story, right? And why we are all here.</p>
<p>But it is also a great illustration of the problem.</p>
<p>What exactly do broad shoulders get the individual researcher?</p>
<p>Pain!</p>
<p>Nobody looks at the supporting structure of an impressive tower.  We are all busy ogling the top.  That means these people? These ones with the shoulders? They&#8217;ve got nothing.</p>
<p>Ok, maybe they have some citations.  But do we think the promise of citation is enough?</p>
<p>Nope.</p>
<p>Don&#8217;t get me wrong, I&#8217;m a fan of studies that show a citation benefit for sharing data :) . But it won&#8217;t be enough.</p>
<p>If it were, we&#8217;d have researchers knocking down the doors of our IR for the 10 minute job of sending in their preprints. They aren&#8217;t doing that. Because a few citations, as much as we&#8217;d like to think otherwise, aren&#8217;t enough to offset the Fear Uncertainty and Doubt that accompanies the costs of uploading a dataset in the current culture.</p>
<p>So.</p>
<p>What to do about it? How to change the culture?</p>
<p>We need to facilitate deep recognition of the labour of dataset creation.</p>
<p>Ok, let me say that again because it is so important.</p>
<p>We need to facilitate deep recognition of the labour of dataset creation.</p>
<p>And while we are at it, we need to value the contributions of funders, the people who pay for all the gym equipment that helps us build the shoulders, and data repositories, who we might like to view as perhaps personal trainers.</p>
<p>Let&#8217;s dig in to how these groups do impact tracking now, and how they&#8217;d like to do it in the future.</p>
<p>Investigators, today, can list research products on a CV.</p>
<p>A CV is sort of bland, don&#8217;t you think? It has no context of use.</p>
<p>One version of a more useful future comes from a tool called total-impact.  Continuing a project that started as a hackathon at the Open Society Foundation workshop Beyond Impact, organized by Cameron Neylon here in the UK last spring, Jason Priem, a few other people, and I have been building it.  <a href="http://total-impact.org">http://total-impact.org</a></p>
<p>It aggregates metrics for papers and also for non-traditional research products, like datasets. The metrics are citations, but also altmetrics, or article-level citations&#8230;. various indications that others have found your research worth bookmarking, or blogging, or referencing on Wikipedia. It doesn&#8217;t currently look for dataset identifiers in public R packages, but it could, for example, as an indication of use.</p>
<p>This makes a “live CV” if you will, giving post-publication context to research output.  (Could also be applicable as a CV for a department, or a grant, or a grant portfolio&#8230;.)</p>
<p>To do this really well we need to be able to list all metrics. Right now, many are unavailable for this sort of mashup due to licensing terms. This includes citations identified by Google Scholar, Thomson, and Scopus.</p>
<p>Repositories, today, can look at graphs of their deposit counts.</p>
<p>Many know their own download statistics, some share this with their authors or the public.</p>
<p>As a result of intensive manual digging, some have metrics about how many times their datasets have been mentioned in the literature. I&#8217;ll splash by a few graphs of preliminary research findings&#8230;. come find me or my blog if you want more info. We are starting to be able to estimate third party reuse. Tools that support data citation will help this.</p>
<p>This is all a nice start.</p>
<p>What repositories really want, though &#8212; correct me if I&#8217;m wrong &#8212; is to show that they are indispensable. That they generate new, profound science not otherwise possible. That they are a great financial investment in scientific progress.  This requires knowing more than just a citation count: it requires knowing the context of reuse. This means we need access to the full text of the paper that cites the data.</p>
<p>What about funders?</p>
<p>They want to know the impact the data had on society.  Did it facilitate innovation, reduce discrimination, create jobs, save the rainforest, increase our GDP?</p>
<p>That kind of tracking is beyond what I know how to do :)</p>
<p>We&#8217;re going to need digital tracking technology that as far as I know isn&#8217;t available yet but I&#8217;m sure people are working on. Google analytics meets digital RF-ID tags&#8230;. I dunno&#8230; but I do know we need it.  Furthermore, we need these digital tracking mechanisms to be affordable and open, to facilitate mashups.</p>
<p>Ok, so with that sort of future vision for tracking, what do we, as a scholarly ecosystem, need to power this future world?</p>
<p>We need innovation and experimentation.</p>
<p>We need 1000 flowers blooming.</p>
<p>We need solutions that are open and generative.</p>
<p>I don&#8217;t have all the answers, but here is part of it:</p>
<ul>
<li><strong>open access to citation data</strong>. We can&#8217;t just rely on Scopus, Thomson, and Google Scholar. Those are only three players. They are good at what they do and have been invaluable, but they can&#8217;t possibly be as nimble as a whole bunch of startups. It is taking them a long time to come out with a data tracking tool. Why? Probably because they have an ambitious vision and need time to fit it into their other product offerings. Some of the rest of us would be happy with iterating on a quick and dirty solution. We need more competition in this space. The barrier to entry is extraordinarily high because of course reference lists are almost all behind copyright and paywalls&#8230;. but open access gives us a toehold.</li>
<li><strong>open access to full text</strong>. Open access also gives us a toehold into citation context information. A citation to a dataset tells us that the dataset played some role in that new research paper. What role? Was it used to validate a new method? Detect errors? Was it combined with other datasets to solve a problem that was otherwise intractable? The answers to these questions are fundamental to what funders and others need to know about impact. It won&#8217;t be easy to derive them from the text of the paper, but I strongly believe it is possible.  We need this to be true open access &#8212; can be used by anyone for any purpose &#8212; none of this Non Commercial nonsense&#8230; we must allow startups to use this information if we are going to get the innovation we need.</li>
<li><strong>open access to other metrics of use.</strong> We need broad-based metrics&#8230; not just citations, but blog posts about data, slides, tutorials that include R data, bookmarks to data on bookmarking sites. Altmetrics.  If you run a data repository, make your download stats publicly available. We frankly don&#8217;t know what all of this info means yet, but we didn&#8217;t know what citations to papers meant 50 years ago either. We&#8217;ll all figure it out, the more data the better.</li>
</ul>
<p>Here&#8217;s what each of us needs to do:</p>
<ul>
<li><strong>raise our expectations.</strong></li>
<ul>
<li>what we could and should be able to mash up.  open citation lists, open access (true open access, none of this NC stuff that renders it useless for startups), open metrics, open data, the impact you can make</li>
</ul>
<li><strong>raise our voices.</strong></li>
<ul>
<li><a href="http://www.whitehouse.gov/blog/2011/11/07/request-information-public-access-digital-data-and-scientific-publications">http://www.whitehouse.gov/blog/2011/11/07/request-information-public-access-digital-data-and-scientific-publications</a></li>
</ul>
<li><strong>Get excited and make something</strong></li>
<ul>
<li><a href="http://www.flickr.com/photos/blackbeltjones/3365682994/">http://www.flickr.com/photos/blackbeltjones/3365682994/</a></li>
</ul>
</ul>
<p>The future where data attribution Counts.</p>
<p><strong>The future is about what kind of impact a dataset makes,<br />
</strong><strong>not just a citation number.</strong></p>
<p>The future is open.</p>
<p>Open data.</p>
<p><strong>Open data about our data.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://researchremix.wordpress.com/2011/12/07/attribution-that-counts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ccf6fc7425a6f5e941fc043a2d069b86?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Heather</media:title>
		</media:content>
	</item>
		<item>
		<title>thoughts on where journals are now, what to do next</title>
		<link>http://researchremix.wordpress.com/2011/12/03/thoughts-on-where-journals-are-now-what-to-do-next/</link>
		<comments>http://researchremix.wordpress.com/2011/12/03/thoughts-on-where-journals-are-now-what-to-do-next/#comments</comments>
		<pubDate>Sat, 03 Dec 2011 22:59:08 +0000</pubDate>
		<dc:creator>Heather Piwowar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://researchremix.wordpress.com/?p=708</guid>
		<description><![CDATA[With my Dryad hat on I was recently invited to participate in a &#8220;Future of Research Dissemination&#8221; day at BMJ.  Invitees were asked to give brief introductory remarks on what journals are doing now to enhance the experience of researchers and readers, what researchers and readers want, and then at the end what publishers ought [...]]]></description>
			<content:encoded><![CDATA[<p>With my Dryad hat on I was recently invited to participate in a &#8220;Future of Research Dissemination&#8221; day at BMJ.  Invitees were asked to give brief introductory remarks on what journals are doing now to enhance the experience of researchers and readers, what researchers and readers want, and then at the end what publishers ought to be doing now to future-proof themselves.</p>
<p>My take, fwiw, with some links to previous blog posts with more detail:</p>
<p><span style="text-decoration:underline;">What are journals doing now wrt data?</span></p>
<p>Journals are increasingly recognizing that the datasets which support the findings in their articles are a crucial resource. They are working to make them more available and more useful.</p>
<p>There is increasing recognition that datasets are different than articles:</p>
<ul>
<li>how they are peer-reviewed (or not)</li>
<li>how they are licensed</li>
<li>how they are discovered</li>
<li>how they are preserved</li>
<li>how they are financed</li>
</ul>
<p>For all of these reasons, <a href="http://researchremix.wordpress.com/2010/08/13/supplementary-materials-is-a-stopgap-for-data-archiving/">supplementary information is probably not the right place for datasets.</a> Journals, along with others in the scholarly ecosystem (spurred on by <a href="http://researchremix.wordpress.com/2010/10/08/revised-nsf/">recent requirements by funders</a> for increased data availability, and evidence that researchers often don&#8217;t make data available, <a href="http://researchremix.wordpress.com/2011/07/14/press-release-for-who-shares/">including for example in cancer</a>) are trying to figure out how best to move forward.</p>
<p>Data repositories are gaining traction <a href="http://researchremix.wordpress.com/2011/11/18/great-data-archive/">as a best practice solution</a>.</p>
<p>Some journals, like BMJ Open, integrate data submission with data repositories like Dryad <a href="http://blog.datadryad.org/2010/01/12/making-data-submission-almost-as-easy-as-falling-off-a-log/">to make things as easy as possible for authors</a> and in some cases also peer-reviewers.</p>
<p>Journals are also reconsidering their own policies with respect to data, <a href="http://researchremix.wordpress.com/2008/03/20/a-review-of-journal-policies-for-sharing-research-data/">becoming more and more explicit</a> about what is expected from authors.</p>
<p>There is repeated evidence (<a href="http://researchremix.wordpress.com/2008/03/20/a-review-of-journal-policies-for-sharing-research-data/">1</a>, <a href="http://dataonedatacitations.wordpress.com/2010/09/13/dcc-poster-submission-data-citation-in-the-wild/">2</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0018657">3</a>, others) of a strong correlation between data archiving policies and impact factor; high IF journals are more likely to expect data to be publicly available, and indeed <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0018657">have been measured</a> to have the highest rate of data availability.</p>
<p>Requiring data archiving in the current culture can feel daunting. One approach taken by the top-tier journals in evolutionary biology was to adopt a <a href="http://datadryad.org/jdap">coordinated Joint Policy for Data Archiving</a>. Starting this past January, all of those journals simultaneously began requiring data archiving as a condition of publication.</p>
<p>Finally, journals are just beginning to find ways to support synergistic discovery. Links between papers and data, <a href="http://api.plos.org/2011/05/31/announcing_the_plos_search_api/">full-text search</a> because of course <a href="http://precedings.nature.com/documents/4267/version/2">a paper is the best metadata</a> for a dataset, and <a href="http://researchremix.wordpress.com/2011/10/31/open-impact-tracking/">article open metrics of use</a> are getting off the ground. It is crucial these are available for both humans and machines (APIs) to enable innovative and meaningful use. It is also key that they are open. Google Scholar etc. isn&#8217;t: you can&#8217;t spider it, can&#8217;t reuse it, can&#8217;t mash it up.</p>
<p><span style="text-decoration:underline;">What do researchers and readers want?</span></p>
<p>Lots of things.  To pick one:  Recognition for the labour they&#8217;ve put in to creating data, and meaningful credit for anything built on top of it.</p>
<p>This is primarily a function for funders and institutions, but <strong>journals can play a unique role</strong> in making the appropriate credit explicit.</p>
<p>Citations to datasets are a start, but we must go further than that, because citations are too minor.  For example, we could ask authors <strong>what resources were essential to the research they are reporting</strong> and then reveal those debts in structured and open ways for remixing.</p>
<p><span style="text-decoration:underline;">What steps should journals take now (in the next year or two) to future-proof themselves?</span></p>
<p>(I was one of the last ones in the room to chip in.  OA (and in particular proper OA without NC), article level metrics, collaborating with other publishers, extending open peer review, experimenting in general, adopting stronger data policies, etc. had already been mentioned.)</p>
<ol>
<li>open computer programming interfaces to full text search, open impact metrics, and deep metadata to facilitate external innovation</li>
<li>software challenges for innovative applications, because those are the relationships you want to build</li>
<li>signal the way to best practices, by asking reviewers if &#8220;all resources have been made appropriately available&#8221; and by leaving space on the submission form for &#8220;dataset IDs&#8221;   (hat tip to John Wilbanks).  When in doubt with these tactics and policies, be brave, not conservative.</li>
<li>experiment with new and more profound forms of acknowledgement for essential scholarly building blocks</li>
<li>start practicing living within lower profit margins.  (!).</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://researchremix.wordpress.com/2011/12/03/thoughts-on-where-journals-are-now-what-to-do-next/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ccf6fc7425a6f5e941fc043a2d069b86?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Heather</media:title>
		</media:content>
	</item>
		<item>
		<title>Doing data archiving well</title>
		<link>http://researchremix.wordpress.com/2011/11/18/great-data-archive/</link>
		<comments>http://researchremix.wordpress.com/2011/11/18/great-data-archive/#comments</comments>
		<pubDate>Fri, 18 Nov 2011 14:30:00 +0000</pubDate>
		<dc:creator>Heather Piwowar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://researchremix.wordpress.com/?p=696</guid>
		<description><![CDATA[It is easy to think that archiving data is easy: just put the data files up on a website.  To do it well, though, isn’t that easy.  The Dryad digital repository has been thinking hard about these issues for years, working toward a practical, simple, and rewarding solution.  For Dryad’s website and promotional material we’ve articulated some [...]]]></description>
			<content:encoded><![CDATA[<p>It is easy to think that archiving data is easy: just put the data files up on a website.  To do it well, though, isn’t that easy.  The <a href="http://datadryad.org/">Dryad</a> digital repository has been thinking hard about these issues for years, working toward a practical, simple, and rewarding solution.  For Dryad’s website and promotional material we’ve articulated some of the issues we feel are important; see <a href="http://datadryad.org/depositing#why">Why Should I Choose Dryad</a> for the up-to-date version.</p>
<p>I copy the current text here to inspire a conversation about “selling points” for a data archive, and even more importantly illustrate how involved it is to make a data archive great.</p>
<p><span id="more-696"></span></p>
<hr />
<p><em>From <a href="http://datadryad.org/depositing">http://datadryad.org/depositing</a></em></p>
<p>Dryad aims to make data archiving as <strong>simple</strong> and as <strong>rewarding</strong> as possible:</p>
<h3>Simple</h3>
<ul>
<li>Dryad welcomes <strong>data files associated with any published article in the biosciences,</strong> as well as <strong>software scripts and other files</strong> important to the article.</li>
<li>There is no restriction regarding <a href="http://datadryad.org/depositing#what">data formats</a>.</li>
<li>Dryad works with journals to <strong>integrate article and data submission</strong>, streamlining the deposit process. Once the files are prepared, deposition typically takes <strong>less than 15 minutes</strong> (2-minute video <a href="http://youtu.be/RP33cl8tL28">here</a>).</li>
<li>Data destined for <strong>more specialized repositories</strong> can, in some cases, be submitted through Dryad, reducing the time and complexity of data submission yet further.</li>
<li>Dryad provides a single clear and best-practice option for <a href="http://blog.datadryad.org/2011/10/05/why-does-dryad-use-cc0/">terms of reuse</a>.</li>
<li>A <strong>curator</strong> will check your files for technical problems before they are released.</li>
<li>By default, data are <strong>embargoed</strong> until journal article publication. Dryad makes sure this happens so you do not need to.</li>
<li>If it is supported by the policy of the journal, you may, during the submission process, select a <strong>‘no-questions-asked’ embargo</strong> on data downloads for one year post-publication. Dryad will support a longer embargo if directed by a journal editor.</li>
<li>You are free to provide additional <strong>keywords</strong> that make the data easier to discover and additional <strong>documentation</strong> (in the form of ReadMe files) to help ensure proper data reuse.</li>
<li>You have the ability to add <strong>new versions</strong> of data files in order to make updates or corrections.</li>
<li>Dryad can make data <strong>securely available for peer review</strong> at the request of the journal.</li>
</ul>
<h3>Rewarding</h3>
<ul>
<li>Dryad works to ensure that you get <strong>credit for reuse of your data</strong> by promoting adoption of best-practice data <a href="http://datadryad.org/using#howCite">citation policy</a> and the trackability of data citations.</li>
<li>Data files receive <strong>persistent, resolvable </strong><a href="http://datadryad.org/depositing#whatDOI">Digital Object Identifiers (DOIs)</a> that can be used in a citation as well as listed on your CV.</li>
<li>Dryad’s <a href="http://datadryad.org/depositing#whycc0">terms of reuse</a> for data <strong>facilitate the maximum impact</strong> for your work.</li>
<li>Data in Dryad are <strong>independently discoverable</strong>, providing a new route by which others may learn about your work.</li>
<li>Discovery is supported through the <strong>indexing</strong> of Dryad’s contents by services such as Google Scholar, Web of Science, and others.</li>
<li><strong>Usage statistics</strong> are available for you to highlight when your datasets are frequently downloaded.</li>
<li>Since Dryad does not reject data for being of the wrong type or in the wrong format, <strong>all the data files associated with an article can be archived together.</strong></li>
<li>Dryad can host files that are <strong>larger</strong> than those accepted by most journal websites (up to 1 GB per file and 10 GB per package).</li>
<li>Your data are <strong>preserved</strong> and made available for the long-term, even beyond the lifespan of Dryad, through continuous backup and replication services.</li>
<li>Dryad is <strong>community-led</strong>, with priorities and policies shaped by the members of the Dryad Consortium, including scientific societies, publishers, and other stakeholder organizations.</li>
<li>Dryad is a nonprofit, but takes <strong>sustainability</strong> seriously, ensuring that funds are available for long-term preservation.</li>
<li>Dryad is an active participant in organizations developing <strong>best-practices for data management</strong> such as <a href="http://www.biosharing.org/">Biosharing</a>, <a href="http://datacite.org/">DataCite</a> and <a href="http://www.dataone.org/">DataONE</a>. You as a researcher benefit from, and contribute to, the work of these organizations by depositing and using Dryad.</li>
</ul>
<p>Have we left out any characteristics that matter to you?  Or do you have a wishlist of things you’d like to see in a data archive like Dryad? Let us know in the comments, or <a href="mailto:help@datadryad.org" target="_blank">send us an email</a>.  Thanks!</p>
<hr />
]]></content:encoded>
			<wfw:commentRss>http://researchremix.wordpress.com/2011/11/18/great-data-archive/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ccf6fc7425a6f5e941fc043a2d069b86?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Heather</media:title>
		</media:content>
	</item>
		<item>
		<title>designing an awesome total-impact api</title>
		<link>http://researchremix.wordpress.com/2011/11/09/designing-an-awesome-total-impact-api/</link>
		<comments>http://researchremix.wordpress.com/2011/11/09/designing-an-awesome-total-impact-api/#comments</comments>
		<pubDate>Wed, 09 Nov 2011 20:54:58 +0000</pubDate>
		<dc:creator>Heather Piwowar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://researchremix.wordpress.com/?p=684</guid>
		<description><![CDATA[APIs are awesome.  They let other people leverage your product and make unexpected things.  Total-impact exists because it could be built quickly on the APIs of others. So total-impact itself should have an awesome and easy to use api. We&#8217;ve made huge strides in this regard in the last few days.  We now have an [...]]]></description>
			<content:encoded><![CDATA[<p>APIs are awesome.  They let other people leverage your product and make unexpected things.  Total-impact exists because it could be built quickly on the APIs of others.</p>
<p>So <a href="http://total-impact.org/">total-impact</a> itself should have an awesome and easy to use api.</p>
<p>We&#8217;ve made huge strides in this regard in the last few days.  We now have <a href="https://github.com/mhahnel/Total-Impact/wiki">an api roadmap</a> and have implemented the first part of it.  The total-impact web app will soon be doing all its data accesses through this api.  We really like saying we are built on our own api : )</p>
<p>Play around with the examples below and see what you think (please don&#8217;t use the api heavily or in production yet: it isn&#8217;t doing good caching, etc).  Suggestions for improvements in the design are very welcome!</p>
<div><p>Initial implementation includes:</p>
<ul>
<li>GET /items/ID1,ID2,ID3 or GET /items/ID1,ID2,ID3.html</li>
<ul>
<li>returns html for those IDs, as it would appear on the total-impact website.</li>
</ul>
<li>GET /items/ID1,ID2,ID3.json</li>
<ul>
<li>all metrics info in json format</li>
</ul>
<li>GET /items/ID1,ID2,ID3.xml</li>
<ul>
<li>all metrics info in xml format</li>
</ul>
<li>GET /items/ID1,ID2,ID3.json?fields=biblio,aliases,metrics,debug</li>
<ul>
<li>allows subsetting the metrics info returned</li>
</ul>
</ul>
<p>Examples:  (to try other IDs replace / in IDs with %252F)</p>
<ul>
<li>html: <a href="http://total-impact.org/api/v1/items/18428094,10.1371%252Fjournal.pmed.0020124,http:%252F%252Fopensciencesummit.com%252Fprogram%252F,10.5061%252Fdryad.8048">http://total-impact.org/api/v1/items/18428094,10.1371%252Fjournal.pmed.0020124,http:%252F%252Fopensciencesummit.com%252Fprogram%252F,10.5061%252Fdryad.8048</a></li>
<li>html, just metrics (good for easy embedding) <a href="http://total-impact.org/api/v1/items/18428094,10.1371%252Fjournal.pmed.0020124,http:%252F%252Fopensciencesummit.com%252Fprogram%252F,10.5061%252Fdryad.8048?fields=metrics">http://total-impact.org/api/v1/items/18428094,10.1371%252Fjournal.pmed.0020124,http:%252F%252Fopensciencesummit.com%252Fprogram%252F,10.5061%252Fdryad.8048?fields=metrics</a></li>
<li>json: <a href="http://total-impact.org/api/v1/items/18428094,10.1371%252Fjournal.pmed.0020124,http:%252F%252Fopensciencesummit.com%252Fprogram%252F,10.5061%252Fdryad.8048.json?fields=metrics">http://total-impact.org/api/v1/items/18428094,10.1371%252Fjournal.pmed.0020124,http:%252F%252Fopensciencesummit.com%252Fprogram%252F,10.5061%252Fdryad.8048.json?fields=metrics</a></li>
<li>just biblio (api supports returning just subsets of data elements) <a href="http://total-impact.org/api/v1/items/18428094,10.1371%252Fjournal.pmed.0020124,http:%252F%252Fopensciencesummit.com%252Fprogram%252F,10.5061%252Fdryad.8048.json?fields=biblio">http://total-impact.org/api/v1/items/18428094,10.1371%252Fjournal.pmed.0020124,http:%252F%252Fopensciencesummit.com%252Fprogram%252F,10.5061%252Fdryad.8048.json?fields=biblio</a></li>
</ul>
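<p>The URL patterns above can be sketched in a few lines of Python. This is only an illustration, not part of total-impact: the helper names <code>encode_id</code> and <code>build_items_url</code> are hypothetical, the base URL and the %252F substitution are copied from the examples above, and the code only builds the URL string without making any request.</p>

```python
# Base URL copied from the example links in this post.
BASE_URL = "http://total-impact.org/api/v1/items/"


def encode_id(item_id):
    """Replace each "/" in an ID with %252F, as the post instructs."""
    return item_id.replace("/", "%252F")


def build_items_url(ids, fmt=None, fields=None):
    """Build a GET /items/... URL (hypothetical helper).

    ids:    list of IDs (PubMed IDs, DOIs, URLs, ...)
    fmt:    None for the default html, or "html", "json", "xml"
    fields: optional subset, e.g. ["metrics"] or ["biblio", "aliases"]
    """
    url = BASE_URL + ",".join(encode_id(i) for i in ids)
    if fmt:
        url += "." + fmt
    if fields:
        url += "?fields=" + ",".join(fields)
    return url


if __name__ == "__main__":
    example_ids = [
        "18428094",
        "10.1371/journal.pmed.0020124",
        "http://opensciencesummit.com/program/",
        "10.5061/dryad.8048",
    ]
    # Rebuilds the "json" example link listed above.
    print(build_items_url(example_ids, fmt="json", fields=["metrics"]))
```

<p>Leaving out <code>fmt</code> gives the plain html form, and passing several names in <code>fields</code> joins them with commas, matching the <code>?fields=biblio,aliases,metrics,debug</code> pattern.</p>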
<div>
<div><a id="internal-source-marker_0.24202742101624608" href="https://docs.google.com/document/d/1My8fdD88a3_6fh9h6p3I2m9BDcMy0ddC5vcGQqbafFc/edit">Full roadmap in the works</a> (feedback encouraged!)</div>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://researchremix.wordpress.com/2011/11/09/designing-an-awesome-total-impact-api/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ccf6fc7425a6f5e941fc043a2d069b86?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Heather</media:title>
		</media:content>
	</item>
	</channel>
</rss>
