LIGO Document G1400060-v2
Abstract

Large-scale physics experiments generate huge quantities of data. As an example, the US Advanced Laser Interferometer Gravitational-wave Observatory (LIGO) will produce 1 petabyte (10^{15} bytes) of data per year when it comes online in 2015. The Large Hadron Collider currently generates approximately 25 times that amount.
The storage, reduction, and analysis of these complex data sets have typically been carried out by large teams of insiders: expert scientists who belong to collaborations organized around the production of scientific results. In the past few years, however, there has been a strong movement to broaden access to large-scale physics data in the US, driven both by top-down pressure (i.e., federal agency policies) and bottom-up pressure (i.e., outside researchers who want access to the data for their own research interests).
By and large, the movement toward broader access is a positive one, but one that requires care in implementation. Providing open access to large, complex data sets is neither easy nor inexpensive. It requires technical effort (in long-term curation, data reduction and associated metadata production, data delivery in commonly used data formats, software to read and visualize the data, and associated documentation) as well as an understanding of the needs of the broader research community. There are cultural barriers to overcome ("It's my data, why should I have to give it to you?") as well as implications for the intellectual property rights of the data producers.
In this talk, I will survey current trends in open access and use LIGO as a case study to illustrate both the benefits and the challenges associated with providing large data sets to the broader research community.
Slides: AAAS_reitze_slides.pdf (1.7 MB)