
The Evolving Data Lifecycle

on Sat, 10/19/2013 - 01:16

This essay is based on my presentation at the eResearch Conference in Brisbane, Australia, on 10/21/2013.

The spotlight is on Data

Data has now taken center stage within the research process. The amount of data ranges from the enormous quantities produced by large planned science missions to the smaller amounts produced by individual researchers, the so-called long tail of science. While the current focus is on data, it is important to look at data in the context of the research process itself: the data life cycle.

Looking at the Data Life Cycle

A scientific research process can be represented as a data life cycle consisting of a series of stages through which data passes during its lifetime. These stages include data processing, archiving, discovery, and finally use. Use itself encompasses several sub-stages: access, integration, visualization, analysis, and sharing. These stages may vary slightly across science domains and applications, but in general they remain consistent across many domains. The goal of informatics researchers is to make this process efficient for researchers, address existing gaps and hurdles, seamlessly integrate new and evolving technology, and enable new types of research capabilities.
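As a rough illustration, the stages above can be encoded as an ordered sequence, with Use broken into its own sub-stages. This is a minimal sketch, assuming a strictly linear ordering of stages; the names `Stage`, `UseSubStage`, `LIFECYCLE`, and `next_stage` are hypothetical and do not come from any existing system, and real domains will vary as noted above.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Top-level data life cycle stages, in the order data passes through them."""
    PROCESSING = 1
    ARCHIVING = 2
    DISCOVERY = 3
    USE = 4


class UseSubStage(Enum):
    """Sub-stages that together make up the Use stage."""
    ACCESS = 1
    INTEGRATION = 2
    VISUALIZATION = 3
    ANALYSIS = 4
    SHARING = 5


# Assumption: a strictly linear sequence; iterating an Enum yields definition order.
LIFECYCLE = list(Stage)


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None once data reaches Use."""
    idx = LIFECYCLE.index(current)
    return LIFECYCLE[idx + 1] if idx + 1 < len(LIFECYCLE) else None


if __name__ == "__main__":
    stage: Optional[Stage] = Stage.PROCESSING
    while stage is not None:
        print(stage.name)
        stage = next_stage(stage)
    print("Use sub-stages:", [s.name.lower() for s in UseSubStage])
```

Keeping the ordering in one place means that a macro-level change, such as adding or removing a stage, amounts to editing the `Stage` enum, while micro-level changes would live inside whatever code handles each individual stage.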

Factors Impacting the Data Life Cycle

The data life cycle is dynamic and constantly evolving, driven by several factors. These factors drive changes to the life cycle at both the micro and the macro level. At the micro level, the individual steps within the cycle change, whereas at the macro level, the set of steps that constitutes the cycle may itself be modified. While these factors may overlap, they can be categorized from four different perspectives:

1. Data Perspective


What is Analytics?

on Mon, 08/19/2013 - 18:47

Analytics is a term that is now often used interchangeably with Data Mining.

The best-known definition of data mining is Fayyad's [1]: “the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”.

Analytics, however, lacks a well-known definition. Most attribute the term “analytics” to the paper by Kohavi et al. [2] on business analytics. In a footnote to this paper, the authors state that they use analytics and data mining interchangeably, yet the body of the work reveals some nuanced differences between the two. While analytics is indeed applied data mining, three important distinctions between the two remain.

First, analytics focuses on effective customization of data mining via “verticalization”. Verticalization implies incorporating task-related domain knowledge into the analytic tools, removing the data analyst from the loop, and optimizing the tools for execution speed.

Second, analytics focuses on the usability of results. Results must be presented so that business users can quickly gain insight from intuitive visualizations rather than from sophisticated statistical plots.

Finally, analytics is an integral part of the data collection and decision support system itself, rather than an activity conducted outside this system with a separate set of tools.

Given these distinctions, the question to ask within Earth Science is: do we really have an analytics capability? Unfortunately, the answer is no.

[1] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From Data Mining to Knowledge Discovery: An Overview,” in Advances in Knowledge Discovery and Data Mining. The MIT Press, 1996.


Science Informatics – What is in a name?

on Fri, 07/05/2013 - 21:10
“What's in a name? that which we call a rose / By any other name would smell as sweet.” –Shakespeare, Romeo and Juliet
Informatics has become a commonly used term in a wide variety of science domains, yet what is really meant when we use this term? Does informatics mean the same thing for all domains, or are there nuanced differences in scope and meaning associated with the term? 
To investigate this, the definitions and scope of several science informatics terms were reviewed and then compared along two dimensions: their stated objectives and the data life cycle components they encompass. These terms and their definitions are presented below in chronological order:
Bioinformatics: The term was coined in 1970, and its initial definition was “the study of informatics processes in biotic systems” [6]. As advances in the field led to exponential increases in sequence data, the definition evolved as well, eventually coming to mean the development and use of computational methods for the management and analysis of sequence data.
The current objective of bioinformatics is to provide solutions for the management and analysis of biomedical datasets. Bioinformatics therefore focuses on the data management and analysis components of the data life cycle.

11th Conference on Artificial and Computational Intelligence and its Applications to the Environmental Sciences

on Mon, 07/02/2012 - 20:50

Abstracts are due Aug 1.


In addition to the joint sessions, the AI conference will hold general sessions for papers on the following topics:


Data Mining and Semantic Web losing steam?

on Fri, 03/09/2012 - 23:15

If you look at Google Trends data for the terms "data mining" and "semantic web", you find interesting results.

Data Mining vs Data Analytics

Based on search volumes, interest in data mining appears to be slowly eroding, whereas interest in "data analytics" has picked up since 2007. However, news references to data mining seem to be increasing, and news references to data analytics are clearly trending upward.

Possible inference: data mining is now so well understood and commonly used that most people no longer turn to a search engine to find related resources. Alternatively, the relabeling as "data analytics" has shifted interest to a new term that means the same thing. The increase in news references to data analytics supports this second explanation.

Semantic Web vs Linked Data

The trends for these two terms are really interesting. The steady decline in search volumes for the term "semantic web" suggests possible disillusionment. It is also interesting to note that the number of Linked Data searches has almost overtaken the number of Semantic Web searches. The momentum appears to have shifted to Linked Data (aka the practical Semantic Web), even though news references remain about the same.




Instant Karma Provenance Talk @ AGU Today

on Fri, 12/09/2011 - 18:24

Slides from the provenance talk given at the AGU session today.


AGU Talk on Tools Market Place for Earth Science

on Thu, 12/08/2011 - 21:56

I am giving an AGU talk titled "Earth Science needs a tool store". The talk explores the possibility of creating a tool store analogous to the app store, along with the conditions required to foster such a marketplace within Earth Science.


Satellite Imagery showing the expansion of Delhi

on Wed, 12/07/2011 - 00:58

This interesting page shows satellite imagery of Delhi taken in 1977 and again in 1989, and the effects of urbanization are visible in these images. It would be interesting to compare them with current imagery to track how urbanization has changed since the recent economic boom.