Glossary of common terms

or ... 

The Developer’s Guide to the Galaxy-ES (Volume 1)

FACE-IT has lots of powerful concepts (many inherited from Galaxy and Globus, plus a few extras) that you'll want to get familiar with to make the most of the system. Here are the ones we can think of at the moment; we'll try to keep this list current:

Galaxy – A very nifty workflow engine with a web GUI, originally developed for data intensive biology. Galaxy has a large and active user and developer community, with applications in sectors well outside of bio.
http://galaxyproject.org 

Workflow – A directed acyclic graph with nodes representing tools/actions and pipes representing data exchange connections. Workflows can encode data processing pipelines with multiple steps in shareable and reproducible ways, or even entire scientific pipelines (very cool!). 
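
To make the graph idea concrete, here is a minimal Python sketch. It is not Galaxy code: the step names are made up for illustration, and the only thing it shows is how a directed acyclic graph determines a valid order in which to run the nodes.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow: each key is a step (node), each value is the set of
# steps whose output it consumes (the "pipes" feeding that node).
workflow = {
    "fetch_weather": set(),
    "fetch_soil": set(),
    "run_crop_model": {"fetch_weather", "fetch_soil"},
    "plot_yield": {"run_crop_model"},
}

def execution_order(graph):
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())

order = execution_order(workflow)
print(order)
```

Every step appears after all the steps it depends on. A graph with a cycle has no such order, which is why workflows are required to be acyclic.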

Data Types  Galaxy is strictly data typed. Each node of a workflow requires data with a specific format and structure. These are defined by Data Types (which are really similar to “classes” in Object Oriented Programming). A Data Type represents a class of data that is recognizable in FACE-IT in a unique and unambiguous way. The process of recognition is called "sniffing", which leads us to...

Sniffing  For each Data Type we must define the logic that is used to check whether a data file fits the Data Type specifications without any doubt. This is called the “sniffing logic” and is a key point in Galaxy development. Be aware that file-name extensions don't get sniffed! Disambiguation: in the Data Type definition XML file, Galaxy uses a parameter called “extension” to identify each Data Type. It is just an id and does not actually refer to the filename extension, so don't be confused!
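
As a rough illustration of sniffing, here is a self-contained Python sketch. The class, the "#fields:" header convention and the "ges_csv" id are all hypothetical and are not part of Galaxy or FACE-IT. The point is only that the decision is made from the file content, never from the file name.

```python
import os
import tempfile

class CsvWithHeader:
    """Hypothetical Data Type: text whose first line is '#fields:' plus comma-separated names."""
    extension = "ges_csv"  # an id for the Data Type, not a filename extension

    def sniff(self, path):
        """Return True only if the file content matches this Data Type."""
        try:
            with open(path, "r", encoding="utf-8") as handle:
                first_line = handle.readline()
        except (OSError, UnicodeDecodeError):
            return False
        return first_line.startswith("#fields:") and "," in first_line

def make_file(content, suffix):
    """Write content to a temporary file with the given suffix and return its path."""
    handle = tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False, encoding="utf-8")
    with handle:
        handle.write(content)
    return handle.name

# The file names are deliberately misleading: only the content is checked.
good = make_file("#fields:lat,lon,yield\n1,2,3\n", ".txt")
bad = make_file("lat lon yield\n", ".csv")

datatype = CsvWithHeader()
good_result = datatype.sniff(good)
bad_result = datatype.sniff(bad)
print(good_result, bad_result)

os.remove(good)
os.remove(bad)
```

The file ending in .txt is accepted and the one ending in .csv is rejected, because the check never looks at the name.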

Dataset  Datasets are simply any data object in Galaxy. A dataset must be recognized as being of one, and only one, Data Type. Any sort of ambiguity in the sniffing process is strictly forbidden and must be avoided. You can upload files for some things that haven't been typed yet, but it's really not good practice. Regular datasets are stored in files on a file system shared by the server running the web GUI. In most cases a dataset is coincident with a single data file, but Galaxy also handles composite datasets (i.e. multiple files) by simply putting them all in a referenced directory.

History  A history is a named collection of datasets stored on the file system shared by the web GUI server. Each dataset is stored in the History in the order of creation. The History GUI is the panel on the right of the Galaxy interface. Each dataset is represented by its name and other display information. The History GUI provides functionalities for both History level and Dataset level interaction and visualization. 

Tools  In Galaxy terminology, tools represent what is done with data in each node of the workflow. From the technical point of view, a tool is a wrapper around an executable. The executable (pretty much any kind is supported) must be self-contained, with a command line interface accepting literal parameters or file paths. Each file passed to the executable must be recognized by Galaxy as a well-known and defined Data Type. Most tools can be classified by their purpose into a few different categories:

  • data source tools get data from somewhere and import it into the History
  • processing tools act on a dataset in ways ranging from really simple (e.g. translating the format or doing a unit conversion) to really complex (e.g. running a complex biophysical process model with multiple inputs and outputs)
  • converters perform a conversion from one Data Type to another in a strictly 1-to-1 manner (i.e. there are no parameters to choose from or any other input needed)
  • visualization tools provide some rendered outputs that might be viewed in the Galaxy canvas or exported for download
  • data sink tools implement data export for download or storage somewhere other than the History (e.g. export to a remote database somewhere)

Working modes  Galaxy has two working modes: 

  • Data Analysis is when the user interactively plays with Datasets and Tools 
  • Workflow mode is where the operations with data are organized in the form of directed acyclic graphs or pipelines.
Users can automatically convert any set of operations/Datasets in their History into a consistent and reproducible Workflow that can then be shared and published. 

Tool Backend  The backend is the executable that does the dirty work with the data. It could literally be any kind of executable stored as a binary, a script, a Python program, etc. (the configuration is really versatile).
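
For example, a backend written in Python could look like the sketch below. Everything in it is hypothetical (the option names --input, --output and --factor, and the scaling operation itself); it only illustrates the shape of a self-contained executable that takes file paths and literal parameters on the command line. A real script would call main() under the usual `if __name__ == "__main__":` guard; here it is called with an explicit argument list so the example can be run as is.

```python
import argparse
import os
import tempfile

def build_parser():
    """Command line interface: two file paths and one literal parameter."""
    parser = argparse.ArgumentParser(description="Hypothetical backend: scale every value in a file.")
    parser.add_argument("--input", required=True, help="path to the input dataset")
    parser.add_argument("--output", required=True, help="path where the result is written")
    parser.add_argument("--factor", type=float, default=1.0, help="literal multiplier")
    return parser

def main(argv=None):
    """Read one number per line, multiply it by the factor, write the result."""
    args = build_parser().parse_args(argv)
    with open(args.input) as src, open(args.output, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(f"{float(line) * args.factor}\n")
    return 0

# Demo run in a temporary directory, standing in for the workflow engine's call.
with tempfile.TemporaryDirectory() as workdir:
    in_path = os.path.join(workdir, "input.dat")
    out_path = os.path.join(workdir, "output.dat")
    with open(in_path, "w") as handle:
        handle.write("1.5\n2\n")
    exit_code = main(["--input", in_path, "--output", out_path, "--factor", "2"])
    with open(out_path) as handle:
        result_lines = handle.read().split()

print(exit_code, result_lines)
```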

Tool Parameters  Parameters provide a way to pass arguments in Tool invocations. As previously stated, Galaxy is strictly Data Type oriented so each command line argument passed to a Tool Backend must be consistent with the Galaxy data model. Tool Parameters have an HTML visualization form that is rendered by Galaxy whether you're using the Data Analysis mode or setting up a node in the Workflow mode. The most commonly used Tool Parameter is the one called “data”, which just takes a file name for a dataset on the local file system. (Remember it should be sniffable so Galaxy can figure out the Data Type!)

Converters  Converters are a special set of Tools with no user interface. Converters are only used to perform conversions from one Data Type to another, without any need for parameters or other inputs. Converters are not directly involved in creating workflows. Instead they are invoked by the workflow engine anytime it recognizes that a known conversion is needed to interface a Dataset produced as output by one Tool to the input of another Tool.
If Galaxy has the appropriate Data Type converters defined, it generates a “conversion path”. Let’s say a Tool called Bar produces a Dataset of Data Type A (tA) that is to be consumed by a Tool called Foo. Foo accepts a Dataset of Data Type D (tD) as input, so it would seem these tools can't be directly connected in a workflow. But if there is a Converter defined in the Data Type definitions from tA to tB and another from tB to tD, the workflow engine will automatically produce a reasonable conversion path from tA to tD, so the workflow meets the strict requirement of Data Type consistency.
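
One way to picture how a conversion path is found is a shortest-path search over the defined converters. The Python sketch below is an illustration only and is not how Galaxy's engine is actually implemented. The registry contents are hypothetical, reusing the tA, tB and tD names from the example above plus an extra tC.

```python
from collections import deque

# Hypothetical converter registry: source Data Type -> directly reachable targets.
converters = {
    "tA": ["tB"],
    "tB": ["tD"],
    "tC": ["tD"],
}

def conversion_path(source, target, registry):
    """Breadth-first search for the shortest chain of converters, or None if there is none."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for next_type in registry.get(path[-1], []):
            if next_type not in seen:
                seen.add(next_type)
                queue.append(path + [next_type])
    return None

path = conversion_path("tA", "tD", converters)
print(path)  # ['tA', 'tB', 'tD']
```

If no chain of converters exists (for instance from tD back to tA in this registry), the function returns None, which corresponds to two tools that cannot be connected.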

Tool Palette  Each available Tool has a name and short description in the left panel of the Galaxy GUI, which is called the Tool Palette. The Tool Palette is divided into named sections that are collapsible. The Tool Palette is totally customizable for different applications.

Canvas  The Canvas is the main interface display area. Technically the Canvas is implemented using HTML iframes, and is divided into three main columns:

  • the left panel, usually showing the Tool Palette
  • the middle panel, which handles most of the main content in the user interface area 
  • the right panel, which is usually used by the History GUI (except in Workflow mode, where it shows tool information and parameters).
These configurations can be customized programmatically if needed (e.g. to embed a web-based mapping or visualization engine or something fancy like that).

Display Applications  Display Applications are web applications that run on a remote server but provide data visualization in the Canvas. They are used when the results of a Tool can be passed as parameters to a remote service for visualization (e.g. a Tool produces the 4 corners of a bounding box, which are passed to a remote mapping server that returns a high-resolution rendered map). Display Applications work best for light transfers; if you would need to send a huge amount of data to the remote server or run a very intensive rendering algorithm, it's probably better to use a Visualization Tool approach.

Visualizers  If a dataset can be used as a source of live data selection and extraction for a convenient client side visualization, the approach based on Visualizers is the way to go. A Visualizer is a software component that implements a client side visualization of a Dataset produced by a Tool or Workflow. Visualizers can create amazing interactive data output, but can be used only if the amount of data to be transferred from the Dataset to the client side is compatible with the HTTP based user experience and any technical requirements/constraints.

Toolsheds  Toolsheds are maybe the most important feature. Data Type definitions, Tools, Tool Parameters, Display Applications, Visualizers and everything else you need to implement custom features can all be packed into a Toolshed. Imagine a Toolshed as a sort of mobile device app: self-contained and easily installable and configurable in your own Galaxy installation.

Galaxy-ES   Galaxy-ES (or GES) is the main software component in the FACE-IT project. It is a Galaxy for the Earth System or, in the usual Galaxy parlance, “Data intensive earth science for everybody”. Technically it is implemented as a Toolshed for a custom-modified Galaxy that is closely related to the core Galaxy (and hopefully will soon be the same).
