Customizable

Customization of the BinBase

The basic idea is that all parts of the system should be replaceable. So we programmed the system against interfaces and adapter classes to make it as easy as possible to integrate your functionality.

Currently there are the following ways of customizing the system:

  • modify the configuration file and add your own matching rules; this is by far the easiest way
  • modify existing filters or add new filters to the system; this is fairly simple
  • replace parts of the algorithm, which is the most complex modification. For example, you can replace the implementation for
    • importing data
    • annotating data
    • retention index correction

Modify the configuration File

Matching 

The general matching rules are defined at the path "config.values.matching", and you need to provide one section for each defined Pegasus version. You can find an example configuration file here.

If you set the version attribute of the leco element to "default", BinBase uses this configuration whenever it can't find a section for your specific Leco version. We highly recommend that you always have a default configuration. If you only work with one Leco version, just name it default and you are set!

Example: a minimal configuration that matches peaks with a purity < 1, a signal noise > 50 and a similarity of at least 500.

The attribute "maximalPurityBinGeneration" defines the maximum purity a peak is allowed to have to be accepted as a new Bin, and "minimalSignalNoiseBinGeneration" defines the minimum signal noise a peak needs to have to be accepted as a new Bin.

<!-- defining a default configuration -->
<leco version="default" maximalPurityBinGeneration="1" minimalSignalNoiseBinGeneration="25">
	<!-- the minimal purity is not defined (nd) and the maximal purity is 1 -->
	<purity minimal="nd" maximal="1">
		<!-- to allow matching, the signal noise needs a minimal value of 50; the maximal value is not defined (nd) -->
		<signalnoise minimal="50" maximal="nd">
			<!-- the similarity needs to be at least 500; the maximum is not defined (nd) -->
			<similarity minimal="500" maximal="nd"/>
		</signalnoise>
	</purity>
</leco>

The example above is more or less the simplest possible configuration for the main filter of BinBase.

If you instead want to match only bins with a purity < 0.5 and all bins with a purity > 1.5, you can define it like this:

<!-- defining a default configuration -->
<leco version="default" maximalPurityBinGeneration="1" minimalSignalNoiseBinGeneration="25">
	<!-- first range: the minimal purity is not defined (nd) and the maximal purity is 0.5 -->
	<purity minimal="nd" maximal="0.5">
		<!-- to allow matching, the signal noise needs a minimal value of 50; the maximal value is not defined (nd) -->
		<signalnoise minimal="50" maximal="nd">
			<!-- the similarity needs to be at least 500; the maximum is not defined (nd) -->
			<similarity minimal="500" maximal="nd"/>
		</signalnoise>
	</purity>
	<!-- second range: the minimal purity is 1.5 and the maximal purity is not defined (nd) -->
	<purity minimal="1.5" maximal="nd">
		<!-- to allow matching, the signal noise needs a minimal value of 50; the maximal value is not defined (nd) -->
		<signalnoise minimal="50" maximal="nd">
			<!-- the similarity needs to be at least 500; the maximum is not defined (nd) -->
			<similarity minimal="500" maximal="nd"/>
		</signalnoise>
	</purity>
</leco>

This shows that you can simply add more elements to refine this filter. These are the basic rules for the configuration:

  1. you need at least one leco element, either with your version number or with the version="default" attribute
  2. every leco element needs at least one purity element
  3. every purity element needs at least one signalnoise element
  4. every signalnoise element needs exactly one similarity element
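
The rules above allow several signalnoise elements inside one purity element. As a minimal sketch, the following section asks for a higher similarity from peaks with a lower signal noise. The thresholds 200 and 700 are illustrative assumptions, not recommended values, and how BinBase treats a value that falls exactly on a boundary shared by two ranges is not specified on this page.

```xml
<leco version="default" maximalPurityBinGeneration="1" minimalSignalNoiseBinGeneration="25">
	<purity minimal="nd" maximal="1">
		<!-- illustrative: peaks with a signal noise between 50 and 200 need a similarity of at least 700 -->
		<signalnoise minimal="50" maximal="200">
			<similarity minimal="700" maximal="nd"/>
		</signalnoise>
		<!-- illustrative: peaks with a signal noise above 200 only need a similarity of at least 500 -->
		<signalnoise minimal="200" maximal="nd">
			<similarity minimal="500" maximal="nd"/>
		</signalnoise>
	</purity>
</leco>
```

Each signalnoise element still contains exactly one similarity element, so the section follows all four rules.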

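With these simple rules it's fairly easy to define complex filters. Whether samples are matched when their retention index correction failed is controlled by a separate element. The example below allows matching with value="true"; to disable it, you would presumably set the value to "false".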

<correctionFailed>
    <!-- defines whether it is allowed to match samples with a failed ri-correction -->
    <matchSample value="true"/>
</correctionFailed>

This element needs to be placed at the path "config.values".
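
As a sketch of where this element might sit, the fragment below assumes that the path "config.values" corresponds directly to nested config and values elements, and that value="false" disables matching for samples with a failed correction. Neither assumption is confirmed on this page, so check it against your own configuration file.

```xml
<config>
	<values>
		<correctionFailed>
			<!-- assumption: false means samples with a failed ri-correction are not matched -->
			<matchSample value="false"/>
		</correctionFailed>
		<!-- all other settings below config.values are omitted in this sketch -->
	</values>
</config>
```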

Bin generation

The Bin generation factor needs to be in the range between 0 and 1. Since a setting of 0 makes no sense, we recommend a value between 0.5 and 1.0.

<generation>
	<factor>0.8</factor>
</generation>

A factor of 0.5 means that a peak needs to be found in 50% of the samples of a class to be accepted as a new Bin, and 1.0 means that the peak needs to be found in every single sample. With the factor of 0.8 shown above, a peak needs to be found in 80% of the samples, for example in 8 samples of a class of 10.

Retention index correction

The retention index correction needs to be configured once in the file and then fine-tuned in Bellerophon.

<correction>
	<allow>true</allow>
	<!-- polynomial order n for the retention index correction; must be between 2 and 20, and n must be smaller than or equal to the count of found standards -->
	<polynome>5</polynome>
	<!-- minimal number of standards that need to be found -->
	<minimal>8</minimal>
</correction>

With this you define which polynomial order you want to use and how many standards need to be found for the correction to count as successful.

Add and modify filters

You can extend the default filters or modify their configuration. All you need to do is implement one interface or extend our default base class and define your class in the configuration.

The later a filter is defined in the configuration file, the later it is executed. This has a huge impact on performance, so we recommend putting fast, simple filters at the beginning and the more complex filters at the end of the configuration.

Default Filters

<filters>
	<filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.RetentionIndexFilter"/>
	<filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.UniqueRatioFilter"/>
	<filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.UniqueIonFilter"/>
	<filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.SifterFilter"/>
	<filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.IonFilter"/>
</filters>

If you need examples of how to implement your filter, please have a look at our svn repository, since this will help you the most. We also recommend that you don't implement the Filter interface directly and instead extend the BasicFilter class.

Replace parts of the Algorithm
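
As a sketch of the ordering advice above, a custom filter could be appended after the default filters so that it only runs on peaks that passed the earlier checks. The class name com.example.binbase.MyExpensiveFilter is hypothetical and stands for your own class extending BasicFilter. The sketch also assumes that your filter is more expensive than the default ones.

```xml
<filters>
	<!-- default filters, kept in their original order -->
	<filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.RetentionIndexFilter"/>
	<filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.UniqueRatioFilter"/>
	<filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.UniqueIonFilter"/>
	<filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.SifterFilter"/>
	<filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.IonFilter"/>
	<!-- hypothetical custom filter, defined last so that it is executed last -->
	<filter class="com.example.binbase.MyExpensiveFilter"/>
</filters>
```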

If you are not happy with our algorithm, need specific things changed or want to use the software with your own algorithm you can do this very easily. All you need to do is to implement a couple of interfaces and define them in the configuration file.

 <class>
	<matching value="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.SimpleMatching"/>
	<correction value="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.RecrusiveCorrection"/>
	<algorythm value="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.StandardAlgorithmHandler"/>

	<import>
		<provider class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.Import.data.provider.PegasusASCIIIProvider" source="edu.ucdavis.genomics.metabolomics.binbase.algorythm.Import.data.source.FileSource"/>
	</import>
</class>

Import file format

To change the way files are imported you have two possibilities. First, you can define where your data is stored. You normally don't need to worry about this, since by default the application server stores your data in a database table, which the application then accesses. The advantage is that you can get to your files from all cluster nodes without mounting any directories on the nodes.

The second thing you can change is the implementation of the data format. For example, if you want to support the import of XML files, you need to implement the interface SampleDataProvider and register it in the configuration. You need to make sure that your data provides all the required fields.

Required Fields
 <version id="221">
	<entry binbase="UniqueMass" pegasus="UniqueMass"/>
	<entry binbase="S/N" pegasus="S/N"/>
	<entry binbase="Purity" pegasus="Purity"/>
	<entry binbase="R.T. (seconds)" pegasus="R.T. (seconds)"/>
	<entry binbase="Quant S/N" pegasus="S/N"/>
	<entry binbase="Spectra" pegasus="Spectra"/>
	<entry binbase="Quant Masses" pegasus="Quant Masses"/>
</version>

The "binbase" attributes on the left name the required fields, and the "pegasus" attributes name the source fields that are mapped to them. For example, in our case we map the Pegasus S/N field to both the BinBase fields S/N and Quant S/N.
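
A custom data format would then be registered in the import element of the class section shown earlier. In this sketch the provider class com.example.binbase.XmlSampleDataProvider is hypothetical and stands for your own implementation of SampleDataProvider. The source attribute keeps the FileSource value from the default configuration.

```xml
<import>
	<!-- hypothetical provider class implementing SampleDataProvider; the source is unchanged from the default -->
	<provider class="com.example.binbase.XmlSampleDataProvider" source="edu.ucdavis.genomics.metabolomics.binbase.algorythm.Import.data.source.FileSource"/>
</import>
```

Such a provider still has to deliver every field listed under Required Fields.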

Annotation of mass specs

You can change how the annotation works by implementing your own AlgorithmHandler or by implementing another way of matching mass specs. Both are very complex tasks, so we sadly can't explain them in any simple way. We are happy to help if you have questions.

Involved classes/interfaces
  • BasicAlgorythmHandler, used to compare two mass specs and to tell us whether they match. This class implements the interface AlgorythmHandler and provides some helper methods. The default implementation is this class.
  • AbstractMatching, used to do the actual annotation of the peaks. It uses the AlgorythmHandler to determine whether mass specs match. The default implementation is this class.
  • AbstractMethod, prepares the matching methods and handles and loads the implementations. You shouldn't need to implement it, since our default implementation takes care of these things. You can find the default implementation here.

retention index correction 


