Customizable
The basic idea is that all parts of the system should be replaceable, so we programmed the system against interfaces and adapter classes to make it as easy as possible to integrate your own functionality.
Currently there are the following ways of customizing the system
The general rules of matching are defined at the path "config.values.matching", and you need to provide one section for each defined Pegasus version. You can find an example configuration file here
If you set the version attribute of the leco element to "default", then BinBase will use this configuration whenever it can't find your specified Leco version. We highly recommend that you always have a default configuration. If you only work with one Leco version, just name it default and you are set!
Example: a minimal configuration to detect peaks with a purity < 1, a signal noise > 50 and a similarity of at least 500
The attribute "maximalPurityBinGeneration" defines the maximum purity a peak may have to be accepted as a new Bin, and "minimalSignalNoiseBinGeneration" is the minimum signal noise a peak needs to have to be accepted as a new Bin.
<!-- defining a default configuration -->
<leco version="default" maximalPurityBinGeneration="1" minimalSignalNoiseBinGeneration="25">
  <!-- we have a minimal purity of not defined (nd) and a maximal purity of 1 -->
  <purity minimal="nd" maximal="1">
    <!-- to allow matching, the signal noise should have a minimal value of 50 and a maximal value of not defined (nd) -->
    <signalnoise minimal="50" maximal="nd">
      <!-- the similarity needs to be at least 500 and has a maximum of not defined (nd) -->
      <similarity minimal="500" maximal="nd"/>
    </signalnoise>
  </purity>
</leco>
The example above is more or less the simplest possible configuration for the main filter of BinBase.
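As a sketch of the fallback rule described earlier, a configuration with one version-specific section next to the default section might look like the following. The version value "221" and the thresholds in that section are assumptions made up for illustration; use the identifier and values that match your Leco installation.

```xml
<!-- hypothetical version-specific section; "221" and its thresholds are illustrative assumptions -->
<leco version="221" maximalPurityBinGeneration="1" minimalSignalNoiseBinGeneration="25">
  <purity minimal="nd" maximal="1">
    <signalnoise minimal="100" maximal="nd">
      <similarity minimal="600" maximal="nd"/>
    </signalnoise>
  </purity>
</leco>

<!-- default section, used when no section matches the specified Leco version -->
<leco version="default" maximalPurityBinGeneration="1" minimalSignalNoiseBinGeneration="25">
  <purity minimal="nd" maximal="1">
    <signalnoise minimal="50" maximal="nd">
      <similarity minimal="500" maximal="nd"/>
    </signalnoise>
  </purity>
</leco>
```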
If you now want to match only bins with a purity < 0.5 and all bins with a purity > 1.5, you can define it like this:
<!-- defining a default configuration -->
<leco version="default" maximalPurityBinGeneration="1" minimalSignalNoiseBinGeneration="25">
  <!-- we have a minimal purity of not defined (nd) and a maximal purity of 0.5 -->
  <purity minimal="nd" maximal="0.5">
    <!-- to allow matching, the signal noise should have a minimal value of 50 and a maximal value of not defined (nd) -->
    <signalnoise minimal="50" maximal="nd">
      <!-- the similarity needs to be at least 500 and has a maximum of not defined (nd) -->
      <similarity minimal="500" maximal="nd"/>
    </signalnoise>
  </purity>
  <!-- we have a minimal purity of 1.5 and a maximal purity of not defined (nd) -->
  <purity minimal="1.5" maximal="nd">
    <!-- to allow matching, the signal noise should have a minimal value of 50 and a maximal value of not defined (nd) -->
    <signalnoise minimal="50" maximal="nd">
      <!-- the similarity needs to be at least 500 and has a maximum of not defined (nd) -->
      <similarity minimal="500" maximal="nd"/>
    </signalnoise>
  </purity>
</leco>
This shows quite nicely that you can just add more elements to configure this filter. These are the basic rules for the configuration
With these simple rules it's fairly easy to define complex filters. The matchSample element controls whether samples with a failed retention index correction may be matched. The example below allows it; to disable matching in that case, set the value to "false".
<correctionFailed>
<!-- defines whether it is allowed to match samples with a failed RI correction -->
<matchSample value="true"/>
</correctionFailed>
This needs to be located at the path "config.values".
The Bin generation factor needs to be in the range of 0 to 1. Since a setting of 0 makes no sense, we recommend a value between 0.5 and 1.0.
<generation>
  <factor>0.8</factor>
</generation>
A value of 0.5 means that a peak needs to be found in 50% of the samples of a class to be accepted as a new Bin, and 1.0 means that the peak needs to be found in every single sample. With the factor of 0.8 shown above and a class of 10 samples, for example, the peak has to be found in 8 of them.
The retention index correction needs to be configured once in the file and then fine-tuned in Bellerophon.
<correction>
  <allow>true</allow>
  <!-- derivation for retention index correction, must be between 2 and 20, and n must be smaller than or equal to the count of found standards! -->
  <polynome>5</polynome>
  <!-- minimal found standards! -->
  <minimal>8</minimal>
</correction>
With this you basically define which polynomial order you want to use and how many standards need to be found to determine whether the correction is successful or not.
You can extend the default filter or modify the configuration. All you need to do is implement an interface or extend our default base class and define it in the configuration.
The later a filter is defined in the configuration file, the later it is executed. This has a huge impact on performance, so it's recommended to put fast, simple filters at the beginning and the more complex filters at the end of the configuration.
<filters>
  <filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.RetentionIndexFilter"/>
  <filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.UniqueRatioFilter"/>
  <filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.UniqueIonFilter"/>
  <filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.SifterFilter"/>
  <filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.IonFilter"/>
</filters>
If you need examples of how to implement your filter, please have a look at our SVN repository, since this will help you the most. We also recommend that you don't implement the Filter interface directly and instead extend the BasicFilter class.
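Registering your own filter could then look roughly like this. The class name com.example.binbase.MyExpensiveFilter is hypothetical; it stands for a class of yours that extends BasicFilter. It is listed last on the assumption that it is slower than the default filters, following the ordering advice above.

```xml
<filters>
  <!-- filters from the default configuration, in their original order -->
  <filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.RetentionIndexFilter"/>
  <filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.UniqueRatioFilter"/>
  <filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.UniqueIonFilter"/>
  <filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.SifterFilter"/>
  <filter class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.anotation.IonFilter"/>
  <!-- hypothetical custom filter (assumed to extend BasicFilter), executed after all filters above -->
  <filter class="com.example.binbase.MyExpensiveFilter"/>
</filters>
```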
If you are not happy with our algorithm, need specific things changed or want to use the software with your own algorithm you can do this very easily. All you need to do is to implement a couple of interfaces and define them in the configuration file.
<class>
  <matching value="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.SimpleMatching"/>
  <correction value="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.RecrusiveCorrection"/>
  <algorythm value="edu.ucdavis.genomics.metabolomics.binbase.algorythm.matching.StandardAlgorithmHandler"/>
  <import>
    <provider class="edu.ucdavis.genomics.metabolomics.binbase.algorythm.Import.data.provider.PegasusASCIIIProvider" source="edu.ucdavis.genomics.metabolomics.binbase.algorythm.Import.data.source.FileSource"/>
  </import>
</class>
To change the way files are imported you have two possibilities. First you can define where your data are. You don't need to worry about this since the default approach is that the application server stores your data in a database table and this is accessed from the application. The advantage is that you can get to your files from all cluster nodes without mounting any directories on the nodes.
The second thing you can change is the implementation of the data format. For example, if you want to support the import of XML files, you need to implement the interface SampleDataProvider and register it in the configuration. You need to make sure that your data provides all the required fields.
<version id="221">
  <entry binbase="UniqueMass" pegasus="UniqueMass"/>
  <entry binbase="S/N" pegasus="S/N"/>
  <entry binbase="Purity" pegasus="Purity"/>
  <entry binbase="R.T. (seconds)" pegasus="R.T. (seconds)"/>
  <entry binbase="Quant S/N" pegasus="S/N"/>
  <entry binbase="Spectra" pegasus="Spectra"/>
  <entry binbase="Quant Masses" pegasus="Quant Masses"/>
</version>
The "binbase" attributes on the left are the required fields, and the "pegasus" attributes need to be mapped to them. For example, in our case we map the Pegasus S/N field to the BinBase fields S/N and Quant S/N.
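Registering a custom data provider, as described above, might look like the following sketch. It assumes that a provider is swapped in by changing the class attribute of the provider element inside the import section of the default configuration; com.example.binbase.XmlSampleDataProvider is a hypothetical class implementing SampleDataProvider, and the default FileSource is kept unchanged.

```xml
<import>
  <!-- hypothetical provider class implementing SampleDataProvider; the source is the one from the default configuration -->
  <provider class="com.example.binbase.XmlSampleDataProvider"
            source="edu.ucdavis.genomics.metabolomics.binbase.algorythm.Import.data.source.FileSource"/>
</import>
```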
You can change how the annotation algorithm works by implementing your own AlgorithmHandler or by implementing another way of matching mass spectra. Both are very complex, so sadly we can't explain them in any simple way. We are here to help if you have questions.