cluster benefits
The BinBase system was developed to produce data as fast as possible, and always faster than we can measure it. The latter is important because it avoids a backlog of unprocessed data.
Currently the import benefits from a cluster because we can calculate N classes at the same time, where N is the number of nodes. For example, if you have an experiment with 30 classes and your cluster has 10 nodes enabled for BinBase, you can calculate 10 classes at once.
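The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration, not BinBase code: the function names `rounds_needed` and `assign_rounds` are hypothetical, and the sketch assumes that every class takes roughly the same time and that no class has to wait for bin generation.

```python
import math


def rounds_needed(num_classes: int, num_nodes: int) -> int:
    """Return how many sequential rounds are needed when each node
    calculates one class at a time (hypothetical helper)."""
    if num_nodes < 1:
        raise ValueError("at least one node is required")
    return math.ceil(num_classes / num_nodes)


def assign_rounds(class_names, num_nodes):
    """Split the classes into batches of at most num_nodes classes.
    Each batch is one round of parallel calculation."""
    if num_nodes < 1:
        raise ValueError("at least one node is required")
    return [class_names[i:i + num_nodes]
            for i in range(0, len(class_names), num_nodes)]


# Example from the text: 30 classes on 10 nodes.
classes = [f"class-{i}" for i in range(1, 31)]
batches = assign_rounds(classes, 10)
print(rounds_needed(30, 10), [len(b) for b in batches])
```

Under these assumptions the 30 classes are processed in three rounds of 10 classes each. A real scheduler would more likely hand the next class to whichever node becomes free first.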
Limitation:
Once a class finds that it needs to generate bins, it locks the database, generates the bins and releases the lock afterwards. This is done to avoid generating the same bins twice.
Our current approach is to use one node for one experiment export, but there is no further parallel processing in the actual export. We are looking into this to speed up single large exports, since they can take several hours. One approach would be to parallelize the post matching, which would be pretty simple.
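The lock-then-generate pattern described above can be illustrated with a small Python sketch. It is not how BinBase implements the lock: the `BinStore` class, its in-memory dictionary, and the key `"peak-A"` are hypothetical stand-ins for the database, and threads stand in for cluster nodes.

```python
import threading


class BinStore:
    """Hypothetical in-memory stand-in for the shared database."""

    def __init__(self):
        self._lock = threading.Lock()
        self.bins = {}        # key -> bin name
        self.generated = 0    # how many bins were actually created

    def ensure_bin(self, key):
        """Create the bin for key only if it does not exist yet.
        The check and the creation both happen while the lock is held,
        so two workers can never generate the same bin twice."""
        with self._lock:
            if key not in self.bins:
                self.bins[key] = f"bin-{len(self.bins) + 1}"
                self.generated += 1
            return self.bins[key]


def run_demo(num_workers=10):
    """Let several workers ask for the same bin at the same time."""
    store = BinStore()
    workers = [threading.Thread(target=store.ensure_bin, args=("peak-A",))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return store.generated


print(run_demo())
```

While one worker holds the lock, the others wait. That is the reason the import does not scale perfectly during bin generation.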
Right now you can export N experiments at a time, where N is the number of nodes. There is no locking involved, so it scales very well.
During the weekly updates we match all the samples again and update all the metadata. This scales extremely well and we can run N updates at once, where N is the number of nodes.
It is possible to execute very complicated and long-running queries, especially when you want to know in which species a certain bin is found or want statistics about certain bins.
With the latest BinBase version (currently svn only; we hope to release it officially by the beginning of August) it is possible to run BinBase on a common PC without a cluster. You will lose quite a few features, but it has the benefit that you can start using the software without an extensive setup. You can always install a BinBase cluster version later.