Appendix B: BuildVPIndex options
option | meaning | default value | value in dna/build.sh | |
---|---|---|---|---|
-t | data type: one of "protein", "dna", "vector", "image", "ms", "msms" | n/a | dna | |
-d | location of "mobiosData" directory; if directory does not exist, will be created at specified location | current directory | ../ | |
-i | input data file name | n/a | data.fasta | |
-o | output index name | n/a | dna_18_100000 | |
-psm | pivot selection method: random, fft, center, pcaonfft, pca | fft | | |
-p | number of pivots in an index node | 3 | 2 | |
-dpm | data partition method: balanced, clusteringkmeans, clusteringboundary | balanced | | |
-f | fanout of a pivot | 3 | | |
-m | maximum number of children in a leaf node | 100 | | |
-pl | path length | 0 | | |
-g | debug level | 0 | | |
-frag | fragment length, only meaningful for sequences* | n/a | 11 | |
-dim | dimension of vector data to load, only meaningful for vectors | n/a | n/a | |
-b | bucketing | 1: will be used | 0 | 1 |
-s | size of index (number of data points) | all data points in source file | 100000 | |
-r | maximum radius for partition | 0.1 | n/a | |
See mobios-v0.9-examples/dna/build.sh for an example of how to call BuildVPIndex.
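As a rough illustration of how the values in the "value in dna/build.sh" column fit together, the following Python sketch flattens them into an argument list. It only prints the arguments and does not run anything. The program name `BuildVPIndex` used as the first element is a placeholder: how build.sh actually launches the tool (for example, through a Java command and classpath) is not shown in this appendix, so treat that part as an assumption. The helper `build_argv` is hypothetical, and the -b option is omitted because its row in the table is ambiguous.

```python
# Option values copied from the "value in dna/build.sh" column of the table.
# Options without a value in that column are left at their defaults.
BUILD_SH_OPTIONS = {
    "-t": "dna",            # data type
    "-d": "../",            # location of the mobiosData directory
    "-i": "data.fasta",     # input data file name
    "-o": "dna_18_100000",  # output index name
    "-p": "2",              # number of pivots in an index node
    "-frag": "11",          # fragment length
    "-s": "100000",         # size of index (number of data points)
}


def build_argv(options, program="BuildVPIndex"):
    """Flatten an {option: value} mapping into an argument list.

    `program` is a placeholder; the real launcher command is an assumption.
    """
    argv = [program]
    for flag, value in options.items():
        argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    print(" ".join(build_argv(BUILD_SH_OPTIONS)))
```

Keeping the options in a mapping makes it easy to override a single value, such as the index size, without retyping the whole command line.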
Advanced options
It is possible to build more than one index at a time with BuildVPIndex using the following options:
option | meaning |
---|---|
-sm | size of smallest index |
-la | size of largest index |
-st | step size of index |
When using the advanced options, the size of the index will be appended to the given index name for each index.
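To make the naming rule concrete, here is a minimal Python sketch that lists the index names one might expect from a given -sm, -la and -st combination. Several details are assumptions rather than documented behaviour: that sizes run from the smallest to the largest value inclusive, that the step is added repeatedly, and how the size is joined to the name (the `sep` parameter is hypothetical and defaults to direct concatenation). The function `index_names` is illustrative and not part of MoBIoS.

```python
def index_names(base_name, smallest, largest, step, sep=""):
    """List the index names for sizes smallest, smallest + step, ..., up to largest.

    Assumes the largest size is inclusive and that the size is appended
    to the base name with `sep` in between (both are assumptions).
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return [f"{base_name}{sep}{size}" for size in range(smallest, largest + 1, step)]


if __name__ == "__main__":
    # Hypothetical sizes: -sm 100000 -la 300000 -st 100000
    for name in index_names("dna_18", 100000, 300000, 100000, sep="_"):
        print(name)
```

Checking the actual names written to the mobiosData directory after a build is the reliable way to confirm the separator and whether the largest size is included.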
*When building an index over sequences, each sequence is broken up into sets of overlapping fragments, or k-mers. For more information on this methodology, see the paper: ...
"Using MoBIoS' Scalable Genome Joins to Find Conserved Primer Pair Candidates Between Two Genomes," Weijia Xu, Willard J Briggs, Joanna Padolina, Wenguo Liu, C. Randall Linder, Daniel P. Miranker. ISMB Bioinformatics, 2004.
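The fragmentation described in the footnote can be sketched in a few lines of Python. This is an illustration of overlapping k-mers in general, not the MoBIoS implementation: the function `fragments` is hypothetical, and the assumption that consecutive fragments are shifted by one position (`stride=1`) is mine; the cited paper describes the scheme actually used.

```python
def fragments(sequence, frag_len, stride=1):
    """Split a sequence into overlapping fragments (k-mers) of length frag_len.

    With stride=1, consecutive fragments overlap in frag_len - 1 positions.
    Sequences shorter than frag_len yield no fragments.
    """
    if frag_len <= 0 or stride <= 0:
        raise ValueError("frag_len and stride must be positive integers")
    last_start = len(sequence) - frag_len
    return [sequence[i:i + frag_len] for i in range(0, last_start + 1, stride)]


if __name__ == "__main__":
    # Fragment length 11 matches the -frag value used in dna/build.sh.
    for kmer in fragments("ACGTACGTACGTAC", 11):
        print(kmer)
```

A sequence of length n yields n - frag_len + 1 fragments at stride 1, so the number of indexed data points can be much larger than the number of input sequences.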