Eclipsing-Binary Star Classification Using Ensembled Weka in Astrogrid
Overview:
This research aims to add facilities to the VO (Virtual Observatory) infrastructure for solving astronomical problems with data mining methods.
Existing approaches are analyzed, and preference is given to ensembles of data mining algorithms.
An architecture, Ensembled Weka, is proposed for incorporating the Weka system into the VO infrastructure.
The paper presents the results of implementing this architecture and shows the advantages of using
VO facilities, including Ensembled Weka, to solve a specific problem.
Eclipsing (photometric) binaries are binary stars
in which one star at times eclipses the other, causing
changes in the apparent total brightness of the
pair. The eclipse occurs because the line
of sight lies almost in the orbital plane of the stars.
Several catalogues of eclipsing binaries exist, e.g.,
the General Catalogue of Variable Stars (GCVS); A
Finding List for Observers of Interacting Binary
Systems, 5th Edition; and Eclipsing variables in
microlensing surveys. Prof. Oleg Malkov collected data
from these catalogues into a single catalogue (Malkov
2007), which currently contains information about 6675
binaries. In this collection a class is pre-determined for
1161 stars.

The results obtained by a set of selected data mining algorithms
(an ensemble) are
processed by a generalizing function. For example, in
classification with conventional voting, the number of
algorithms that assigned a given class is counted for each
object, and the class with the most votes is chosen.
The number of votes is stored as a new attribute, the
confidence index.
The ensemble produces a new table containing the
result appropriate to the kind of problem. The
corresponding scheme of the ensemble's operation is shown above.
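The conventional voting described above can be sketched in a few lines of Python. This is only an illustration, not the Ensembled Weka implementation: the function names, the object identifiers, and the tie-breaking rule (the class encountered first wins) are assumptions that do not come from the source.

```python
from collections import Counter


def vote(predictions):
    """Return (winning_class, confidence_index) for one object.

    predictions: class labels, one per algorithm in the ensemble.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    # most_common(1) gives the label with the most votes; on a tie it
    # returns the label encountered first (an assumption of this sketch).
    winner, votes = counts.most_common(1)[0]
    return winner, votes


def classify_table(rows):
    """Apply voting to every object.

    rows: mapping of object id -> list of labels produced by the ensemble.
    Returns a mapping of object id -> (class, confidence_index).
    """
    return {obj: vote(labels) for obj, labels in rows.items()}


# Hypothetical ensemble output for two objects (ids are made up).
ensemble_output = {
    "star_a": ["CWA", "CWA", "CW", "CWA"],
    "star_b": ["SA", "DM", "SA"],
}
result = classify_table(ensemble_output)
```

Here the confidence index is simply the vote count of the winning class, so its maximum equals the number of algorithms in the ensemble.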
Astrogrid Applications Used:
- FormatConverter (ivo://ipi.ac.ru/formatConvert)
This application converts tables between different formats. Here it is used to convert data from the native Weka format (ARFF) to VOTable.
- Weka Classifier (ivo://ipi.ac.ru/dmWekaEnsembleClassifier)
This application classifies the data in an input table. Besides the table, it receives a configuration file that specifies the structure of classes, the Weka algorithms to include in the ensemble, and other
required parameters.
An example of a configuration file used in the eclipsing-binary star classification problem can be found here.
Results:
This file, together with the catalogue of binaries,
was stored in MySpace. The ensemble classified
5514 binaries, giving the following
class distribution:
C - 852
CB - 89
CBF - 74
CBV - 149
CE - 15
CG - 1
CW - 84
CWA - 427
CWW - 331
S - 547
S2C - 3
SA - 1902
SC - 1
SH - 13
D - 553
DG - 41
DM - 422
DR - 10
A threshold of 7 was used for the confidence
index. Binaries classified with a confidence
index below the threshold received an
incomplete classification.
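The threshold rule can be illustrated with a short Python sketch. The source does not describe how an incomplete class is derived, so the sketch only flags each result as complete or incomplete; the function and constant names are assumptions. It also checks that the class counts listed above add up to the 5514 classified binaries.

```python
THRESHOLD = 7  # confidence-index threshold quoted in the text


def classification_status(confidence_index, threshold=THRESHOLD):
    """Flag a result as 'complete' or 'incomplete'.

    A confidence index below the threshold gives an incomplete
    classification; how the incomplete class itself is chosen is not
    specified in the source and is not modelled here.
    """
    return "complete" if confidence_index >= threshold else "incomplete"


# Class distribution copied from the list above.
CLASS_COUNTS = {
    "C": 852, "CB": 89, "CBF": 74, "CBV": 149, "CE": 15, "CG": 1,
    "CW": 84, "CWA": 427, "CWW": 331, "S": 547, "S2C": 3, "SA": 1902,
    "SC": 1, "SH": 13, "D": 553, "DG": 41, "DM": 422, "DR": 10,
}

total_classified = sum(CLASS_COUNTS.values())
```

The total of 5514 also equals 6675 minus 1161, that is, the number of catalogue entries without a pre-determined class.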