Transfer Learning and Deep Metric Learning for Automated Galaxy Morphology Representation


Galaxy morphology characterisation is an important area of study, as the type and formation of galaxies offer insights into the origin and evolution of the universe. Owing to the increased availability of images of galaxies, scientists have turned to crowd-sourcing to automate the process of instance labelling. However, research has shown that using crowd-sourced labels for galaxy classification comes with many pitfalls. An alternative approach to galaxy classification is metric learning, which yields improved representations for classification, anomaly detection, information retrieval, clustering and dimensionality reduction. Understanding how this approach interacts with crowd-sourced labels is essential if scientists intend to continue using them. This paper compares metric learning and classification models trained or fine-tuned on both the crowd-sourced Galaxy Zoo 2 (GZ2-H) dataset and the expert-labelled EFIGI catalogue. The study uses the Revised Shapley-Ames (RSA) catalogue of bright galaxies, also labelled by experts, as an unseen test set. The RSA catalogue allows for an accurate comparison of how well the models predict the Hubble types of galaxies. On classification accuracy alone, the crowd-sourced and expert models appear comparable. However, alternative metrics show that the models trained on the expert dataset outperformed the models trained on the crowd-sourced data in terms of actual vs predicted labels. Further, the results show that fine-tuning a model pre-trained on crowd-sourced data can outperform the state-of-the-art in galaxy characterisation.

The models trained to predict the Hubble types of galaxies perform better when fine-tuned using the ProxyNCA and Normalised-Softmax loss functions than with other pairwise losses. The Normalised-Softmax loss yielded the best overall 9-class models, with accuracies of 30.88% (GZ2-H) and 30.05% (EFIGI) and MAP values of 0.3483 (GZ2-H) and 0.3889 (EFIGI). The ProxyNCA loss produced the second-best overall 9-class models, with accuracies of 30.33% (GZ2-H) and 20.03% (EFIGI) and MAP values of 0.3577 (GZ2-H) and 0.3917 (EFIGI). Finally, the paper highlights the need for caution when utilising crowd-sourced labels; however, it argues that transfer learning from crowd-sourced labelled data to expert-labelled data can still lead to significant improvements.
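
For readers unfamiliar with the Normalised-Softmax loss named above, the following is a minimal sketch of its general form for a single sample: the embedding and one learnable proxy per class are L2-normalised, their cosine similarities are divided by a temperature, and the cross-entropy against the true class is taken. It is an illustration only, not the implementation used in the paper; the function names, the toy vectors and the temperature value of 0.05 are assumptions chosen for the example.

```python
import math


def l2_normalise(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def normalised_softmax_loss(embedding, proxies, label, temperature=0.05):
    """Cross-entropy over temperature-scaled cosine similarities.

    embedding   : feature vector for one image (hypothetical input)
    proxies     : one vector per class, learnable in a real model
    label       : index of the true class
    temperature : illustrative default, not a value taken from the paper
    """
    e = l2_normalise(embedding)
    logits = []
    for proxy in proxies:
        p = l2_normalise(proxy)
        cosine = sum(a * b for a, b in zip(e, p))
        logits.append(cosine / temperature)
    # Numerically stable log-sum-exp over all class logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability of the true class.
    return log_sum_exp - logits[label]


# Toy example: three classes in a two-dimensional embedding space.
toy_proxies = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
toy_embedding = [2.0, 0.1]  # points roughly towards class 0
loss_correct = normalised_softmax_loss(toy_embedding, toy_proxies, label=0)
loss_wrong = normalised_softmax_loss(toy_embedding, toy_proxies, label=1)
```

In this toy setting the loss is small when the embedding points towards the proxy of its true class and large otherwise. Because both vectors are normalised, only their directions matter, so rescaling the embedding leaves the loss unchanged.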
