Training a deep CNN to learn about galaxies in 15 minutes


Let’s train a deep neural network from scratch! In this post, I demonstrate how to optimize a model to predict galaxy metallicities from images, and I discuss some tricks for speeding up training and obtaining better results. By John F Wu.

In short, we want to train a convolutional neural network (CNN) to perform regression. The inputs are images of individual galaxies (although sometimes we’re photobombed by other galaxies). The targets are galaxy metallicities, which can be obtained from the SDSS SkyServer using a SQL query and a bit of JOIN magic. In astronomy, metallicity is the abundance of elements in an object that are heavier than hydrogen and helium. All in all, we use 130,000 galaxies with metallicity measurements as our combined training and validation data set.
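To make the idea of a combined training and validation set concrete, here is a minimal sketch in plain Python of how such a set could be divided at random. The 80/20 ratio, the seed, and the helper name `split_indices` are assumptions made for illustration only; the summary does not state the split used in the article, which relies on fastai2 for this step.

```python
import random


def split_indices(n_items, valid_frac=0.2, seed=42):
    """Shuffle the indices 0..n_items-1 and return (train, valid) index lists.

    valid_frac and seed are illustrative defaults, not values from the article.
    """
    rng = random.Random(seed)          # local generator, so results are reproducible
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_valid = round(n_items * valid_frac)
    return indices[n_valid:], indices[:n_valid]


# 130,000 is the number of galaxies quoted in the text.
train_idx, valid_idx = split_indices(130_000)
print(len(train_idx), len(valid_idx))
```

Fixing the seed keeps the validation galaxies the same from run to run, so results from different training runs remain comparable.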

The article covers:

  • Predicting metallicities from pictures: obtaining the data
  • Organizing the data using the fastai2 DataBlock API
  • Neural network architecture and optimization
  • Evaluating our results
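For the evaluation step, a regression model like this is commonly scored with the root-mean-squared error (RMSE) between predicted and measured values. The sketch below assumes RMSE as the metric, which this summary does not confirm, and the metallicity values in it are made-up placeholders rather than results from the article.

```python
import math


def rmse(predictions, targets):
    """Root-mean-squared error between two equal-length sequences of numbers."""
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Placeholder values for illustration only (not data from the article).
predicted = [8.9, 9.1, 8.7]
measured = [9.0, 9.0, 8.8]
print(f"RMSE: {rmse(predicted, measured):.3f}")
```

A lower RMSE on the validation set means the predicted metallicities sit closer to the measured ones, in the same units as the metallicity itself.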

The author uses fastai2, a powerful high-level library that extends PyTorch and is easy to use and customize. At the moment, the documentation is still a bit lacking, but that’s okay, since the library is still under active development. Very useful!


Tags big-data machine-learning data-science python