Importance of Feature Selection in ML models

An ever-looming threat to astronomical applications of machine learning (ML), and especially deep learning (DL), is the danger of overfitting. In particular, we consider the problem of stellar parameterization from low- to mid-resolution spectra. The preferred way to deal with this issue is to develop and use spectral indices, which requires careful measurement of the equivalent widths of blended spectral lines. This is prone to user error and does not often yield very accurate estimates of the output parameters. In this work, we tackle the problem with an iterative ML algorithm that sequentially prunes redundant features (wavelength points) to arrive at an optimal set of features with the strongest correlation with each of the output variables (stellar parameters): T_eff, log(g) and [Fe/H]. We find that even at high resolution, with tens of thousands of pixels (wavelength values), most pixels are not only redundant but actually increase the mean absolute errors (MAEs) of the model output with respect to the true parameter values. Our results are particularly significant in this era of exploding astronomical observational capabilities, when we will undoubtedly be faced with the 'curse of dimensionality'. We illustrate the importance of feature selection for reducing noise, improving model predictions, and making the best use of limited computational and hardware resources on various downsampled and degraded synthetic PHOENIX spectra, obtained by convolving the raw high-resolution (R = 500,000) sources to low and mid resolution (R = 2,000-15,000).
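The abstract does not specify the pruning algorithm, so the following is only a minimal sketch of one common form of iterative feature elimination, under stated assumptions. It uses closed-form ridge regression as a stand-in model, ranks features by absolute weight, and uses random Gaussian "pixels" in place of PHOENIX spectra. The names `fit_ridge`, `prune_features`, `drop_frac` and `min_features` are hypothetical and do not come from the work itself.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression weights (no intercept; data assumed centered)."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return float(np.mean(np.abs(y_true - y_pred)))


def prune_features(X_train, y_train, X_val, y_val, drop_frac=0.2, min_features=1):
    """Repeatedly drop the features with the smallest |weight|.

    Returns the feature subset (column indices) with the lowest
    validation MAE seen during pruning, and that MAE.
    """
    active = np.arange(X_train.shape[1])
    best_idx, best_mae = active.copy(), np.inf
    while True:
        w = fit_ridge(X_train[:, active], y_train)
        err = mae(y_val, X_val[:, active] @ w)
        if err < best_mae:
            best_idx, best_mae = active.copy(), err
        if active.size <= min_features:
            break
        # Drop a fixed fraction per round, but never go below min_features.
        n_drop = max(1, int(drop_frac * active.size))
        n_drop = min(n_drop, active.size - min_features)
        keep = np.sort(np.argsort(np.abs(w))[n_drop:])
        active = active[keep]
    return best_idx, best_mae


# Synthetic stand-in for spectra (an assumption, not real data):
# 100 "pixels", of which only 5 carry information about the target.
rng = np.random.default_rng(0)
n_train, n_val, n_pixels = 400, 200, 100
informative = np.array([3, 17, 42, 58, 91])
coefs = np.array([3.0, -2.0, 1.5, 2.5, -1.0])

X = rng.normal(size=(n_train + n_val, n_pixels))
y = X[:, informative] @ coefs + rng.normal(scale=0.5, size=n_train + n_val)
X_train, X_val = X[:n_train], X[n_train:]
y_train, y_val = y[:n_train], y[n_train:]

# Baseline: the same model fitted on every pixel.
full_mae = mae(y_val, X_val @ fit_ridge(X_train, y_train))
best_idx, best_mae = prune_features(X_train, y_train, X_val, y_val)

print(f"all {n_pixels} pixels: MAE = {full_mae:.3f}")
print(f"best subset ({best_idx.size} pixels): MAE = {best_mae:.3f}")
```

In this toy setting the pruned subset can only match or improve on the full-feature validation MAE, because the full model is the first candidate evaluated. For real spectra one would presumably repeat the procedure per output parameter (T_eff, log(g), [Fe/H]) and score the chosen subset on a separate held-out test set, since selecting on the validation set biases that MAE optimistically.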
Link to PDF (may not be available yet): O1-3.pdf