Tool condition monitoring (TCM) has become an active research area because of its potential to significantly reduce manufacturing costs while increasing process visibility and efficiency. Machine learning (ML) is one analysis technique that has demonstrated advantages for TCM applications. However, the commonly studied individual ML models lack generalizability to new machining and environmental conditions, as well as robustness to the unbalanced datasets that are common in TCM. Ensemble ML models have demonstrated superior performance in other fields but have only begun to be evaluated for TCM. As a result, it is not well understood how their TCM performance compares with that of individual models, or how homogeneous and heterogeneous ensemble models compare with one another. To address these research gaps, milling experiments were conducted under various cutting conditions, and the model groups were compared across several performance metrics. Statistical t-tests were used to evaluate the significance of the performance differences between models. An analysis of four individual ML models and five ensemble models, all based on the processes’ sound, spindle power, and axial load signals, found that, on average, the ensemble models performed better than the individual models and that the homogeneous ensembles outperformed the heterogeneous ensembles.
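As an illustration of the kind of significance test described above, the following minimal sketch computes a two-sample t statistic for the scores of two model groups. It assumes Welch's unequal-variance form, since the specific t-test variant is not stated here. The helper name `welch_t` is hypothetical, and the score lists are made-up placeholder values, not results from the milling experiments.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Assumption: Welch's unequal-variance t-test; the source does not
    say which t-test variant was actually used.
    """
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1)
    )
    return t, dof


# Placeholder per-fold scores (illustrative only, NOT experimental results).
ensemble_scores = [0.91, 0.89, 0.93, 0.90, 0.92]
individual_scores = [0.85, 0.88, 0.84, 0.87, 0.86]

t_stat, dof = welch_t(ensemble_scores, individual_scores)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

The statistic would then be compared against a t distribution with the computed degrees of freedom to obtain a p-value; a statistics library such as SciPy could be used for that step.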