Mapping Musical Mood with Unsupervised Learning: PCA Spaces and Cosine-Similarity Recommendations

Kaleb Mercado, Claire Chang

https://doi.org/10.69831/beeebb291f

This preprint reports new research that has not been peer-reviewed or revised at the time of posting.

Copyright © 2025 Mercado, Chang.
Categories: Engineering
Abstract

We ask whether lightweight, explainable methods can model musical mood and support mood-aware retrieval without raw-audio pipelines or listener ratings. Using the Coimbra MIR 4Q dataset, 900 clips annotated under Russell's circumplex model, we combined musically interpretable features summarizing tempo, timbre, rhythm, and dynamics with compact tag encodings. After standardizing the feature blocks, we applied principal component analysis (PCA) to obtain a low-dimensional embedding; loadings suggested that PC1 tracked dynamics and meter steadiness, while PC2 tracked rhythmic variability. Although no labels were used to fit the PCA, the two-dimensional map aligned with the four circumplex quadrants. Model selection used scree and reconstruction-error curves, which indicated diminishing returns beyond about 6–8 components. Treating the quadrants as clusters for evaluation yielded strong separation (silhouette 0.609, Davies–Bouldin 0.483, Calinski–Harabasz 3661.9). A cosine-similarity recommender retrieved nearest neighbors that were musically and emotionally coherent, with an option to emphasize items near quadrant boundaries to surface blended emotions. Because the approach operates in a tabular feature space, it is transparent, fast, and easy to tune through feature weights and tag contributions. The results demonstrate a practical path to mood-aware recommendation using explainable techniques and publicly available features.
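The standardize-then-PCA step and the quadrant-separation metrics reported above can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic features that stand in for the 4Q feature table (the generated data, dimensions, and quadrant labels here are placeholders, not the authors' dataset), so the metric values it produces will differ from those reported in the abstract.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

rng = np.random.default_rng(0)

# Synthetic stand-in: a 2-D mood latent (one cluster per circumplex
# quadrant) projected into a 10-D "feature" table with small noise.
n_per_q = 50
centers = np.array([[3, 3], [-3, 3], [-3, -3], [3, -3]], dtype=float)
latent = np.vstack([c + rng.normal(0, 0.5, (n_per_q, 2)) for c in centers])
W = rng.normal(size=(2, 10))                 # latent -> feature projection
X = latent @ W + rng.normal(0, 0.3, (4 * n_per_q, 10))
quadrants = np.repeat(np.arange(4), n_per_q)  # evaluation-only labels

# Standardize the feature block, then fit PCA without using labels.
Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(Z)
emb = pca.transform(Z)

# Treat the four quadrants as clusters and score the 2-D embedding.
sil = silhouette_score(emb, quadrants)
db = davies_bouldin_score(emb, quadrants)
ch = calinski_harabasz_score(emb, quadrants)
print(f"silhouette={sil:.3f}  davies-bouldin={db:.3f}  calinski-harabasz={ch:.1f}")
```

Because the labels are never shown to PCA, any separation in `emb` reflects structure already present in the features, which is the same argument the abstract makes for the unsupervised map.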
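The cosine-similarity recommender with an optional boundary emphasis could look roughly like the sketch below. The scoring rule for "near a quadrant boundary" (distance to the nearest axis of the 2-D map) is an assumption for illustration; the paper does not specify its exact weighting.

```python
import numpy as np

def recommend(emb, query_idx, k=5, boundary_weight=0.0):
    """Return indices of the k items most similar to emb[query_idx].

    emb             -- (n, d) array of PCA embeddings
    boundary_weight -- >0 boosts items near either axis of the 2-D map,
                       i.e. between circumplex quadrants (blended moods)
    """
    q = emb[query_idx]
    norms = np.linalg.norm(emb, axis=1) * np.linalg.norm(q)
    sims = emb @ q / np.clip(norms, 1e-12, None)   # cosine similarity

    # Boundary score rises as a point approaches either coordinate axis.
    boundary = 1.0 / (1.0 + np.min(np.abs(emb), axis=1))

    score = sims + boundary_weight * boundary
    score[query_idx] = -np.inf                     # never recommend the query
    return np.argsort(score)[::-1][:k]

# Tiny usage example on a hand-made 2-D embedding.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(recommend(emb, query_idx=0, k=2))
```

Setting `boundary_weight` to zero recovers plain cosine retrieval; raising it trades some similarity for items that sit between quadrants, matching the blended-emotion option described in the abstract.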

