Some of the Science Behind the Being Good Technology…
Being Good listened to tens of thousands of audio tracks to isolate and identify audio frequency characteristics that correspond to moods in the tracks.
Being Good then trained an AI system to recognize intensity differences in filtered representations of those frequency characteristics.
The resulting human-trained Being Good AI system rapidly identifies the moods in any music.
Being Good Applications:
Each audio track is converted to spectrograms representing different audio features extracted from each second of audio.
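As a rough illustration, here is a minimal sketch of the per-second spectrogram step, assuming mel spectrograms computed with librosa; the specific spectrogram types and feature sets Being Good extracts are not described here.

```python
# Sketch only: splits a track into 1-second windows and computes a mel
# spectrogram for each window. The actual spectrogram types Being Good
# uses are not public.
import librosa
import numpy as np

def per_second_spectrograms(path, sr=16000, n_mels=64):
    y, sr = librosa.load(path, sr=sr, mono=True)
    hop = sr  # one-second, non-overlapping windows
    windows = [y[i:i + sr] for i in range(0, len(y) - sr + 1, hop)]
    specs = []
    for w in windows:
        s = librosa.feature.melspectrogram(y=w, sr=sr, n_mels=n_mels)
        specs.append(librosa.power_to_db(s, ref=np.max))
    return np.stack(specs)  # shape: (seconds, n_mels, frames)
```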
The spectrograms are input to a neural network trained on an audio dataset covering a large set of audio classes: a vast collection of sound clips of music, speech, different musical instruments, animal sounds, and common environmental sounds.
The trained network extracts a large number of low-level audio features and uses them to generate audio classifications, including moods.
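Being Good's network itself is not public. As a stand-in that matches the description above, the sketch below runs the freely available YAMNet model, which was trained on Google's AudioSet (521 classes spanning music, speech, instruments, and animal and environmental sounds) and returns both class scores and low-level embeddings.

```python
# Sketch using YAMNet as a stand-in for the classification network.
import numpy as np
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/google/yamnet/1")

# YAMNet expects a mono 16 kHz waveform of float32 values in [-1, 1].
waveform = np.zeros(16000, dtype=np.float32)  # placeholder: 1 s of silence
scores, embeddings, log_mel = model(waveform)
# scores:     (frames, 521)  per-class probabilities
# embeddings: (frames, 1024) low-level features usable downstream
```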
Some of the low-level audio features used to calculate rhythm (R) include the following (a rough computation sketch follows the list):
A beats-per-minute (BPM) histogram, its highest peak, and the spread, weight, and BPM of its first peak.
Energy in a frequency band.
Energy in one or more ERB (equivalent rectangular bandwidth) bands of the spectrum, along with crest and flatness values.
Weighted mean of frequencies as a measure of the spectral centroid.
Skewness derived from the 0th, 1st, 2nd, 3rd, and 4th central moments of the spectrum.
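The sketch below computes rough stand-ins for a few of these features using librosa, NumPy, and SciPy. The exact definitions Being Good uses are not given; in particular, mel bands are substituted here for ERB bands.

```python
# Rough stand-ins for a few rhythm-related low-level features.
import librosa
import numpy as np
from scipy.stats import skew

def rhythm_features(y, sr):
    S = np.abs(librosa.stft(y)) ** 2                  # power spectrogram
    mag = np.sqrt(S)                                  # magnitude spectrogram
    feats = {}
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    feats["bpm"] = float(tempo)                       # stand-in for the BPM-histogram peak
    band = librosa.feature.melspectrogram(S=S, sr=sr, n_mels=40)
    feats["band_energy"] = band.mean(axis=1)          # energy per (mel, not ERB) band
    feats["crest"] = float(band.max() / (band.mean() + 1e-10))
    feats["flatness"] = float(librosa.feature.spectral_flatness(S=mag).mean())
    feats["centroid"] = float(librosa.feature.spectral_centroid(S=mag, sr=sr).mean())
    feats["skewness"] = float(skew(mag, axis=0).mean())  # shape of the spectral distribution
    return feats
```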
Different combinations of low-level audio features are used to calculate texture (T) and pitch (P).
Once the low-level audio features have been identified for R, T, and P, that data is input into a second neural network, trained on human-assigned RTP scores for tens of thousands of audio tracks, to determine the RTP score for each analyzed track.
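A minimal sketch of this second stage, assuming a small multi-layer perceptron regressor (scikit-learn's MLPRegressor) and placeholder data; the real architecture and training set are Being Good's own.

```python
# Sketch: regress a low-level feature vector to (R, T, P) scores.
import numpy as np
from sklearn.neural_network import MLPRegressor

# X: one feature vector per track; y: human-assigned RTP scores (placeholders)
X = np.random.rand(1000, 64)
y = np.random.choice(np.arange(1.0, 5.5, 0.5), size=(1000, 3))  # R, T, P

model = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500)
model.fit(X, y)

rtp = model.predict(np.random.rand(1, 64))[0]
rtp = np.clip(np.round(rtp * 2) / 2, 1.0, 5.0)  # snap to the half-point scale
print(f"R={rtp[0]}, T={rtp[1]}, P={rtp[2]}")
```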
Being Good Spotify Demo:
Spotify provides a developer platform with an application programming interface (API) through which Being Good can obtain audio analysis data for each audio track available through Spotify.
The data provided by Spotify is significantly less detailed and sophisticated than the data utilized by the Being Good applications.
As a result, Spotify-based RTP scores may be less accurate.
The audio analysis data is a set of high-level acoustic audio features derived from low-level data extracted from each track. Spotify does not provide the RTP data or the low-level data.
The high-level data includes what Spotify calls acousticness, danceability, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, time signature, and valence.
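For illustration, these features can be fetched from Spotify's Web API audio-features endpoint; the sketch below assumes an OAuth access token has already been obtained.

```python
# Sketch: fetch the high-level audio features for one track.
import requests

def get_audio_features(track_id, access_token):
    resp = requests.get(
        f"https://api.spotify.com/v1/audio-features/{track_id}",
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # keys include danceability, energy, tempo, valence, ...
```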
Audio values for R are based on time signature, danceability, energy, tempo, and loudness. Audio values for T and P are based on combinations of other high-level data.
Those audio values are input to a neural network trained on human-assigned RTP scores for tens of thousands of tracks to determine the RTP score for each analyzed track.
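A sketch of assembling the R input vector from those Spotify features; the normalization constants are illustrative, and rtp_model stands for a network trained as in the earlier sketch.

```python
# Sketch: build the R input vector from Spotify's audio-features dict.
import numpy as np

def r_input_vector(f):
    return np.array([
        f["time_signature"] / 7.0,    # beats per bar, roughly scaled to [0, 1]
        f["danceability"],            # already 0..1
        f["energy"],                  # already 0..1
        f["tempo"] / 250.0,           # BPM, scaled
        (f["loudness"] + 60) / 60.0,  # dB, roughly -60..0
    ])

# x = r_input_vector(get_audio_features(track_id, token))
# r_score = rtp_model.predict(x.reshape(1, -1))
```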
RTP to Mood Mapping:
Regardless of how RTP scores are determined for a track, the mapping of those RTP scores to moods is the same.
RTP scores range from 1 to 5 on a half-point scale, giving nine possible scores for each of R, T, and P. As a result, there are 729 (i.e., 9 x 9 x 9 = 729) possible combinations of R, T, and P scores.
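The scale and the combination count can be verified directly:

```python
# The half-point scale and the 729 combinations, enumerated directly.
import itertools

scale = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]  # 9 values
combos = list(itertools.product(scale, repeat=3))       # (R, T, P) triples
assert len(combos) == 9 ** 3 == 729
```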
The mapping of RTP scores to moods is not evenly distributed. Rather, the mapping has been carefully curated over many years to identify the mood or classification that seems to best fit the majority of tracks having the same RTP score.
The names of the moods themselves are arbitrary; each was picked to label tracks having similar RTP scores that share certain characteristics.
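Conceptually, the curated mapping is a lookup from (R, T, P) triples to mood labels. The entries below are invented placeholders; the real table and mood names are Being Good's.

```python
# Sketch: the curated RTP-to-mood mapping as a lookup table.
MOOD_TABLE = {
    (4.5, 4.0, 4.0): "Energetic",  # hypothetical entry
    (1.5, 2.0, 2.5): "Calm",       # hypothetical entry
    # ... one curated entry per (R, T, P) combination, up to 729
}

def mood_for(r, t, p):
    return MOOD_TABLE.get((r, t, p), "Unclassified")
```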