The mysteries of music—and the key of data

There’s much that’s mysterious about music.

“We don’t really have a good understanding of why people like music at all,” says David Temperley, professor of music theory at the University of Rochester’s Eastman School of Music. “It doesn’t serve any obvious evolutionary purpose, and we don’t understand why people like one song more than another or why some people like one song and other people don’t. I don’t think we’re anywhere near uncovering all of the mysteries of music but there are a lot of questions that people are starting to answer with data science.”

Researchers at the University of Rochester are at the cutting edge of this intersection of data science and music, developing databases to study music history and perfecting ways in which computers can automatically identify a genre or singer, model aspects of music cognition and extract the emotional content of a song, predict musical tastes, and offer tools to improve musical performance and notation.

As Temperley says, “There is a lot you can quantify about music.”

Mimicking Human Music Recognition

In 2014 the family of the late Marvin Gaye filed a suit against Robin Thicke, alleging Thicke’s 2013 pop song “Blurred Lines” infringed on Gaye’s 1977 “Got to Give It Up.” Analysts compared sheet music and studio arrangements to assess similarities and differences between the songs, and a jury ultimately found in favor of Gaye’s children.

What if there were an accurate way for computers to make these comparisons between vocal performances and performance styles?

Mark Bocko, professor and chair of the department of electrical and computer engineering, is working toward that goal. He brings his combined love of music and science to the study of subjects ranging from audio and acoustics to musical sound representation and data analytics applied to music.

Bocko and his group have been using computers to analyze digitally recorded music files, with the goal of better understanding and mimicking the ways in which humans are able to recognize specific singers and musical performance styles. The project has applications not only in settling copyright disputes, but also in training musicians, studying trends in the development of musical styles, and improving music recommendation systems.

“When people listen to recorded music, they can recognize their favorite performers quickly,” Bocko says. “Human listeners can also listen to recordings and quickly make high-level judgments such as ‘Michael Bublé sounds a lot like Frank Sinatra.’ We’re studying what it is that people identify in the musical sound that might lead them to identify similarities in the performance styles of different musicians or to identify specific singers.”

Toward that end, Bocko and his team use audio processing algorithms developed in his and other research labs around the world. An MP3 file of a song is good for reproducing sound for listeners, but this format does not allow researchers to easily identify properties such as pitch modulation, loudness contours, or tempo variations. Using a variety of audio signal processing algorithms, computers can extract such information from sound recordings.

Further analysis of the data enables researchers to detect subtle structures. For instance, the computer can extract the pitch of every note in a song to show where, and in what ways, the singer took liberties. If the frequency of a note in the written music is 220 hertz, a singer might modulate the frequency slightly above and below that value in a technique called vibrato, which is intended to add warmth to a note. A singer might also drag slightly behind the tempo of the instruments, giving the song a more relaxed feel.
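
To make the vibrato example concrete, here is a minimal Python sketch of a 220-hertz note whose pitch wobbles sinusoidally. The vibrato rate (5.5 cycles per second) and depth (50 cents, or half a semitone) are illustrative assumptions, not figures from Bocko's research, and the function names are hypothetical.

```python
import math

def vibrato_frequency(t, base_hz=220.0, rate_hz=5.5, depth_cents=50.0):
    """Instantaneous frequency (Hz) at time t (seconds) of a note sung with vibrato.

    rate_hz and depth_cents are assumed, illustrative values.
    """
    offset_cents = depth_cents * math.sin(2 * math.pi * rate_hz * t)
    return base_hz * 2 ** (offset_cents / 1200)

def cents_from(freq_hz, reference_hz):
    """Deviation of freq_hz from reference_hz in cents (100 cents = 1 semitone)."""
    return 1200 * math.log2(freq_hz / reference_hz)

# Sample the pitch contour 100 times per second over one second.
contour = [vibrato_frequency(i / 100) for i in range(100)]
deviations = [cents_from(f, 220.0) for f in contour]
```

A list like `deviations` is the kind of pitch contour that could then be searched for a performer's habits, such as how wide or how fast the vibrato tends to be.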

“If you add together all of those little details, that defines the style of a performer and that’s what makes it music,” Bocko says. “The detailed structure in the very subtle changes, such as in timing and loudness, can really change the feel of a piece.”

Using data analysis tools from genomic signal processing, similar to those used to study sequences in DNA, Bocko and his team search musical data for recurrent patterns, called motifs, in the subtle inflections of various performers and performance styles.

“It’s quite similar to DNA sequencing,” Bocko says. “You dig through all of this data looking for patterns that repeat throughout a performance.”

Bocko and his team coded motifs, and stored them in motif banks, for a number of performances. They then created computer programs to compare motif banks. In this way, they could demonstrate that Michael Bublé really does have a singing style similar to Frank Sinatra’s, but less similar to Nat King Cole’s.
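
The idea of building and comparing motif banks can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the team's actual method: each performance is assumed to be already reduced to a sequence of symbols ("U" for a pitch inflection upward, "D" for downward, "S" for steady), a motif is taken to be any three-symbol pattern that occurs at least twice, and banks are compared with cosine similarity. The sequences for singers A, B, and C are invented for the example.

```python
import math
from collections import Counter

def motif_bank(symbols, length=3, min_count=2):
    """Count every run of `length` symbols and keep those that recur."""
    counts = Counter(
        tuple(symbols[i:i + length]) for i in range(len(symbols) - length + 1)
    )
    return Counter({motif: n for motif, n in counts.items() if n >= min_count})

def bank_similarity(bank_a, bank_b):
    """Cosine similarity between two motif banks (0 = nothing shared, 1 = identical mix)."""
    shared = bank_a.keys() & bank_b.keys()
    dot = sum(bank_a[m] * bank_b[m] for m in shared)
    norm_a = math.sqrt(sum(n * n for n in bank_a.values()))
    norm_b = math.sqrt(sum(n * n for n in bank_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented inflection sequences for three hypothetical singers.
singer_a = motif_bank(list("UDSUDSUDS"))
singer_b = motif_bank(list("UDSUDSSSUDS"))
singer_c = motif_bank(list("SSSSDDDDSSSS"))

a_vs_b = bank_similarity(singer_a, singer_b)
a_vs_c = bank_similarity(singer_a, singer_c)
```

With these made-up sequences, singers A and B share recurring patterns and score well above singers A and C, who share none.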

This approach may ultimately enable computers to learn to recognize the subtle differences between singers and musical performances that human beings are able to pick up simply by listening to the music.

And it may offer quantifiable evidence of the similarities between “Blurred Lines” and “Got to Give It Up.”

Transcribing Music Automatically

Imagine you are a pianist and you hear something you would like to play—such as an improvised blues solo or a song on YouTube—for which there’s no score. Then imagine that instead of having to listen to the piece over and over again and transcribe it yourself, a computer would do it for you with an impressive degree of accuracy.

Zhiyao Duan, assistant professor of electrical and computer engineering, together with PhD student Andrea Cogliati, has been working with Temperley to extract data from songs and use that data to produce automatic music transcriptions—in effect, feeding audio into a computer and allowing the computer to generate the music score.

Most commercial programs are only able to convert MIDI (Musical Instrument Digital Interface) performances, recorded via a computer keyboard or other electronic device, into music notation. MIDI files do not represent musical sound, but are data files that provide information—such as the pitch of a note over time—that tells an electronic device how to generate a sound. Recent methods developed in the research community are able to convert audio performances into MIDI, yet the level of accuracy isn’t sufficient for the MIDI to be further converted into music notation.
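
As a rough illustration of what "data rather than sound" means, the Python sketch below represents a few notes as MIDI-style events and converts each MIDI note number to a frequency using the standard equal-temperament formula, in which note 69 is the A at 440 hertz. The `NoteEvent` class is a hypothetical simplification: real MIDI files store note-on and note-off messages with timing in ticks, not start times and durations in seconds.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    """Simplified, hypothetical stand-in for a MIDI note (not the real file format)."""
    pitch: int         # MIDI note number, 0 to 127 (69 = A at 440 Hz)
    onset_s: float     # start time in seconds
    duration_s: float  # length in seconds

def midi_to_hz(pitch):
    """Equal-temperament frequency of a MIDI note number."""
    return 440.0 * 2 ** ((pitch - 69) / 12)

# Three notes: the A below middle C, middle C, and the E above it.
melody = [
    NoteEvent(pitch=57, onset_s=0.0, duration_s=0.5),
    NoteEvent(pitch=60, onset_s=0.5, duration_s=0.5),
    NoteEvent(pitch=64, onset_s=1.0, duration_s=1.0),
]
frequencies = [round(midi_to_hz(note.pitch), 2) for note in melody]
```

Nothing in `melody` is audio; a synthesizer has to turn those numbers into sound, which is why going the other way, from audio back to clean note data, is the hard part.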

Duan’s program records a performance and transcribes it all the way from instrumental audio to MIDI file to music notation with a great degree of accuracy. He compared his method with existing software programs in a blind test in which music theory students evaluated the accuracy of the transcriptions. “Our method significantly outperformed the other existing software in the pitch notation, the rhythm notation, and the placement of the notes,” Duan says.

Duan’s ultimate goal is to offer this software for commercial use, where it can help users to spot errors in a performance, search for pieces that have similar melodies or chord progressions, analyze an improvised solo, or notate it for repeated playing.

Duan and his team prerecord each note of a piano to act as a template for the computer—in essence, teaching the computer the various notes. Each prerecorded note is known as an atom. The computer code reconstructs a performance by identifying the notes the performer played and putting together the corresponding atoms in the correct sequence to create a musical notation transcript.
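
The template idea can be illustrated with a toy Python sketch. It is not Duan's algorithm, which the article does not spell out: here each "atom" is a synthetic sine wave rather than a recorded piano note, each slice of the performance is assumed to contain exactly one note, and the match is chosen by the highest normalized correlation. All names and constants are assumptions made for the example.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_LEN = 800     # 0.1 seconds per frame (assumed)

# Assumed note frequencies in hertz for three piano keys.
NOTE_HZ = {"A3": 220.00, "C4": 261.63, "E4": 329.63}

def make_atom(freq_hz):
    """Synthetic stand-in for a prerecorded note: a plain sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(FRAME_LEN)]

ATOMS = {name: make_atom(hz) for name, hz in NOTE_HZ.items()}

def similarity(x, y):
    """Normalized correlation between two equal-length signals."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def transcribe(frames):
    """Label each frame with the name of the best-matching atom."""
    return [
        max(ATOMS, key=lambda name: similarity(frame, ATOMS[name]))
        for frame in frames
    ]

# A quiet "performance" of C4, A3, E4 built from half-volume atoms.
performance = [[0.5 * s for s in ATOMS[name]] for name in ("C4", "A3", "E4")]
notes = transcribe(performance)
```

Real piano audio is far messier: notes overlap, decay, and ring together in chords, so a practical system has to explain each moment as a mixture of several atoms rather than a single best match.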

Duan uses signal processing and machine learning to help the computer identify the pitch and duration of each note and translate it into music notation. There’s one pitfall to his algorithm, however. The same piano note can be notated in more than one way; the black key between a G and an A on a keyboard, for example, can be called either a G sharp or an A flat. In order to generate an accurate transcription and determine the note’s proper notation, the computer must also be programmed to identify the proper rhythm, key, and time signature.
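
A deliberately crude Python sketch shows why the key matters for spelling a note. It assumes the key is already known and is a major key, and it simply uses flat names in flat keys and sharp names otherwise. The function and tables are hypothetical, and this is not how Duan's or Temperley's software decides; real spelling also depends on harmonic context.

```python
SHARP_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
FLAT_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Major keys whose key signatures contain flats.
FLAT_KEYS = {"F", "Bb", "Eb", "Ab", "Db", "Gb"}

def spell(midi_pitch, key):
    """Name a MIDI pitch using flats in flat keys and sharps in all other keys."""
    names = FLAT_NAMES if key in FLAT_KEYS else SHARP_NAMES
    return names[midi_pitch % 12]

# MIDI note 68 is the black key between G and A.
in_a_major = spell(68, "A")
in_e_flat_major = spell(68, "Eb")
```

The same key on the piano comes out as G sharp in A major and A flat in E-flat major, which is the ambiguity the transcription software has to resolve.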

That’s where Temperley and his students at the Eastman School come in.

“We’re working on the idea of using musical knowledge to help with transcription,” Temperley says. “If you know something about music, then you know what patterns are likely to occur; and then you can do more accurate transcription.”

Source: University of Rochester
