[UPDATE 2] Inspiration strikes a chord!

This is an update to my previous post, “Inspiration strikes a chord!”, detailing my pet project to decipher the songs in some decorative piano rolls I found in a restaurant. Click the following links for post 1 and post 2!

*SPOILER ALERT* By the end of this blog post, we’ll be making music!

Previously on “Inspiration strikes a chord”, we cropped and rectified imagery of a piano roll, applied a few image filters, and arrived at a binary image of “notes” and “not notes”. The cleaned-up result looked something like this…

From this point, extracting a tune from the image is as simple as performing connected component analysis and interpreting each component's x-coordinate, y-coordinate, and width as time, pitch, and duration, respectively. Filtering components by height removes some of the errors created by dark creases in the original image. After filtering, the only components left are those which correspond to notes.
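For the curious, here is a minimal sketch of that step in plain Python. It is an illustration only, not the published source: the function names, the 4-connectivity choice, and the `min_height`/`max_height` thresholds are all assumptions made for the example.

```python
from collections import deque

def find_components(mask):
    """Find 4-connected blobs of truthy pixels in a 2D list.

    Returns one bounding box per blob as (x, y, width, height).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one blob, tracking its bounding box as we go.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right - left + 1, bottom - top + 1))
    return boxes

def boxes_to_notes(boxes, min_height, max_height):
    """Keep boxes whose height looks like a note; read x, y, width as
    time, pitch position, and duration (all still in pixels)."""
    notes = []
    for x, y, w, h in boxes:
        if min_height <= h <= max_height:
            notes.append({"time": x, "pitch_y": y + (h - 1) / 2, "duration": w})
    return sorted(notes, key=lambda n: n["time"])
```

Everything is still in pixel units at this point; turning `pitch_y` into an actual pitch is a separate problem.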

A typical piano roll can encode the full 88-note range of a piano, but we don’t know the equations of the low-A or high-C lines. That sure doesn’t stop us from making some sloppy guesses! By naively mapping the lowest note to low-A and the highest note to high-C, we get something that sounds, err, like a jazzy mess:

It’s got rhythm… and not much else. We’ve achieved angry-one-year-old quality music.
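The sloppy guess above boils down to a linear stretch of the observed y-range onto the 88 keys. A rough sketch, assuming `ys` holds one y-coordinate per note and that pitch rises with the value passed in (negate or flip the coordinates first if your image runs the other way); `naive_pitches` is a name made up for this example:

```python
LOW_A, HIGH_C = 21, 108  # MIDI note numbers for A0 and C8 (88 keys)

def naive_pitches(ys):
    """Stretch the observed y-range across the full keyboard."""
    lo, hi = min(ys), max(ys)
    if hi == lo:
        # Only one distinct pitch seen; nothing to scale.
        return [LOW_A for _ in ys]
    scale = (HIGH_C - LOW_A) / (hi - lo)
    return [LOW_A + round((y - lo) * scale) for y in ys]
```

Unless the roll really does use both the lowest and the highest key, this stretch spaces the notes too far apart, which would explain the jazzy mess.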

So, what is the correct pitch scaling? I converted connected components into pitches by binning y-coordinates. In a perfect world, each y-coordinate would fall in the exact center of its corresponding bin, so any deviation can be considered error. By summing the distance in pixels from the bin center across all notes, I can evaluate the quality of my binning function and choose the pitch-to-pitch distance that minimizes the error. I tried a range from 2.5px to 5px, and a plot revealed an obvious minimum:
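The search described above can be sketched roughly like this. This is an illustration under assumptions, not the published code: the bins are anchored at the lowest observed note, the 0.05px step is arbitrary, and the names `binning_error` and `best_spacing` are hypothetical.

```python
def binning_error(ys, spacing):
    """Sum of pixel distances from each y to its nearest bin center,
    with bin centers at y0, y0 + spacing, y0 + 2 * spacing, ..."""
    y0 = min(ys)  # assumption: anchor the grid on the lowest note
    total = 0.0
    for y in ys:
        offset = y - y0
        nearest_center = round(offset / spacing) * spacing
        total += abs(offset - nearest_center)
    return total

def best_spacing(ys, lo=2.5, hi=5.0, step=0.05):
    """Try every candidate spacing in [lo, hi] and keep the best one."""
    n = int(round((hi - lo) / step))
    candidates = [round(lo + i * step, 4) for i in range(n + 1)]
    return min(candidates, key=lambda s: binning_error(ys, s))
```

One caveat with this sketch: the worst possible error per note is half the spacing, so the raw sum leans slightly toward smaller spacings. Dividing the error by the spacing is one way to compare candidates on equal footing.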

Using our new pitch scale, things sound reasonable:

Not bad! The notes definitely make up chords, but the chord progression is wrong…

Something is still not right. It almost sounds like a song played backwards… EUREKA! I had assumed the piano roll was read left-to-right, but upon closer inspection, the score was BACKWARDS! I flipped the image mask over the y-axis and re-processed to obtain the final piece:

PERFECTION ACHIEVED (ignore spontaneous modulations)

Music at last!
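For anyone following along, the backwards-roll fix is just a left-to-right mirror. A tiny sketch, assuming the mask is a list of pixel rows; both function names are invented for this example:

```python
def flip_horizontal(mask):
    """Mirror a 2D mask left-to-right (a flip over the y-axis)."""
    return [list(reversed(row)) for row in mask]

def mirror_note_start(x, width, image_width):
    """Equivalent shortcut: new start column of an already-extracted note,
    given its old start x and its width in pixels."""
    return image_width - (x + width)
```

The second function gets the same result without re-running the whole pipeline, since only the time axis changes.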

I have a ton of imagery to process; next post, I’ll share a proper gallery!

P.S. I have published my source code on GitHub! Enjoy!
