Introduction (Problem to be solved):

I want to discuss improving a basic shuffle mode. The problem is that with shuffle good songs take too long to replay (by definition, the expected replay time is equal to the total number of songs). And conversely, bad songs play too frequently. So the stated problem is, “I want my best music to play more often than my worst music”.

There are other problems. Such as the often stated, “I want the next song to match the current song”. Or, “I want the player to pick songs that match my mood”.

I would like a shuffle mode that could do all of these things. But I don’t think an automatic player is capable of doing all three. In fact, I think there is only just enough information to handle the first.

What information can be captured automatically, and how the speed of capture affects your choice:

First, let’s define automatic. Automatic means you don’t tell the player any extra information than a regular dumb player would be given by an oblivious user. This is somewhat fuzzy because it means you can record the current time, or maybe even the current listener (or listeners), but that’s assuming your player has those means. For instance you could maybe even detect the “mood” someone was in if you had a video camera. As you can see there are several automatic inputs that are rather extravagant. So let’s be more restrictive. Let’s just record at most the time and whether a song was played, or just skipped.

In fact, the time is has a lot of subtle information that you could infer. For instance you could say that there are correlations between morning preferences versus evening preferences. Or perhaps users go on long binges of just playing and binges of just skipping (and so, perhaps these binges could be treated in a particular way to help reduce the chances of any future skipping of the next song selected).

But I don’t want to even include time for one important reason. That’s the fact that breaking up your recorded information into more categories (i.e. morning skips, evening plays), means that you are also diluting the amount of information you have.

That is, say that 50% of the music played will be skipped (and for ease of calculations, skipped immediately). Also say that there are 3,000 songs at an average of 3:20 minutes each. That’s 5,000 minutes (or just shy of 3 ½ days) of play time, if you are using a regular shuffle. If we assume someone has two hours of commuting on average everyday, that’s six weeks of playtime before you hear any song twice.

So in other words, your collection of information is very slow. That’s a point I want to impress. When making a fully automatic player, you can’t rely on bunches of fancy inferences (or at least, it will be a long time before you can make use of them), if you have to rely on just automatic collection.

In case it isn’t obvious why still, let me clarify. After six weeks, your player will have accumulated about one point of information (either a skip or a play) for each of your songs. This isn’t a great deal of information to go on. Did he really like the songs he let play? Did he really hate the songs he let skip? If you were to break your information down further and analyze the temporal nature (i.e. morning songs, etc.), then you are getting that much less information.

Now it is true that the binges example could be taken advantage of, without dilution. But right now Squash does not do that calculation.

So to summarize my first major point: just recording skips and plays is the only reasonable approach picking the next song, if it has to be done automatically. It takes to long to wait for more refined information to accumulate.

There are also possibilities for advanced systems to share many users’ historical information together… but that’s a totally different thread.

Compensating for slow collection, the principle of reaction:

The problem with slow accumulation means that you need to help speed up the reaction of the system to the user’s actions as much as possible. I believe that means that you must act quickly and make strong assumptions on very little information. That is, even if you only have one skip, you mustn’t play that song again for a very long time. And even if you have only one play, you must try to play that song again as quickly as possible (though, of course, not too quickly, lest you annoy the user).

You will see these principles have shaped how I bias song selection.

The song selection routine used in Squash:

So that leads us at least on how Squash uses play and skip information in order to pick the next song.

I will outline the routine first, then go back and discuss any important points.

Squash uses a very simple approach, it first turns total plays, P and total skips S into an aggregate rating R, using this formula:

R = ( P – S ) / ( P + S + 1 )

This gives R a range of -1 to +1 exclusive, and songs with more plays than skips will be positive whereas songs with more skips than plays will be negative.

Next Squash calculates the average and standard deviation of all songs’ ratings (M and SD). I’m assuming you all know how to do that.

At this point Squash is ready to start picking songs. It does this with a more or less roulette-wheel algorithm. Or you can see it as just a normal-test. The routine is:

Randomly pick a candidate song. Perform a normal test on the song. If the candidate song passes the normal test, then this will be the next song played, if not, pick a new random candidate until one does.

This algorithm is guaranteed to end (statistically), so don’t worry that it won’t. And actually it has an average of two iterations before picking a song to play.

Now, for those that don’t know what a normal test is:

Take the rating, R and turn it into a z-score, Z.

Z = (R – M ) / SD

Then look up the Z score on a normal-curve table to obtain a probability, P. Next test the song with a probability of P (that is, if P is 0.5, then fifty-percent of the time that test will pass).

Some sample values of a normal curve table are (for those that are not familiar):

Z / P
-2 / 0.02
-1 / 0.16
0 / 0.50
+1 / 0.67
+2 / 0.98

As you can see it is like an S-curve.

Now I lied, Squash does one other thing to complicate things. That is there is a second test. If the song passes the normal-test, it is then subjected to a simple test that uses a counter called the repeat counter, RP:

If RP is zero then pass the test and set RP to 10
Else fail the test and set RP = RP – 1

This test helps decrease the maximum frequency that a song may have (that is, increase how long before a song can be played again). Also it means the expected number of iterations rises to twenty.

Discussing the Squash selection algorithm:

So the first thing I’d like to note is that I’ve applied my earlier principle of reaction by using a normal test. This is because the normal test uses an S-curve to determine probability. What this means is that a small movement of rating away from average rating is given a much higher movement of chance than movements of rating that are further away from the average rating. This means that just a single skip or play will affect a song greatly.

The second thing is that if you are astute you will notice that the above algorithm is actually a random selection routine and not a shuffle routine. That is, there is no guarantee that a song wouldn’t be played right after itself. That’s the fundamental difference between a shuffle and a random selection, in terms of the definition: A shuffle routine will guarantee that no song is played again until all songs have played, and a random routine will not.

Why random selection routines suck and shuffle routines suck less:

Random selection routines suffer from a starvation and an over-feed problem. Have you ever seen one of those slide-transition effects that take the old slide and replace single random pixel with the new slide, and then repeats the process with another random pixel until all pixels have been chosen?

If the next pixel is picked completely randomly from all pixels (in other words, it’s ok to pick a pixel more than once), it will take a long time before all pixels are selected. If the next pixel is picked from only unselected pixels, the amount of time to selected all pixels is much smaller (specifically, it’s equal to the total number of pixels). For instance let’s take a small example of just 4 pixels:

The first pixel is guaranteed to be unselected. The second pixel will have a 75% chance of being unselected, so that means it will take one and a third selections on average to pick the next pixel. The third pixel only has a 50% percent chance of being new, so that’s two selections needed on average. The final pixel has only a 25% chance of selection, so that’s four more selections. The total selections on average then, are eight and a third, which is twice what it would be if you only picked new pixels.

(This effect gets worse as the number of pixels, or songs, gets higher.)

So that’s an example of starvation, it takes forever for all the songs to be picked with a random selection routine. And since some of the songs are starved, that means others must have been overfed. A shuffle routine solves both of these problems.

So, then, why is Squash using a random selection routine? Because that’s the only way to do it!

It wouldn’t be a great idea to pick 3,000 songs based on your current rating information. That would mean you’d have to wait until song number 3,001 until you can take advantage of the rating information that you have been collecting for the prior 3,000 songs. So using a random selection routine has an advantage of using all the information available to pick the next song.

So to compensate for the starvation and over-feed problem, we add the repeat counter I mentioned about a bazillion paragraphs ago, remember? This strikes a middle ground allowing us to use the information collected immediately, and not completely starve or over-feed anyone. It’s a simple solution, but I’m sure someone smarter could come up with a better compromise.

Conclusion:

Squash’s algorithm was chosen partly from necessity (because of the slow accumulation of information) and partly because it is the simplest possible solution that is definitely better than both unbiased random selection or unbiased shuffle selection.

If you grant manual intervention (that is, sorting/rating the songs ahead of time), then there are definite improvements that could be made. If you have access to more environmental information, that would help too. Also, even with just automatic means there are some improvements that could be made (such as the binge detection, or aggregation of more than one user’s preferences).

However, this little rant is long enough. I hope that you at least will take with you the condition of slow information, and the principle of reaction. Thanks for reading.