Tuesday, September 15, 2020

Why Random Slideshows Don't Work

 My experience with random slideshows, using a variety of programs (including my own system based on a very good random generator) has been horrid.  I know I shouldn't believe in such things as the world being a simulation and under the control of all-knowing aliens who torture me by endlessly repeating the slides I like the least over and over, while the majority of my vast collection...and especially all the slides I really like...remain unseen, but I've gotten close to that.

I see now there are simpler explanations.

The core of it all may be the well-known Birthday Paradox. Here's a good explanation of the paradox. I still find the math troubling (which I shouldn't, having had a long career associated with probabilities and statistics). But it is very convenient that it boils down to a very simple approximate formula, which I had somehow already remembered (though I couldn't remember why I remembered it). With N possibilities, you can expect the first repeat after about the square root of N random picks.

In the actual birthday paradox, with 365 days per year, the number of people you'd need for a 50% chance of two sharing a birthday is approximated by the square root of 365, which happens to be just above 19. The plain square root leaves out a constant factor of about 1.18, so the approximation is a bit too small, and the actual number when you do the full math is 23.

I can scale this up to my Random Slideshow Paradox: with slides chosen at random from a collection of 100,000, and each being shown for 15 seconds, the square root rule says to expect a duplicate within about the first 316 slides, and the full math puts the 50% point at roughly 370 slides. Either way, that is within about the first 80 to 90 minutes of watching! So you may see a duplicate in nearly every viewing session, when you'd think that might take weeks.
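To check the arithmetic, here is a small sketch of the exact first-repeat calculation, assuming every slide is picked uniformly at random with replacement. The function names are my own, just for illustration.

```python
import math


def first_repeat_probability(n_items: int, n_draws: int) -> float:
    """Chance that n_draws uniform picks (with replacement) contain a repeat."""
    if n_draws > n_items:
        return 1.0  # pigeonhole: more picks than items forces a repeat
    # log P(no repeat) = sum over k of log(1 - k / n_items), k = 0 .. n_draws - 1
    log_no_repeat = sum(math.log1p(-k / n_items) for k in range(n_draws))
    return 1.0 - math.exp(log_no_repeat)


def draws_for_even_odds(n_items: int) -> int:
    """Smallest number of picks with at least a 50% chance of a repeat."""
    draws = 1
    while first_repeat_probability(n_items, draws) < 0.5:
        draws += 1
    return draws


if __name__ == "__main__":
    for n in (365, 100_000):
        draws = draws_for_even_odds(n)
        minutes = draws * 15 / 60  # 15 seconds per slide
        print(n, round(math.sqrt(n), 1), draws, round(minutes, 1))
```

For 365 days this gives the classic answer of 23 people, against a plain square root of about 19.1.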

It does, however, require the mathematically unprovable (but widely accepted) Murphy's Law to understand why this duplicate will inevitably be the least desired slide. Murphy is somewhat helped by the fact that in this scenario, the slide shown most may become disliked.

Now, you might think it would be a relatively simple matter to apply something like a random without replacement strategy to get past this. But it's hard to get this to work as desired without all-knowing aliens. Without sensors and the like, it's hard for any program to know which slides you have actually seen before (because you might have been doing something else), or even which slides have been presented before (because you might have been running a different program session, or even a different program, and most programs don't save information like which slides have been shown previously in a permanent database).

Now suppose you leave the random slideshow running without replacement, but simply watch for 60 minutes at the same time every day. This effectively defeats the without replacement strategy, because within only a few days it will have to start all over again. In my 20 years of experience trying such things (but without much elaboration) I had speculated that in this scenario it was actually worse to use a without replacement strategy than not. Thinking about it now, I doubt this is true. But unless the without replacement strategy is fully implemented, such as by determining by some means which slides you have actually watched, it won't work very well either: it won't prevent you from being shown the same slides within a week or so, when you've still only seen a small fraction of the collection.

And no matter how well you were being watched by the program, it would not help with situations like having a bunch of very similar slides, or even identical copies (as I used to have before I wrote a program this year to weed them out).

Now the parameters of a well-designed solution to this problem are slowly coming into focus.

But the first thing to point out is that until some program has the necessary features built in, it will be necessary to fake it. It will be necessary to feed the slideshow program a specific playlist, which could be assembled by some partly random algorithm, to get it to do the correct things. So every time the slideshow presenting program is started up, the playlist generating program should create the playlist, which then must be played in non-random order, and preferably without looping.
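Here is a minimal sketch of that fake-it step, under some assumptions of mine: the slideshow program accepts a plain text file with one path per line, and the slides are ordinary image files. The function names and the list of file extensions are hypothetical, not taken from any particular program.

```python
import random
from pathlib import Path

# Assumption: which file types count as slides.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def build_playlist(root, seed=None):
    """Return every image under root exactly once, in a shuffled order."""
    rng = random.Random(seed)
    slides = sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    rng.shuffle(slides)  # the randomness happens here, once, not in the viewer
    return slides


def write_playlist(slides, out_path):
    """Write one path per line, for a viewer that plays the list in order."""
    Path(out_path).write_text("\n".join(slides) + "\n", encoding="utf-8")
```

Because each slide appears once in the list, playing it in order and without looping gives without replacement behavior inside a single session, even if the viewer itself knows nothing about it.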

The playlist generating program should attempt to balance things in several ways. First, by avoiding timely duplication. Second, by making sure every slide gets shown about as reliably as is reasonable (this is the inverse of the Birthday Paradox: even with more than 365 guests there will still be many birthdays not present). Third, it should account for the desirabilities and similarities in the files by assigning weights to individual pictures, possibly on a per-folder basis. The goal is to undo the evil aliens and arrange things so that I experience few timely repetitions, nearly every slide gets shown (this may actually be more important), but generally more of the slides I like most.
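The inverse effect can be put in numbers with a one-line formula. This sketch assumes uniform random picks with replacement; the function name is mine.

```python
def expected_unseen_fraction(n_items: int, n_draws: int) -> float:
    """Expected share of items never picked in n_draws uniform random picks."""
    # Each item is missed by a single pick with probability 1 - 1/n_items,
    # and the picks are independent.
    return (1.0 - 1.0 / n_items) ** n_draws


if __name__ == "__main__":
    # 365 guests, 365 possible birthdays
    print(round(expected_unseen_fraction(365, 365), 3))
    # 100,000 slides, one full collection's worth of random picks
    print(round(expected_unseen_fraction(100_000, 100_000), 3))
```

With as many picks as there are items, a little over a third of the items are expected to go unseen, which is why coverage has to be designed in rather than hoped for.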

The same considerations apply to playlists of any kind, not just still pictures but movies and songs.

I don't like being watched, so I'll leave that kind of thing out of my proposed playlist generating program.  But it may attempt to approximate that by various means, including keeping a database about previously generated playlists.  Various functions could be applied to approximate the "slides actually seen" function.  The first would be to apply "windows" to previous playlists.  The most obvious window is at the beginning of the playlist, the first hour or so perhaps.  A day later, the pictures within the first hour of presentation time may be semi-presumed to have been seen and should be put later in the next playlist.  A fancier version might account for viewing times at certain hours of the day, but at minimum the Early Window should be adjustable, and I think 1-2 hours is good.  These could also be applied on a daily basis at the same times for a Daily Window.
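A rough sketch of the Early Window idea follows, assuming playlists are simple lists of slide identifiers and every slide is shown for the same number of seconds. The function name, parameter names, and default values are my own placeholders.

```python
import random


def apply_early_window(candidates, previous_playlist,
                       seconds_per_slide=15, window_hours=1.5, seed=None):
    """Order candidates so slides from the previous Early Window come last."""
    rng = random.Random(seed)
    # How many slides fit in the window at the start of the previous playlist.
    window_len = int(window_hours * 3600 / seconds_per_slide)
    presumed_seen = set(previous_playlist[:window_len])

    fresh = [s for s in candidates if s not in presumed_seen]
    stale = [s for s in candidates if s in presumed_seen]
    rng.shuffle(fresh)
    rng.shuffle(stale)
    # Presumed-seen slides are not dropped, only pushed to the back.
    return fresh + stale
```

A Daily Window could reuse the same approach by slicing the previous playlist at the same offsets for each day instead of only at the start.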

Another heuristic would be to move up things which probably haven't been seen. If a playlist of 14 days duration were replaced within 2 days, we could assume the last 12 days were not shown. Perhaps the user should specify how many days or hours the previous playlist was actually used, defaulting to the number of hours since it was created or to its total length. In other cases it might be better to apply a function of decreasing probability of having been seen across the entire previous playlist, though I'm not currently seeing how to do it other than badly.

The general principle behind these heuristics is to approximate a probability function of each slide having been seen, and thereby better approximate a without replacement strategy across multiple slideshow sessions using subsequently created playlists. But it should also be remembered that this is only an approximation: more likely, fewer slides were seen than any reasonable algorithm would presume.
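As one crude illustration of such a probability function, here is a step-function sketch. The probability values 0.9 and 0.1 are arbitrary placeholders I made up, and all names are hypothetical; it is only meant to show the shape of the idea.

```python
def seen_probability(position, seconds_per_slide, hours_used,
                     early_window_hours=1.5, p_early=0.9, p_later=0.1):
    """Guess the chance that the slide at this playlist position was seen.

    position    -- zero-based index of the slide in the previous playlist
    hours_used  -- how long that playlist was in use before being replaced
    p_early, p_later -- placeholder guesses, not measured values
    """
    start_hours = position * seconds_per_slide / 3600
    if start_hours >= hours_used:
        return 0.0       # never reached before the playlist was replaced
    if start_hours < early_window_hours:
        return p_early   # inside the Early Window: probably seen
    return p_later       # presented, but possibly to an empty room
```

A playlist generator could then sort or weight slides by one minus this value, so that slides least likely to have been seen move up.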

Assigning desirability weights is probably also best done on a folder-by-folder basis (as it's too time consuming to weight every single slide). Less desirable slides can be placed in subfolders as needed. Weights need to be widely variable, down to infinitesimal numbers, because of the exponential nature of the probabilities involved. They are probably best expressed as multiplicative inverses, because large numbers are easier to express than small fractions. Such as "show the slides in this folder 100 times less than average."
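One way the inverse weights could feed into a playlist is a weighted shuffle. The sketch below uses the standard exponential-clock trick for weighted random ordering without replacement; that choice of technique is mine, as are the names. It assumes each folder is given a divisor (100 meaning "100 times less than average") and unlisted folders get a divisor of 1.

```python
import random
from pathlib import PurePath


def weighted_order(slides, folder_divisors, seed=None):
    """Shuffle slides so that folders with large divisors tend to come later.

    folder_divisors maps a folder path to its divisor; weight = 1 / divisor.
    """
    rng = random.Random(seed)

    def arrival_time(path):
        divisor = folder_divisors.get(str(PurePath(path).parent), 1.0)
        # Exponential waiting time with rate = weight; low weight, long wait.
        return rng.expovariate(1.0 / divisor)

    keyed = [(arrival_time(p), p) for p in slides]
    keyed.sort()  # earliest arrival plays first
    return [p for _, p in keyed]
```

Every slide still appears exactly once, so in a single playlist the weights only change how early a slide tends to appear. Showing a folder truly less often would also require dropping some of its slides from shorter playlists.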

The playlist generating program itself has remembered parameters detailing things like the above. These could be stored in reloadable meta playlists, and then further customized before each generation. The databases of previously used playlists (which should be editable to remove unused ones) may or may not be associated with a particular meta playlist.

Finally, I've already created the Collection Assistant program, which I might have previously described here this year (after years of never having time to sit down and write it, I wrote the program in a week or so in 2020). So I can make the playlist generating program work off of lists created by that program, which include checksums for unambiguous recording of file identity. Otherwise you are stuck with filenames and folders that rigidly can't be changed, because previous databases rely on them staying the same, and I am changing my folders around all the time to better reflect my interests.
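To illustrate the checksum idea in general terms, here is a sketch that identifies files by a hash of their contents. It is not how Collection Assistant actually works. SHA-256 is my assumption, as are the function names.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_checksum(root):
    """Map each checksum to the list of paths under root with that content."""
    index = {}
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            index.setdefault(file_checksum(p), []).append(str(p))
    return index
```

A playlist database keyed on these checksums survives renaming and moving folders, and any checksum that maps to more than one path points to identical copies.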