Step 1, employ some clever tricks to minimize variation in the system, i.e. exploit invariances. E.g. translational invariance, color invariance, etc.
Step 2, map your inputs onto the surface of a hypersphere.
Step 3, cluster the points on the hypersphere by matching each point to the nearest cluster center.
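Steps 2 and 3 can be sketched roughly like this. This is only a minimal sketch, assuming that "map onto the hypersphere" means L2-normalizing each input vector and that "nearest" means highest cosine similarity (for unit vectors that is the same as smallest Euclidean distance). The names `to_hypersphere`, `assign_clusters` and `spherical_kmeans` are made up for illustration, and the k-means-style update loop is an assumption, not something the steps above pin down.

```python
import numpy as np

def to_hypersphere(x, eps=1e-12):
    """Project each row of x onto the unit hypersphere (L2 normalization)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)  # eps guards against all-zero rows

def assign_clusters(points, centers):
    """Index of the nearest center for each unit-length point.

    For unit vectors, the largest dot product is the smallest distance.
    """
    return np.argmax(points @ centers.T, axis=1)

def spherical_kmeans(x, k, n_iter=20, seed=0):
    """Assumed update rule: average the members, then renormalize."""
    rng = np.random.default_rng(seed)
    points = to_hypersphere(x)
    # Start from k distinct inputs (fancy indexing returns a copy).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_clusters(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
        centers = to_hypersphere(centers)  # put centers back on the sphere
    return centers, assign_clusters(points, centers)

x = np.array([[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]])
centers, labels = spherical_kmeans(x, k=2)
```

Renormalizing the centers after averaging keeps them on the same sphere as the points, so the dot product stays a fair similarity measure.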
Step 4, recreate the inputs by drawing the cluster centers in place of the inputs. As much as I’d like to, we can’t just maximize the magnitude of the cluster means. Finding patterns isn’t merely a matter of describing your inputs; it’s a matter of describing the parts of the inputs that aren’t already accounted for by patterns identified elsewhere. So we need to reconstruct our inputs to determine which patterns are responsible for which content, so that we know whether a pattern is contributing something new or is redundant. If each pattern is fit without awareness of the others, i.e. by maximizing magnitudes alone, we allow redundant, overlapping patterns.
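A rough sketch of the reconstruction in step 4, assuming the inputs are flat vectors and that "drawing" a cluster center means scaling the assigned unit-length center by the input's projection onto it. Both of those are assumptions made for illustration, as is the name `reconstruct`. The leftover `residual` is the content that no existing pattern accounts for.

```python
import numpy as np

def reconstruct(x, centers, labels):
    """Replace each input with its assigned center, scaled to fit the input."""
    assigned = centers[labels]  # (n, d): one unit-length center per input
    # Projection of each input onto its center gives the drawing magnitude.
    scale = np.sum(x * assigned, axis=1, keepdims=True)
    return scale * assigned

x = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])  # assumed unit length
labels = np.array([0, 1, 0])                  # assumed assignments

recon = reconstruct(x, centers, labels)
residual = x - recon  # what the current patterns fail to explain
```

Here the first two inputs are reconstructed exactly, while the third leaves a residual of `[0, 1]`, which is the part its assigned pattern does not cover.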
Step 5, backprop. If the data is small or sparse, maybe use some other method. This step should be super easy if we set things up right. It could really just be a matter of making a new feature out of the signal that remains after reconstructing with the patterns we already have, but some fine-tuning can’t hurt. This step should yield basically the same results either way, so backprop is the safest way to test the algo, since we know it works.
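The "new feature from the remaining signal" idea could look something like this. It is a sketch of the non-backprop shortcut only, and it assumes the new feature is the dominant direction of the residual, found here with an SVD. That choice, and the name `feature_from_residual`, are mine and not a settled part of the method.

```python
import numpy as np

def feature_from_residual(x, recon, eps=1e-12):
    """Return a unit-length direction capturing most of the leftover signal."""
    residual = x - recon
    if np.linalg.norm(residual) < eps:
        return None  # the existing patterns already explain everything
    # Rows of vt are unit-length directions, ordered by how much of the
    # residual they account for. The sign of each row is arbitrary.
    _, _, vt = np.linalg.svd(residual, full_matrices=False)
    return vt[0]

x = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
recon = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 0.0]])  # example reconstruction

new_feature = feature_from_residual(x, recon)
```

If backprop is doing its job, fine-tuning from a starting point like this should land on roughly the same feature, which makes for a handy sanity check.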