As a vendor, it's useful to have information flowing back from the user base about encounters with exploits, malware, etc. As a customer, however, I may not want anyone outside of my security department to know that we have been infected by the Conficker worm (at least until we have cleaned it up... or covered it up).
How do I, as a vendor, automatically collect usage data directly from a customer while giving them anonymity? They are not likely to trust us if we simply say we don't look at the source of the information (after all, we may have web-server or firewall logs that could correlate their source IP back to the data collected). Since the information is sent to us customer by customer over the network, how can we assure them that we cannot divulge sensitive information?
To solve this problem, we can take a cue from the most unlikely source... the study of human sexual behavior. In the late '80s, when HIV/AIDS was becoming an epidemic, it was important for researchers to have a way of asking people about things they wouldn't want to admit to. These might be behaviors that were illegal or immoral, and hard for a study subject to answer truthfully about to an all too human research assistant sitting across from them. The answers, however, were important, as researchers were trying to determine how the virus was spreading. A mathematician named Joel E. Cohen came up with an answer(1). His idea was to have each person flip a coin (or use some other random 50/50 generator). If they had done the illegal or immoral behavior, they would ignore the coin and answer yes. If they had not, they would answer yes or no depending on the flip of the coin.
This gave the people answering yes (where they had actually done the illegal or immoral thing) anonymity, because many of the others would be answering yes too. Each subject had plausible deniability when answering yes. To get the actual split, the researchers would simply double the number who had answered 'No' (which, remember, was about half of the people for whom it was ACTUALLY no). In a large enough set this gave an accurate percentage. So if 20% had done the illegal/immoral thing and half of the others were forced into answering yes, you would have around 60% saying yes and 40% saying no. At the end of the study the No's would be doubled (40% becomes 80%) to come up with the real 80%/20% ratio.
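The arithmetic above can be written out as a short derivation. The symbols are my own notation rather than anything from Cohen: $p$ is the true fraction who did the thing, $n$ is the number of respondents, and $n_{\text{no}}$ is how many answered No.

```latex
\begin{align*}
P(\text{yes}) &= p + \tfrac{1}{2}(1 - p), \\
P(\text{no})  &= \tfrac{1}{2}(1 - p), \\
\hat{p}       &= 1 - 2\,\frac{n_{\text{no}}}{n}.
\end{align*}
% Example from the text, with p = 0.2:
% P(yes) = 0.2 + 0.5 * 0.8 = 0.6,  P(no) = 0.4,
% doubled No's = 0.8,  so \hat{p} = 1 - 0.8 = 0.2
```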
So you can likely see where I'm headed with this... If I'm automating a usage statistics system that reports to me daily from customer environments whether a certain exploit or piece of malware has been encountered, I could simply have the program respond yes half of the time when the real answer is no (very counterintuitive for a programmer).
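Here is a minimal sketch of what the reporting side might look like in Python. The function name `randomized_report` and the idea of passing in a `truly_infected` flag are hypothetical, made up for illustration; the real detection logic and the network call are left out.

```python
import secrets


def randomized_report(truly_infected: bool) -> bool:
    """Return the answer to send to the vendor (hypothetical helper).

    A real yes is always reported as yes. A real no is reported as
    yes or no depending on a fair coin flip.
    """
    if truly_infected:
        return True
    # secrets.randbelow(2) returns 0 or 1 with equal probability,
    # which stands in for the coin flip.
    return secrets.randbelow(2) == 1


if __name__ == "__main__":
    # An infected site always says yes; a clean site says yes about half the time.
    print("infected site reports:", randomized_report(True))
    print("clean site reports:   ", randomized_report(False))
```

One design caveat, offered as an assumption rather than part of the original scheme: if the program flips a fresh coin every day, a customer who reports yes day after day starts to look genuinely infected, so a real deployment would probably want to think about how often the flip is redone.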
As a vendor, if I did have malicious intent and decided to start snooping on these reports, I couldn't tell from a single customer's yes whether it was real or not. I can't actually tell if any given customer reporting yes has been infected by Conficker. Once I put together hundreds or thousands of reports, though, I can double the No's and get the real ratio, accurate to within a few percentage points.
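To get a feel for the "few percentage points" claim, here is a rough simulation sketch of the vendor side. The function names, the 20% infection rate, and the 5,000 customers are all assumptions chosen for illustration; the standard error uses the usual binomial approximation for the observed share of No answers, doubled because the No's are doubled.

```python
import math
import random


def simulate_reports(n_customers, true_fraction, seed=0):
    """Simulate what the vendor would receive (illustrative data only)."""
    rng = random.Random(seed)
    reports = []
    for _ in range(n_customers):
        infected = rng.random() < true_fraction
        # Infected sites always say yes; clean sites say yes on a coin flip.
        reports.append(infected or rng.random() < 0.5)
    return reports


def estimate_infected_fraction(reports):
    """Double the No's to estimate the true infected fraction.

    Returns (estimate, standard_error).
    """
    n = len(reports)
    if n == 0:
        raise ValueError("need at least one report")
    observed_no = sum(1 for r in reports if not r) / n
    estimate = 1.0 - 2.0 * observed_no
    # Binomial standard error of observed_no, doubled along with the No's.
    std_error = 2.0 * math.sqrt(observed_no * (1.0 - observed_no) / n)
    return estimate, std_error


if __name__ == "__main__":
    reports = simulate_reports(5000, 0.20)
    estimate, std_error = estimate_infected_fraction(reports)
    print(f"estimated infected fraction: {estimate:.3f} (+/- {std_error:.3f})")
```

By the same formula, with a true rate of 20% the standard error works out to roughly 3 percentage points at 1,000 reports and roughly 1.4 at 5,000, so the estimate only becomes tight once the report count is in the thousands.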
Now I only have to convince the customers of this :)
(1) It was later understood that others had used forms of randomized response in earlier studies, including Fiddler and Kleinknecht (77), Dawes and More (80), and Fiering and Hooper (85).