July 26, 2006

Google Is More Open About Click Fraud

After sustained pressure from advertisers and the media, a ZDNet blogger's claim that Google lets click fraud happen, several studies showing that the click fraud rate is increasing, and many lawsuits and settlements, Google has started to become more transparent.

First, the company agreed to an independent report that examines its detection methods, policies and practices. The report shows that Google uses three approaches to detect invalid clicks:
Anomaly-based (or Deviation-from-the-norm-based). According to this approach, one may not know what invalid clicks are. However, one can know what constitutes "normal" clicking activities, assuming that abnormal activities are relatively infrequent and do not distort the statistics of the normal activities. Then invalid clicks are those that significantly deviate (mainly in the statistical sense) from the established norms. For example, if a normal average clicking frequency on an ad is 4 clicks per week and if someone clicks on it 100 times per week, then this is an abnormally large clicking activity.

Rule-based. Each rule has one or several conditions in its antecedent and is of the form "IF Condition1 AND Condition2 AND … AND ConditionK hold THEN Click X is Invalid (or respectively Valid)." An example of such a rule is "IF Doubleclick occurred THEN the second click is Invalid."

Classifier-based. A click is invalid if a data mining classifier labels it as "invalid." This labeling is done based on the past data about valid and invalid clicking activities used for "training" the classifier to decide which clicks are (in)valid.
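
To make the first two approaches concrete, here is a minimal Python sketch. It is not Google's implementation: the `Click` record, the 10x threshold and the 2-second double-click window are all assumptions chosen for illustration. It uses the report's own examples (4 clicks per week as the norm, and the double-click rule).

```python
from dataclasses import dataclass

# Hypothetical threshold: activity above 10x the normal weekly average is flagged.
ANOMALY_FACTOR = 10.0
# Hypothetical window within which a repeated click counts as a double click.
DOUBLE_CLICK_SECONDS = 2.0


@dataclass
class Click:
    ip: str
    ad_id: str
    timestamp: float  # seconds since an arbitrary start


def is_anomalous(weekly_clicks: int, normal_weekly_avg: float) -> bool:
    """Anomaly-based: flag activity that deviates strongly from the norm."""
    return weekly_clicks > ANOMALY_FACTOR * normal_weekly_avg


def mark_double_clicks(clicks: list[Click]) -> list[bool]:
    """Rule-based: IF a double click occurred THEN the second click is invalid.

    Returns one flag per input click, in the same order as the input.
    """
    flags = [False] * len(clicks)
    last_seen: dict[tuple[str, str], float] = {}
    # Walk the clicks in time order, remembering the last click per (ip, ad).
    for i in sorted(range(len(clicks)), key=lambda i: clicks[i].timestamp):
        click = clicks[i]
        key = (click.ip, click.ad_id)
        previous = last_seen.get(key)
        if previous is not None and click.timestamp - previous <= DOUBLE_CLICK_SECONDS:
            flags[i] = True
        last_seen[key] = click.timestamp
    return flags
```

With these assumed values, `is_anomalous(100, 4)` flags the report's example of 100 clicks per week against a norm of 4, while a second click from the same address on the same ad half a second after the first is marked invalid by the rule.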

Google has built the following four "lines of defense" against invalid clicks: pre-filtering, online filtering, automated offline detection and manual offline detection, in that order. Google deploys different detection methods in each of these stages: the rule-based and anomaly-based approaches in the pre-filtering and the filtering stages, the combination of all the three approaches in the automated offline detection stage, and the anomaly-based approach in the offline manual inspection stage.
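
The mapping between stages and methods described above can be written down as a small lookup table. This is only a restatement of the paragraph in Python; the names `Method`, `STAGES` and `stages_using` are made up for this sketch and say nothing about how Google organizes its systems internally.

```python
from enum import Enum


class Method(Enum):
    RULE = "rule-based"
    ANOMALY = "anomaly-based"
    CLASSIFIER = "classifier-based"


# The four "lines of defense", in the order a click passes through them,
# each paired with the detection methods the report says it uses.
STAGES = [
    ("pre-filtering", {Method.RULE, Method.ANOMALY}),
    ("online filtering", {Method.RULE, Method.ANOMALY}),
    ("automated offline detection", {Method.RULE, Method.ANOMALY, Method.CLASSIFIER}),
    ("manual offline detection", {Method.ANOMALY}),
]


def stages_using(method: Method) -> list[str]:
    """Return the names of the stages that apply the given method, in order."""
    return [name for name, methods in STAGES if method in methods]
```

Read this way, the anomaly-based approach appears in every stage, while the classifier is used only in automated offline detection.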

The report also shows that invalid clicks are hard to define and that it is hard to establish rules that precisely delimit them, but Google is constantly improving its detection system.

Another sign of transparency is a new AdWords feature: advertisers can now see the number of invalid clicks found by Google. Advertisers are not charged for invalid clicks, and they can see only aggregate information about them. Google doesn't disclose the IP addresses behind the invalid clicks or the reason why they were judged invalid.

"The metrics of invalid clicks and invalid clicks rate will show virtually all the invalid clicks affecting an account. These clicks are filtered in real-time by our systems before advertisers are charged for them. The resulting data will of course differ from one advertiser to the next," says Shuman Ghosemajumder, Business Product Manager for Trust & Safety at Google.
