What does the Benford law explain?

The Benford law, also known as the first-digit law or leading-digit phenomenon, is an observation about the frequency distribution of the leading digits in many real-life datasets. The law states that in many naturally occurring collections of numbers, the leading significant digits are not evenly distributed from 1 to 9, but instead follow a predictable logarithmic distribution with 1 as the most common leading digit, then 2, and so on down to 9 as the least common.
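To make "leading significant digit" concrete, here is a minimal Python sketch. The helper name `first_digit` is our own, not from any library. Signs and leading zeros are ignored, so 0.0042 has leading digit 4 and −731 has leading digit 7.

```python
import math


def first_digit(x: float) -> int:
    """Return the leading significant digit of x, e.g. 0.0042 -> 4, -731 -> 7."""
    if x == 0 or not math.isfinite(x):
        raise ValueError(f"{x!r} has no leading significant digit")
    # str() renders a number in plain form ('0.0042', '731') or in
    # scientific form ('1.5e+20'); stripping leading zeros and the decimal
    # point leaves the first significant digit at the front of the string.
    return int(str(abs(x)).lstrip("0.")[0])


for value in (0.0042, -731, 1.5e20, 9.99):
    print(value, "->", first_digit(value))
```

Going through the string form avoids the floating-point edge cases of `floor(log10(x))`, which can misplace the decimal point for values just below a power of ten.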

Requirements of Benford’s Law

For a dataset to conform to Benford’s Law, it must meet the following requirements:

  • The numbers must span multiple orders of magnitude (from 1 to 100, 1 to 1000, etc.) rather than be restricted to a narrow range.
  • The numbers must represent values that correspond to underlying relationships rather than being assigned arbitrarily.
  • The numbers must reflect some level of randomness rather than being deliberately manipulated to achieve an intended result.

Data that meets these criteria, such as populations of cities, stock prices, lengths of rivers, physical constants and mathematical tables, tends to follow the logarithmic distribution described by Benford’s Law. On the other hand, more arbitrarily constructed datasets, such as phone numbers, lottery numbers and zip codes, do not follow the law.
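The effect of the "multiple orders of magnitude" condition can be checked with a small simulation. The Python sketch below uses synthetic data only, and both datasets are assumptions chosen for illustration:

- One set is spread evenly on a log scale from 1 to 1,000,000. It matches Benford's distribution by construction.
- The other set is confined to the narrow band 300 to 700.

```python
import math
import random
from collections import Counter


def first_digit(x: float) -> int:
    """Leading significant digit of a nonzero, finite number."""
    return int(str(abs(x)).lstrip("0.")[0])


def digit_shares(values):
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 100_000
# Synthetic data spread evenly on a log scale over six orders of magnitude.
wide = [10 ** rng.uniform(0, 6) for _ in range(n)]
# Synthetic data confined to a narrow band, like a restricted code range.
narrow = [rng.uniform(300, 700) for _ in range(n)]

wide_shares = digit_shares(wide)
narrow_shares = digit_shares(narrow)
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}: wide {wide_shares[d]:.3f}  narrow {narrow_shares[d]:.3f}  Benford {expected:.3f}")
```

The wide set lands close to 30.1% for digit 1. The narrow set can only start with 3, 4, 5 or 6, whatever the underlying process.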

Mathematical Formula of Benford’s Law

The mathematical formula describing the probability of a number having a leading digit of d is:

P(d) = log10(1 + 1/d)

Where d is any digit from 1 to 9. This formula gives the following probabilities for each leading digit:

| Leading digit | Probability |
|---|---|
| 1 | 30.1% |
| 2 | 17.6% |
| 3 | 12.5% |
| 4 | 9.7% |
| 5 | 7.9% |
| 6 | 6.7% |
| 7 | 5.8% |
| 8 | 5.1% |
| 9 | 4.6% |

As can be seen, the probability of 1 being the leading digit is much higher than that of any other digit, and the probability decreases monotonically as the digit increases.
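The table follows directly from the formula. This short Python sketch recomputes it; the variable name `benford` is illustrative.

```python
import math

# P(d) = log10(1 + 1/d) for each possible leading digit d = 1..9
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

print("Leading digit  Probability")
for d, p in benford.items():
    print(f"{d:>13}  {p:>11.1%}")

print(f"Total: {sum(benford.values()):.6f}")  # the nine probabilities sum to 1
```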

History and Origin

The Benford law is named after Frank Benford, a physicist who published it in 1938 based on observations of large datasets of numbers. However, the phenomenon had been described more than 50 years earlier.

In 1881, astronomer Simon Newcomb observed that the earlier pages of books of logarithm tables, which cover numbers beginning with small digits, were more worn than the later pages. He concluded that numbers with small leading digits occur more often and stated the logarithmic rule for their frequency. In 1938, Frank Benford independently rediscovered the pattern. He analyzed over 20,000 numbers from diverse datasets and found the distribution of first digits matching the logarithmic rule that became known as Benford’s Law.

While Benford observed and quantified the phenomenon, he did not provide a mathematical proof of why it occurs. Rigorous probabilistic explanations were developed later, notably by Theodore Hill in the 1990s (Hill, 1998, gives an accessible account). Over time, the law was found to describe many kinds of data, including electricity bills, stock prices, census statistics and physical constants. It has since been observed in many contexts outside of science as well.

Real-World Examples

Here are some examples of real-world datasets that follow Benford’s Law:

Populations of Cities

The leading digits of populations of cities around the world follow Benford’s distribution:

| Leading digit | Observed frequency | Expected frequency |
|---|---|---|
| 1 | 28.5% | 30.1% |
| 2 | 18.5% | 17.6% |
| 3 | 13.7% | 12.5% |
| 4 | 10.0% | 9.7% |
| 5 | 8.1% | 7.9% |
| 6 | 6.6% | 6.7% |
| 7 | 5.7% | 5.8% |
| 8 | 5.2% | 5.1% |
| 9 | 3.6% | 4.6% |

As can be seen, the observed frequencies closely match the expected pattern.
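Comparison tables like these are produced by counting leading digits and setting each observed share beside log10(1 + 1/d). Below is a minimal Python sketch of that tally. The function `benford_table` and the population figures are hypothetical, made up for illustration rather than taken from the data above.

```python
import math
from collections import Counter


def first_digit(x: float) -> int:
    """Leading significant digit of a nonzero, finite number."""
    return int(str(abs(x)).lstrip("0.")[0])


def benford_table(values):
    """Rows of (digit, observed share, expected share) for nonzero values."""
    values = [v for v in values if v != 0]  # zero has no leading digit
    counts = Counter(first_digit(v) for v in values)
    n = len(values)
    return [(d, counts[d] / n, math.log10(1 + 1 / d)) for d in range(1, 10)]


# Hypothetical population figures, for illustration only.
populations = [1_250_000, 184_300, 23_900, 9_870, 1_430,
               512_000, 3_300_000, 76_450, 118_000, 2_640]

print("Leading digit  Observed  Expected")
for d, observed, expected in benford_table(populations):
    print(f"{d:>13}  {observed:>8.1%}  {expected:>8.1%}")
```

With only ten values, the observed shares are far too noisy to say anything about conformity. In practice, such comparisons are run on hundreds or thousands of records.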

Lengths of Rivers

The lengths of rivers around the world in kilometers also conform to Benford’s Law:

| Leading digit | Observed frequency | Expected frequency |
|---|---|---|
| 1 | 32.2% | 30.1% |
| 2 | 16.9% | 17.6% |
| 3 | 12.8% | 12.5% |
| 4 | 9.3% | 9.7% |
| 5 | 8.6% | 7.9% |
| 6 | 6.7% | 6.7% |
| 7 | 5.9% | 5.8% |
| 8 | 4.9% | 5.1% |
| 9 | 2.7% | 4.6% |

Atomic Weights

The atomic weights of elements also roughly follow Benford’s Law as their values span multiple orders of magnitude:

| Leading digit | Observed frequency | Expected frequency |
|---|---|---|
| 1 | 25.8% | 30.1% |
| 2 | 19.7% | 17.6% |
| 3 | 16.5% | 12.5% |
| 4 | 10.5% | 9.7% |
| 5 | 6.1% | 7.9% |
| 6 | 7.3% | 6.7% |
| 7 | 5.2% | 5.8% |
| 8 | 6.7% | 5.1% |
| 9 | 2.1% | 4.6% |

While not an exact match, the general trend is clearly visible.

Applications and Uses

The Benford law has been found to have useful applications in many different fields including:

Fraud Detection

One of the most common uses of Benford’s Law is in detecting financial fraud. Accounting data that does not follow the expected distribution of Benford’s Law may indicate manipulated or fabricated numbers. Auditors frequently treat nonconformity with Benford’s Law as a red flag to identify suspicious accounts or transactions that warrant further investigation.

Anomaly Detection

More generally, the law can be used to detect anomalies or deviations in any dataset. A significant divergence from Benford’s distribution, in a dataset that would be expected to conform, may indicate an issue with the data that needs to be examined and explained.
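A per-digit test can show which leading digits are out of line. Below is a minimal Python sketch of a z-test for a proportion with a continuity correction, of the kind often used in Benford analysis. Three things are assumptions for illustration:

- the function name `digit_z_scores`,
- the 1.96 cutoff (roughly a 5% two-sided level),
- the ledger counts.

```python
import math

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def digit_z_scores(counts):
    """z statistic per leading digit; counts maps digit (1-9) -> number of records."""
    n = sum(counts.values())
    scores = {}
    for d, p0 in BENFORD.items():
        p = counts.get(d, 0) / n
        diff = abs(p - p0)
        correction = 1 / (2 * n)
        if correction >= diff:  # use the continuity correction only when it is smaller than the gap
            correction = 0.0
        scores[d] = (diff - correction) / math.sqrt(p0 * (1 - p0) / n)
    return scores


# Hypothetical ledger: close to Benford except that digit 5 is heavily over-represented.
counts = {1: 230, 2: 176, 3: 125, 4: 97, 5: 150, 6: 67, 7: 58, 8: 51, 9: 46}
z = digit_z_scores(counts)
flagged = sorted(d for d, score in z.items() if score > 1.96)
print(flagged)
```

Here the surplus of 5s and the matching shortfall of 1s are both flagged. This narrows the search to particular digits, but it still does not identify which individual records are suspicious.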

Forensic Analysis

In forensic accounting and investigations, Benford’s Law assists in identifying cases of embezzlement, skimming, overbilling and other financial crimes. Comparing the distribution of first digits in financial records with the expected probabilities of Benford’s Law is often used as an initial screening step to flag likely fraud.

Tax Audits

Tax agencies frequently use digital analysis based on Benford’s Law to select tax returns for auditing. Nonconforming distributions of first digits on returns can flag potentially false or exaggerated deductions that warrant further scrutiny.

Elections

Benford’s Law has been proposed as a method to detect election fraud based on vote counts and voter turnout numbers. Suspicious deviations from Benford’s distribution in electoral data may indicate manipulated or fabricated numbers. This use is contested, however, because vote counts often do not follow the law even when no fraud has occurred.

Image Analysis

In digital image forensics, Benford’s Law has been used to detect forged or edited images. The first digits of the block DCT coefficients of natural images, the values that JPEG compression operates on, approximately follow Benford’s Law. Images with digitally altered areas, or images that have been recompressed, tend to diverge from this expected pattern.

Data Errors

Departures from Benford’s distribution can reveal simple data errors, such as missing leading digits or shifted digits. They can also expose more systematic data entry issues.

Data Modeling

In statistics and data modeling, Benford’s Law assists in selecting appropriate probability distributions and models to fit real-world data.

Numerical Algorithms

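Knowledge of Benford’s Law also informs numerical computing, for example when estimating typical rounding errors in floating-point arithmetic. This is because the leading digits of computed results tend to follow the logarithmic distribution rather than a uniform one.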

Limitations and Criticisms

While Benford’s Law has many useful applications, it also has certain limitations and criticisms:

  • It only applies to datasets that meet certain conditions, such as spanning multiple orders of magnitude. Many artificial or restricted datasets do not follow it.
  • Conformity to Benford’s Law does not definitively prove data is not fraudulent or biased. Manipulated data can also follow the law if the manipulator is aware of it.
  • Thresholds for determining acceptable divergence from Benford’s distribution are somewhat subjective.
  • It does not pinpoint specific items of data that are anomalous, only the overall pattern.
  • In some datasets, normal random fluctuations can cause deviations that trigger false positives.

Thus, Benford’s Law should be seen as a preliminary screening tool to identify potential areas of concern, not conclusive proof of anomalies. Appropriate statistical tests and investigations still need to be done to validate any findings.

Conclusions

In summary, the key points about Benford’s Law are:

  • It states that the leading digits of many real-world data sets follow a specific logarithmic distribution where smaller digits are more common.
  • The distribution applies to data that spans multiple orders of magnitude and represents underlying relationships.
  • Deviations from the expected distribution can be used to flag potential anomalies, including fraud, fabrication and simple errors.
  • Applications range from accounting and tax audits to forensic analysis and data modeling.
  • It has limitations and should be seen as one tool among many, not an infallible method.

Benford’s Law is a mathematical curiosity that has found many useful applications in identifying patterns within complex numerical data. While it has limitations, it remains a simple yet powerful technique for flagging potential issues that merit closer investigation.

Frequently Asked Questions about Benford’s Law

Why do many datasets follow Benford’s Law?

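The law applies well to datasets whose values grow at exponential rates or are spread over a wide range. One explanation is scale invariance. Suppose the leading-digit distribution stays the same whatever units the data are expressed in (for example dollars or euros, kilometers or miles). The logarithmic distribution described by the law is the only one that satisfies this.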
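Exponential growth and rescaling can both be seen with powers of 2. Since log10(2^k) = k·log10(2), and log10(2) is irrational, the fractional parts of these logarithms spread evenly over [0, 1). The leading digits therefore approach Benford's distribution. Multiplying every value by a constant, here an arbitrary factor of 3 standing in for a change of units, leaves that pattern intact. The Python sketch below is illustrative only.

```python
import math
from collections import Counter


def first_digit(x: int) -> int:
    """Leading digit of a nonzero integer (exact for Python ints of any size)."""
    return int(str(abs(x))[0])


def share_of_ones(values):
    """Fraction of values whose leading digit is 1."""
    return Counter(first_digit(v) for v in values)[1] / len(values)


powers = [2 ** k for k in range(1000)]  # exponential growth: 1, 2, 4, 8, ...
rescaled = [3 * v for v in powers]      # the same data "in different units"

print(f"powers of 2:        {share_of_ones(powers):.3f}")
print(f"3 x powers of 2:    {share_of_ones(rescaled):.3f}")
print(f"Benford prediction: {math.log10(2):.3f}")
```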

What types of data do not follow Benford’s Law?

Data that do not follow the law include:

  • numbers restricted to a narrow range, such as phone numbers,
  • deliberately assigned numbers, such as zip codes,
  • numbers shaped by human psychology, such as retail prices clustered at psychological price points like $9.99.

Can the law be tricked or manipulated?

Yes, it is possible to manipulate data like financial statements to force conformity to Benford’s Law and conceal fraud. So conformity alone is not definitive proof of legitimacy.

What statistical tests are used with Benford’s Law analysis?

Common statistical tests used are Chi-Square Goodness of Fit, Mean Absolute Deviation and Distortion Factor to quantify deviations from the expected distribution.
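The chi-square statistic and the mean absolute deviation (MAD) are both simple to compute from leading-digit counts. This is a minimal Python sketch; the function names and example counts are assumptions for illustration.

The critical value 15.507 is the 95th percentile of the chi-square distribution with 8 degrees of freedom (nine digits minus one). For MAD there is no formal critical value. For first-digit tests, values above about 0.015 are commonly read as nonconformity, following cutoffs suggested by Nigrini.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
CHI2_CRITICAL_8DF = 15.507  # 95th percentile of chi-square with 8 degrees of freedom


def chi_square(counts):
    """Pearson chi-square; counts[i] is the number of records with leading digit i + 1."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, BENFORD))


def mean_absolute_deviation(counts):
    """Average absolute gap between observed and expected proportions over the nine digits."""
    n = sum(counts)
    return sum(abs(obs / n - p) for obs, p in zip(counts, BENFORD)) / len(BENFORD)


near_benford = [301, 176, 125, 97, 79, 67, 58, 51, 46]  # hypothetical counts, n = 1000
uniform = [100] * 9                                      # every digit equally common, n = 900

for name, counts in (("near Benford", near_benford), ("uniform", uniform)):
    stat = chi_square(counts)
    print(f"{name}: chi2 = {stat:.2f} (reject at 5%: {stat > CHI2_CRITICAL_8DF}), "
          f"MAD = {mean_absolute_deviation(counts):.4f}")
```

The two measures behave differently as data grow. With very large datasets, the chi-square test rejects even tiny, practically irrelevant deviations. MAD does not depend on the sample size, which is one reason it is often preferred in auditing work.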

How are Benford’s Law probabilities distributed across digits?

The probabilities decrease logarithmically from 30.1% for 1 down to 4.6% for 9. The exact probabilities are given by P(d) = log10(1 + 1/d) for digits d from 1 to 9.
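Writing the formula as a difference of logarithms makes two things clear. First, the nine probabilities add up to exactly 1, because the sum telescopes. Second, it shows where the logarithm comes from. A number x has leading digit d exactly when the fractional part of log10(x) lies in the interval [log10(d), log10(d+1)). If that fractional part is spread uniformly over [0, 1), the probability is simply the interval's length:

```latex
P(d) = \log_{10}\left(1 + \frac{1}{d}\right) = \log_{10}(d+1) - \log_{10}(d)

\sum_{d=1}^{9} P(d) = \sum_{d=1}^{9} \bigl[\log_{10}(d+1) - \log_{10}(d)\bigr] = \log_{10}(10) - \log_{10}(1) = 1
```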

What tools are used to analyze data for conformity with Benford’s Law?

General-purpose statistical software can test for conformity, for example through add-on packages for R, and the calculations are also simple to set up with spreadsheet formulas. There are also specialized Benford analysis tools used by auditors and fraud examiners.

References

If you require further information on Benford’s Law, below are some useful references:

  • Newcomb, S. (1881) Note on the Frequency of Use of the Different Digits in Natural Numbers. American Journal of Mathematics, 4(1), 39-40.
  • Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society, 78(4), 551-572.
  • Hill, T.P. (1998) The First Digit Phenomenon: A century-old observation about an unexpected pattern in many numerical tables applies to the stock market, census statistics and accounting data. American Scientist, 86(4), 358-363.
  • Nigrini, M.J. (1999) I’ve Got Your Number. Journal of Accountancy, 187(5), 79-83.
  • Fewster, R.M. (2009) A simple explanation of Benford’s Law. The American Statistician, 63(1), 26-32.