The fingerprint hidden in our words

Researchers have found that there are measurable patterns in the way we write. Here is a beautiful example. Back in 1788, a group of three writers individually authored 85 papers on US constitutional reforms. Call the authors A, B and C. The papers did not carry the names of their authors. It was popularly believed that 51 of the papers were written by A, 14 by B and 5 by C. That still left 15 papers unaccounted for.

The question was who wrote the disputed papers. Various linguists were asked for help, but since the papers covered varied topics and the writing styles were not very different, there was no definitive outcome. Next, arithmetic was tried to solve the riddle. The average sentence length was measured, but it varied from paper to paper within the same author, so no clear trend separated the authors. Comparing the use of word pairs like ‘while’ and ‘whilst’ was tried too, and that did not help either.
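
To make the sentence-length idea concrete, here is a minimal Python sketch of how one might compute the average sentence length of a paper. The function name and the naive rule of splitting on '.', '!' and '?' are my own assumptions for illustration, not the method the researchers used.

```python
import re


def average_sentence_length(text: str) -> float:
    """Mean number of words per sentence (illustrative helper, not from the study).

    Sentences are split naively on runs of '.', '!' or '?', so abbreviations
    such as 'U.S.' will be miscounted.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(word_counts)
```

Running this on several papers by the same author would show how much the number moves from one paper to the next.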



Next, a set of 30 words was identified, such as ‘the’, ‘by’, ‘upon’, ‘and’, ‘there’, ‘enough’ and ‘according’. The usage of each of these words (occurrences per 1000 words) was measured across all the papers by the three authors. The results were simply astounding. Authors A and B used ‘the’ at rates of 91 and 94 per thousand words, whereas C used ‘the’ at a rate of 64/1000; this clearly separates C from A and B.
B used ‘upon’ at a constant rate of 3/1000, whereas C never used it. Similarly, A used ‘by’ at a constant rate of 13/1000, while B used it at a constant 5/1000. In this way the trends were identified (the numerical fingerprints were decoded) and the authors of the disputed papers were identified with a fair level of accuracy.
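
The per-1000-words measurement can be sketched in a few lines of Python. This is only an illustration: the names `MARKER_WORDS`, `rates_per_thousand` and `closest_author` are hypothetical, the tokenizer is a simple regular expression, and the "closest profile" rule is a stand-in for whatever statistics the researchers actually applied. The only profile numbers used are the rates of ‘the’ quoted above.

```python
import re
from collections import Counter

# A small subset of the 30 marker words mentioned in the post.
MARKER_WORDS = ["the", "by", "upon"]


def rates_per_thousand(text, words=MARKER_WORDS):
    """Occurrences of each marker word per 1000 words of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {w: 0.0 for w in words}
    counts = Counter(tokens)
    return {w: 1000 * counts[w] / len(tokens) for w in words}


def closest_author(rates, profiles):
    """Name of the profile whose rates are nearest (sum of absolute differences).

    Assumes every profile lists the same words and that `rates` has them too.
    """
    def distance(profile):
        return sum(abs(rates[w] - profile[w]) for w in profile)

    return min(profiles, key=lambda name: distance(profiles[name]))


# Rates of 'the' per 1000 words, as quoted in the post.
THE_PROFILES = {"A": {"the": 91}, "B": {"the": 94}, "C": {"the": 64}}

# A made-up paper that uses 'the' 66 times per 1000 words lands nearest to C.
print(closest_author({"the": 66.0}, THE_PROFILES))
```

With ‘the’ alone, C stands out but A and B remain nearly indistinguishable, which is why a whole set of words is needed rather than a single one.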


So there are patterns, hidden fingerprints, even in the way we write. Start comparing your text with a friend’s and see if you can observe something interesting.

Love,
Ankit
