
Posts Tagged ‘Alladi Memorial Lecture’

Please read the first post of this series before this one

 Learning From Numbers To Generate New Knowledge - Part 1

2. Role of statistics in different activities                      

Statistics is not a subject like physics, chemistry or biology. A physicist solves a problem in physics using his knowledge of physics; a chemist solves a problem in chemistry using his knowledge of chemistry, and so on. But there is no problem in statistics which we solve by using our knowledge of statistics. Essentially, a statistician helps to solve problems posed by others in their own fields of study. Every investigation in science or in other activities starts with formulating a problem, generating relevant data, processing them, and extracting information to throw light on the problem posed. Each of these steps needs special skills in which a statistician is trained.

2.1 Scientific research

 

            “Scientific laws are not advanced by the principle of authority or justified by faith or medieval philosophy; statistics is the only court of appeal to new knowledge.”

                                                                                    –  P.C. Mahalanobis

A scientist proposes a theory to explain some natural phenomenon, and an experiment is needed to verify it. How should the experiment be designed to get the maximum information from the data it generates, so that the accuracy of the theory can be estimated? If the accuracy is not within acceptable limits, can the data from the experiment suggest improvements to the proposed theory, or a new theory altogether? Any new theory can then be tested by further experimentation. These questions can be answered with statistical help, using the design of experiments developed by R.A. Fisher. Emphasizing the need to consult a statistician before an experiment is conducted, Fisher said:

     “You get 10 times more information from a carefully designed experiment. To consult a statistician after the experiment is finished is often to merely ask him to conduct a postmortem examination. He can only say what the experiment died of.”

 

Statistics collects relevant data through optimally designed experiments. It analyses those data to test hypotheses based on the proposed theory, and to provide clues for improving the theory or for possible alternatives. In this way statistics gives the scientist full play for his creative imagination, to discover new phenomena or to suggest improvements in the proposed theory. Science advances through the following endless process:

   Theory → Experiment → Statistical assessment of experimental results → New theory → ...

                          

2.2 Statistics as an investigative technology

 

“Statistics is the technology of finding the invisible and measuring the immeasurable”.

2.2.1 Measure the immeasurable

For instance, narcissism, a personality disorder, is hard to measure. However, we can measure a large number of other characteristics of a person which are affected by this disorder. Statistical methodology enables us to treat narcissism as a latent variable, connect it to the measurable characteristics through a structural equation model, and estimate it.

2.2.2 Classification or discrimination

Under the US military policy known as “Don’t ask, don’t tell”, a person being recruited into the army was not to be asked about, or required to disclose, his homosexuality. However, a urine sample can be obtained from the person and tested for the amounts of androgen and estrogen. A two-dimensional chart of these measurements was made from groups of individuals whose sexual orientation was known. It shows that homosexual and heterosexual persons fall into two different regions separated by a line, apart from a few exceptions. By plotting the point for any particular individual, his sexual orientation can be inferred with a high degree of success from the region in which his measurements fall.

This method, known in statistics as discriminant analysis, was developed by R.A. Fisher and perfected by various later authors, and it has been a powerful tool in such problems. It can be used, for instance, in medical diagnosis, to determine which of several possible diseases a patient is suffering from on the basis of a number of diagnostic tests. It can also be used to detect counterfeit currency, and in numerous other situations.
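The core idea of Fisher's linear discriminant can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in any study mentioned here. The two-dimensional measurements are invented, and the groups are labelled simply "A" and "B". The cut-off is placed halfway between the two projected group means, which implicitly assumes equal prior probabilities and equal costs of misclassification.

```python
# Minimal sketch of Fisher's linear discriminant for two groups measured on
# two variables. All measurements are hypothetical, invented for illustration.

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    # Entries (a, b, c) of the symmetric scatter matrix [[a, b], [b, c]],
    # i.e. the sum of (x - m)(x - m)^T over the group.
    a = sum((p[0] - m[0]) ** 2 for p in points)
    b = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    c = sum((p[1] - m[1]) ** 2 for p in points)
    return a, b, c


def fisher_discriminant(group_a, group_b):
    ma, mb = mean(group_a), mean(group_b)
    a1, b1, c1 = scatter(group_a, ma)
    a2, b2, c2 = scatter(group_b, mb)
    # Pooled within-group scatter matrix S_w.
    a, b, c = a1 + a2, b1 + b2, c1 + c2
    det = a * c - b * b
    # Discriminant direction w = S_w^{-1} (ma - mb), via the 2x2 inverse.
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    w = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    # Cut-off halfway between the projected group means.
    threshold = (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1])) / 2
    return w, threshold


def classify(point, w, threshold):
    # Group A projects above the cut-off, group B below it.
    score = w[0] * point[0] + w[1] * point[1]
    return "A" if score > threshold else "B"


group_a = [(2.0, 1.0), (2.5, 1.5), (3.0, 1.2), (2.8, 0.8)]
group_b = [(1.0, 2.5), (1.2, 3.0), (0.8, 2.2), (1.5, 2.8)]
w, threshold = fisher_discriminant(group_a, group_b)

print(classify((2.7, 1.1), w, threshold))  # A
print(classify((1.1, 2.6), w, threshold))  # B
```

With more than two measurements, the same recipe applies, but the explicit 2×2 inverse is replaced by a general linear solve.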

2.3 Birth order and eminence

Scholarly interest in the relationship between birth order and extraordinary achievement can be traced to 1874, when Francis Galton published English Men of Science: Their Nature and Nurture. This book chronicled the lives of 180 eminent men from various fields. Galton was able to collect birth order data from 99 of his subjects, and found that 48% of them were first-born or only sons. The percentages of second- and third-born sons were very low. Interest in birth order and eminence has continued. Countless studies have confirmed Galton’s conclusion that the eminence a person achieves, or his intelligence, depends on his birth order. By this account the first born is more intelligent than the second, the second more intelligent than the third, and so on. The table gives the results of intelligence tests conducted on children from families of different sizes, indicating the effect of birth order on intelligence.

It would be of interest to investigate the causes of the birth order effect. It is believed that the first born gets more parental attention than later children, and has a chance of growing up in the company of adults and learning from them. The second born similarly has more opportunities than the third, and so on.

2.4 Common breeding ground of eels

This example shows how learning from numbers led to an important discovery. In the early years of the last century, Johannes Schmidt, a scientist at the Carlsberg Laboratory, studied the numbers of vertebrae and fin rays in the same species of fish caught in different localities. He found that these numbers varied considerably, often even between different parts of the same lake. Eels show a large variation in vertebra number. Yet Schmidt found essentially the same mean and the same standard deviation in eel samples drawn from all over Europe, from Iceland, from the Azores and from the Nile, widely separated regions about 1000 miles apart. He inferred that the eels of all these different river systems came from a common breeding ground in the ocean. This breeding ground was later discovered in one of the expeditions of the research vessel “Dana”. Statistical theory was unknown when Schmidt made this discovery; simple computations of the mean and standard deviation were the only tools used.
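Schmidt's comparison needed nothing more than a mean and a standard deviation per sample, and Python's standard `statistics` module computes both directly. The sketch below assumes hypothetical vertebra counts and placeholder locality names. They are invented for illustration and are not Schmidt's data.

```python
import statistics

# Hypothetical vertebra counts for eels caught at three localities.
samples = {
    "locality 1": [114, 115, 116, 115, 114, 116],
    "locality 2": [115, 114, 116, 116, 114, 115],
    "locality 3": [116, 115, 114, 114, 115, 116],
}


def summarize(samples):
    # Return (mean, sample standard deviation) for each locality.
    return {
        name: (statistics.mean(values), statistics.stdev(values))
        for name, values in samples.items()
    }


for name, (m, sd) in summarize(samples).items():
    print(f"{name}: mean = {m:.2f}, sd = {sd:.3f}")
```

When the means and spreads agree this closely across widely separated localities, as they do in the toy data, the samples look like draws from a single population. That was the pattern that pointed Schmidt to a common breeding ground.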

 2.5 Mournful numbers

Newspapers, magazines and other news media continually make us aware of the good and harmful effects of our dietary, exercise, smoking and drinking habits, and of the stress in our professions and other daily activities. The following table gives the number of days of life lost or gained through various causes. The numbers may not apply to specific individuals; however, they provide useful guidelines for making individual decisions.

2.6 The importance of being left handed

T.A. Davis, a professor at the Indian Statistical Institute, made several studies of coconut trees. These trees can be classified as left-handed or right-handed depending on the direction of their foliar spiral. Through experiments he found that spirality is not genetically inherited, and that left-handed trees yield 10% more coconuts than right-handed trees, a conclusion of economic importance. A recommendation was made to the Government of Kerala to grow only the “leftists” to increase the production of nuts.

2.7 Chronobiology and appropriate time to take Vitamin C

 

Chronobiology is the study of changes in body chemistry during the day. Measurements made on the human body at different times of the day reveal some interesting facts. We are 1 cm taller in the morning than when we go to bed. The cortisol level is about 16 µg/100 ml in the morning and drops to about 6 µg/100 ml at bedtime. The high cortisol level in the morning wakes you up and makes you more alert. Teachers prefer to teach in the morning because students are more attentive then, owing to their high cortisol level. It was also found that vitamin C is better absorbed if taken after a meal.

The examples given above show how numbers generated through experiments, or through normal transactions, provide us with the knowledge or information needed to make optimal decisions in all our activities.

2.8 Facts before theory

 

       “It is a capital mistake to theorize before one has data. Insensibly, one begins to twist facts to suit theories, instead of theories to suit facts.”

–          Sherlock Holmes

    Without good information, you won’t see things as they really are; you will see them as you think they are.

           “Aristotle maintained that women have fewer teeth than men; although he was married twice, it never occurred to him to verify his statement by examining his wives’ mouths.”

–           Bertrand Russell

 

2.9 Computational stylistics

 

The total number of words in all the known works of Shakespeare is 884,647, of which 31,534 are distinct. Using a statistical method proposed by R.A. Fisher, it is estimated that Shakespeare probably knew about 35,000 more words which he did not use in his writings. The total number of words Shakespeare knew is thus about 66,000, out of about 100,000 words in the English language of his time. The question arises whether Shakespeare wrote all the plays attributed to him, or had co-authors. Statistical methods known as computational stylistics provide answers to questions of this kind. Brian Vickers compared the styles in terms of rhetorical devices, polysyllabic words and metrical habits, and his book “Shakespeare, Co-Author” mentions the following possibilities.

George Peele wrote a third of Titus Andronicus; Thomas Middleton, two-fifths of Timon of Athens; George Wilkins, two of the five acts of Pericles; and John Fletcher, more than half of Henry VIII and The Two Noble Kinsmen.
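The lecture does not say which of Fisher's methods produced the 35,000 figure, so the sketch below does not attempt to reproduce it. Instead it shows a related and simpler estimator from the same line of work, the Good-Toulmin estimator. The estimator starts from the counts n_x of words used exactly x times. From these it estimates how many new distinct words would appear if a further text t times as long were sampled: U(t) = n_1 t - n_2 t^2 + n_3 t^3 - ... The series is only reliable for t <= 1. The toy sentence is a placeholder, not Shakespeare's vocabulary.

```python
from collections import Counter


def frequency_of_frequencies(words):
    # n[x] = number of distinct words that occur exactly x times.
    counts = Counter(words)
    return Counter(counts.values())


def good_toulmin(words, t):
    # Estimated number of new distinct words in a further sample t times the
    # size of the observed one: sum over x of (-1)^(x+1) * n_x * t^x.
    # The alternating series is only trustworthy for t <= 1.
    n = frequency_of_frequencies(words)
    return sum((-1) ** (x + 1) * n_x * t ** x for x, n_x in n.items())


text = "the cat sat on the mat the end"  # toy placeholder text
words = text.split()
print(len(words), len(set(words)))  # 8 6
print(good_toulmin(words, 1.0))     # 6.0
print(good_toulmin(words, 0.5))     # 2.625
```

For Shakespeare, the n_x would come from the word frequencies across all 884,647 words of the known works. The total quoted in the text is simple addition: 31,534 distinct words used plus about 35,000 estimated unused words gives roughly 66,000.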

End of Part 2

I will be posting the concluding part of the lecture in my next post. – Archana


I attended a lecture by Dr. C.R. Rao, a world-renowned statistician, and live-tweeted it (@ArchanaRaghuram). Many people requested the entire transcript, so I am posting it in three parts.

The difference between the philosopher’s, the scientist’s and the statistician’s views of knowledge

Statistics is the science, technology and art of developing human knowledge through the use of empirical data.

 1 Concepts of Knowledge

 Knowledge is what we know, and also what we know we do not know. We discover what we do not know essentially through what we know. Thus knowledge expands. With more knowledge we come to know more of what we do not know, and so knowledge expands endlessly. What exactly is the process involved in generating new knowledge? What confidence can we have in newly created knowledge, and how do we use it? To understand these problems, let us look at different views of knowledge.

 1.1 Philosopher’s view of knowledge

 Philosophers maintain that knowledge is infallible. The different instruments for acquiring certain knowledge are:

  • Deductive logic or pure reasoning from given premises as advocated by Kant.

The process is the same as in mathematics, where we lay down certain axioms taken to be true and derive propositions by arguing from them. However, we have to make sure that the conclusions drawn from a given set of axioms are not contradictory. The logician Gödel proved that the consistency of a set of axioms rich enough to express arithmetic cannot be established by using those same axioms. Moreover, in classical logic, once a single contradiction occurs, any proposition whatsoever can be derived.

  • Mill’s inductive logic of reasoning from particular to particular. For example, suppose it is known that in the past banks refused loans to applicants who had filed for insolvency at any time. We then conclude that the same will hold in the future. By induction, we generally mean arguing from the particular to the general.
  • The Indian philosopher Vivekananda and Einstein maintained that new knowledge can be created only by instinct, reason and inspiration, a process known as abduction. In their view it cannot be created by deductive reasoning from a given set of premises assumed to be true, or by inductive inference from observed data. (“A theory can be proved by an experiment, but no path leads from experiment to theory.” – Einstein)
  • The ancient Hindu scriptures mention perception (pratyaksha), inference (anumana), comparison (upamana) and verbal testimony (sabda) as possible instruments for the creation of new knowledge.

 1.2 Scientist’s view of knowledge

 Scientists maintain that all knowledge is fallible, i.e., there is no such thing as true knowledge. They create scientific knowledge by the following steps.

 (1)   Build a model for observed data using the information contained in the data or through instinct, reason and inspiration.

(2)   Then generate new data through an experiment or taking observations in nature and see how well the suggested model can predict the observed data.

(3)   If the accuracy of prediction is within acceptable limits for practical applications, the model is given the status of a scientific theory. If not, the model is rejected. In either case, research continues to find a theory which gives predictions with a higher degree of accuracy, and each time the existing theory is replaced by the new one.

(4)   Sometimes more than one theory can co-exist, such as Newton’s law of gravitation and Einstein’s theory of relativity, although the latter is more comprehensive than the former. For practical purposes, even for sending a man to the moon, Newton’s laws of motion provide results of sufficient accuracy. Neither of them is strictly true, as the following famous scientists affirm.

 1.2.1 Views of some scientists on scientific theories:

  “An experiment does not even establish the relative truth or falsity of a hypothesis but merely furnishes a basis for deciding acceptability”.

                                                -A.H.Copeland (Philosophy of Science, 33,303-316, 1966)

“If you thought that science was certain, well, that is just an error on your part.”

                                                      -Richard Feynman (Nobel Laureate)

       “In science, ‘fact’ can only mean confirmed to such a degree that it would be perverse to withhold provisional assent.”

                                                       -Stephen Jay Gould

      “There has not been a single date in the history of the law of gravitation when a modern test of significance would not have rejected all laws and left us with no laws.”

                                                       -H. Jeffreys (in Theory of Probability)

       “There is no need for these hypotheses to be true or even to be at all like the truth; rather one thing is sufficient for them-that they should yield calculations which agree with the observations”

                       -Andreas Osiander (1498-1552), in his preface to Copernicus’s De Revolutionibus
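Steps (1) to (3) of the scientist's procedure in section 1.2 can be sketched as a small Python routine. Everything specific here is an assumption made for illustration. The model is a straight line fitted by least squares, and the data are invented. "Within acceptable limits" is taken to mean that every new prediction error is at most a chosen tolerance.

```python
def fit_line(xs, ys):
    # Step (1): build a model (a least-squares straight line) from observed data.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def accept(model, xs_new, ys_new, tolerance):
    # Steps (2) and (3): predict fresh observations and accept the model only
    # if every prediction error stays within the tolerance.
    slope, intercept = model
    errors = [abs(y - (slope * x + intercept)) for x, y in zip(xs_new, ys_new)]
    return max(errors) <= tolerance


model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])               # hypothetical data
print(model)                                                # (2.0, 1.0)
print(accept(model, [4, 5], [9.2, 10.9], tolerance=0.5))    # True
print(accept(model, [4, 5], [12.0, 15.0], tolerance=0.5))   # False
```

A rejected model sends us back to step (1). Even an accepted one is only provisionally adequate, which is the point the quotations above make.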

 1.2.2 The sad story of Galileo (15 Feb 1564-8 Jan 1642) and the Catholic Church

During the lifetime of Galileo, a large majority of philosophers and astronomers subscribed to the geocentric view that the earth is at the centre of the universe. When Galileo began publicly supporting the heliocentric view, which placed the sun at the centre of the universe, he met with bitter opposition from some philosophers and clerics, and two of the latter eventually denounced him to the Roman Inquisition early in 1615.

The position of the church, as explained by Cardinal Bellarmino in 1615, was similar to what Osiander had thought a century earlier. The church would raise no objection if Galileo stated his theory as a mathematical hypothesis, “invented and assumed in order to abbreviate and ease the calculations”, provided he did not claim it to be a true description of the world. In 1616 Galileo agreed not to advocate his views, and he was cleared of any offence. He later defended his views in his most famous work, Dialogue Concerning the Two Chief World Systems, published in 1632. For this he was tried by the Inquisition, found “vehemently suspect of heresy”, forced to recant, and spent the rest of his life under house arrest.

   1.3 Statistical view of knowledge

   All knowledge derived from observed data is uncertain with the degree of uncertainty depending on the amount and quality of available data. Unlike in science, in real life action has to be taken on available knowledge however meager or uncertain it is. We are always seeking answers to questions like: What career should I choose? How do I invest my money? Should I go abroad for higher studies or continue in the country? Should I take drug A or B for my headache? There are no definite answers to these questions in view of uncertainties in available information, but decisions cannot be postponed.

The human mind had been tuned to deductive logic over several centuries. For such a mind, formulating rules for decision making under uncertainty, rules which can go wrong, posed a challenging problem. It was only at the beginning of the last century that this was realized: knowledge, however meager, is usable if we know the amount of uncertainty in it. Knowing that amount lets us formulate optimum decision rules, i.e., rules with minimum loss. This is the subject matter of statistics, which developed as a separate discipline in the last century. The fundamental equation of statistics may be stated as follows:

      Uncertain knowledge + Knowledge of the amount of uncertainty in it = Usable knowledge
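The equation can be made concrete with the question raised above of whether to take drug A or drug B for a headache. This sketch is an illustration, not part of the lecture. Here the "knowledge of the amount of uncertainty" is a probability for each possible cause of the headache. An optimum decision rule then picks the drug with the smallest expected loss. The causes, probabilities and loss values are all hypothetical.

```python
def expected_loss(action, probabilities, loss):
    # Average the loss of an action over the uncertain states of nature.
    return sum(p * loss[(action, state)] for state, p in probabilities.items())


def best_action(actions, probabilities, loss):
    # Optimum decision rule: choose the action with minimum expected loss.
    return min(actions, key=lambda a: expected_loss(a, probabilities, loss))


# Hypothetical losses for each (action, true state) pair.
loss = {
    ("drug A", "tension"): 1, ("drug A", "migraine"): 8,
    ("drug B", "tension"): 3, ("drug B", "migraine"): 2,
}
actions = ["drug A", "drug B"]

print(best_action(actions, {"tension": 0.7, "migraine": 0.3}, loss))  # drug B
print(best_action(actions, {"tension": 0.9, "migraine": 0.1}, loss))  # drug A
```

The same losses lead to different optimal choices as the probabilities change. Knowing the amount of uncertainty is therefore what turns uncertain knowledge into usable knowledge.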

   1.3.1 History of statistics

Statistics has a long antiquity but a short history. Its origin can be traced back to the primitive man who put notches on trees to keep an account of his possessions. As early as 5000 BC, kings used to carry out censuses of the populations and resources of the state for selfish reasons. When democratic governments were formed, it became their task to collect information about the people and the resources of the state. This information was used to make short-term policy decisions and to formulate long-range plans for improving the living conditions of the people. The information collected by the government was called official statistics (data collected of the people, for the people, by the government). The word statistics was coined by the German scholar Achenwall in the middle of the 18th century to mean data, their analysis and their use by the government. The first State Statistical Bureau was established in 1800 in France.

 It is interesting to note that Shakespeare came close to inventing the word statistics or statistician. He used the word ‘statist’ in his play Cymbeline in 1609 and ‘statists’ in the plural in Hamlet in 1600, perhaps to denote officials connected with the state.

  The first nongovernmental use of statistics was in computing life insurance rates in the 17th century, based on data on births and deaths called Bills of Mortality. During this period, analytical studies were made of death rates from different diseases and of the growth of populations in different regions of a state. In 1900, Karl Pearson used concepts of probability to test scientific hypotheses based on observed data in any field of enquiry; this marks the beginning of the modern theory of statistics. The theory of statistics was developed during the period 1900-1940 by R.A. Fisher, J. Neyman and A. Wald. Statistics was introduced as a separate subject of study and research in universities in the decade 1940-1950.

The second half of the last century saw the development of statistics as the science and technology of using information as the main tool in all areas of human endeavour from scientific research, designing and controlling the quality of goods, medical diagnosis, national security, giving evidence in courts of law in cases such as disputed paternity and authorship, detection of fraud and to making personal decisions. As R.A.Fisher said in a speech delivered at the Indian Statistical Institute in 1952:

       “Statistical science is the peculiar aspect of human progress which gave to the 20th century its special character. It is to the statistician that the present age turns for what is most essential in all its more important activities.”

 I shall give some examples to show how statistics works in different activities.
