# Statistics

**What does statistics mean?** **Statistics** is the collection of data for a specific purpose; the summarizing of those data in tables and graphs; the interpretation of the results; the assessment of their reliability; the generalization of results from a sample to a larger population; the study of relationships among properties; forecasting in various fields; and the principles of experimental design and observation. It provides the foundation for the collection, classification, analysis, and interpretation of data for a specific purpose.

- Statistics can be applied to a wide range of fields, from the natural sciences to the social sciences. It is also used for decision-making in business and in all areas of government. In the sense above, the word is singular. In its plural sense it means "*numerical information gathered systematically*"; examples include population statistics, environmental statistics, sports statistics, and national education statistics.

- The purpose of learning statistics is to know how to interpret data obtained in research using appropriate statistical methods.

- Statistical methods are used to summarize or describe a collection of data; this approach is called descriptive statistics. In addition, modeling regularities (patterns) in observations, together with their randomness and uncertainty, in order to draw conclusions about the population or process under study is called inferential statistics. Both descriptive and inferential statistics can be counted as parts of applied statistics. The discipline that examines the theoretical mathematical background of the subject is called mathematical statistics.

**The concepts arising from the relations of statistics with other disciplines can be shown as follows:**

- Economics + Statistics = Econometrics,
- Psychology + Statistics = Psychometry,
- Medicine + Statistics = Biostatistics,
- Sociology + Statistics = Sociometry,
- History + Statistics = Cliometrics.

- The word *statistics* derives from the modern Latin *statisticum collegium* and the Italian *statista* (*statesman*, *politician*). It was first used in its state-related sense by Gottfried Achenwall in German, in his *Statistik* (1749), which presented data about the state. The field covered by this definition was earlier known in English as political arithmetic; the term *statistics* was introduced into English by Sir John Sinclair. The main aim of work under the title of statistics was to present data for use by governments and administrative bodies. The collection of information about states and local regions is still carried out by national and international statistical institutes; regular information about populations in the narrower sense is obtained by censuses. In the 20th century, the need for more rigorous tools in public health (*epidemiology*, *biostatistics*) and in economic and social policy (*unemployment*, *econometrics*) drove progress in statistical applications. This need was especially evident in the welfare states that developed after the First World War, which had to have deep knowledge of their populations. In this sense, the "will to collect information for the management of populations" was described by the philosopher Michel Foucault as biopolitics, a term later taken up by many authors. The mathematical foundations of statistics go back to the 1654 correspondence of Pierre de Fermat and Blaise Pascal on probability theory. Christiaan Huygens (1657) gave the first known scientific treatment of the subject. With Jakob Bernoulli's *Ars Conjectandi* (*posthumous*, 1713) and Abraham de Moivre's *Doctrine of Chances* (1718), the subject came to be regarded as a branch of mathematics.

- The theory of errors goes back to Roger Cotes's *Opera Miscellanea* (posthumous, 1722), but the first application of the theory to observational errors appears in a 1755 memoir by Thomas Simpson (printed 1756). The 1757 reprint of this memoir adopts the axiom that positive and negative errors are equally probable, introduces continuous error curves, and speaks of certain assignable limits within which all errors may be supposed to fall.

- Pierre-Simon Laplace (1774) attempted to derive a rule for combining observations from the principles of probability theory, representing the law of error probabilities by a curve.

**Conceptual Overview**

- A statistical study of a scientific, industrial, or social problem begins with a population. This population may be the people of a country, the crystals in a rock, or the goods produced by a particular factory in a given period. It may instead be a process observed at different times; data collected in this way are called a time series.
- For practical reasons, rather than collecting data about an entire population, a chosen subset of it (a *sample*) is usually studied. Data about the sample are gathered by experiment or observation and then analyzed statistically. There are two aims: description and inference.

- Descriptive statistics can be used to summarize the sample numerically or graphically. The mean and the standard deviation are basic examples of numerical summaries; graphical summaries include various kinds of charts and graphs.
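As a minimal sketch of such numerical summaries, the following Python snippet computes the mean, standard deviation, and median of a small sample (the scores are invented purely for illustration):

```python
import statistics

# A small hypothetical sample of observations (e.g., exam scores).
sample = [72, 85, 91, 68, 77, 85, 94, 60, 81, 77]

mean = statistics.mean(sample)      # arithmetic average
stdev = statistics.stdev(sample)    # sample standard deviation (n-1 denominator)
median = statistics.median(sample)  # middle value when sorted

print(f"mean={mean:.1f} stdev={stdev:.1f} median={median}")
```

The mean and standard deviation summarize the center and spread of the sample; the median is a robust alternative to the mean when the data contain outliers.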

- Inferential statistics is used to model patterns in the data, account for randomness, and draw conclusions about the larger statistical population. These conclusions may take the form of yes/no answers (*hypothesis testing*), estimates of numerical characteristics (*statistical estimation*), predictions of future values (*statistical prediction*), or descriptions of linear relationships between data (*correlation*). Other major mathematical modeling techniques include analysis of variance (ANOVA), time series analysis, and data mining.

- The issue of correlation is particularly worth considering here. An analysis of a data set may reveal that two variables move together (i.e., two properties of the underlying population vary together). For example, a study of life span and annual income may find that poor people have shorter life spans than wealthy people; one can then say that income and life span are correlated. But correlation alone never establishes that income is the cause of longer life span, or its consequence.
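The strength of such co-movement can be quantified with the Pearson correlation coefficient. The sketch below computes it from its textbook definition on invented income/life-span pairs; a value near 1 indicates strong positive correlation but, as noted above, says nothing about causation:

```python
import math

# Hypothetical paired data: annual income (thousands) and life span (years).
income = [12, 18, 25, 31, 40, 55, 63, 78]
lifespan = [66, 69, 70, 72, 74, 76, 79, 80]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(income, lifespan)
print(f"r = {r:.2f}")  # high positive r: the variables move together
```

A high `r` here reflects only association in the (made-up) data; a lurking third variable could drive both quantities.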

- If the sample represents the population, results and conclusions drawn from the sample can give information about the population as a whole. The main problem is whether the chosen sample is in fact representative. Statistics offers tools to reduce the errors that arise in sampling and data collection, chiefly by making the sample random, and also provides methods for obtaining reliable experimental results.
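A minimal illustration of simple random sampling, using a synthetic population (values drawn from an assumed normal distribution, so all numbers are hypothetical): giving every member the same chance of selection avoids systematic selection bias, and the sample mean tends to land close to the population mean:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# A hypothetical population of 10,000 measurements (e.g., heights in cm).
population = [random.gauss(170, 10) for _ in range(10_000)]

# Simple random sample: each individual is equally likely to be chosen.
sample = random.sample(population, 100)

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean={pop_mean:.1f}, sample mean={sample_mean:.1f}")
```

With a representative (random) sample of 100, the sample mean typically falls within about one unit of the population mean for this spread; a biased selection rule would offer no such guarantee.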

- The basic mathematical concept used to understand randomness of this kind is probability. Mathematical statistics (statistical theory) is the branch of applied mathematics that uses probability theory and mathematical analysis to examine the mathematical foundations of statistics.

**Statistical methods**

- Experimental and observational studies
- A common aim of statistical investigation is to examine causality: in particular, the effect of changes in the predictor (independent) variables on a dependent variable. There are basically two types of statistical study dealing with causality: experimental studies and observational studies. In both types, the effect of the independent variables, or of differences in them, on an observed dependent variable is examined. The difference lies in how each method is carried out; both can yield fruitful results.
- In the experimental method, measurements are first made on the system under study, the system is then manipulated, and measurements are taken again to see whether the manipulation has affected it. In the observational method, there is no intervention in the system; instead, data are collected and associations between the response variables (dependent variables) and the predictors (independent variables) are sought.
- An example of an experimental study is the Hawthorne experiment, which explored the effect of lighting on worker output at a Western Electric Company plant. In the experiment, output at the plant was measured first, and then the lighting conditions for workers on the assembly line were changed. The results suggested that better lighting increased productivity. However, the study has received serious criticism for errors in its experimental method; for example, no control group was used.
- As an example of an observational study, consider research examining the link between smoking and lung cancer. A survey is typically used to gather information about the areas of interest, and the responses are then statistically analyzed. In this example, researchers collect data on smokers and non-smokers and compare the number of cancer cases in the two groups.

**The basic steps of an experiment:**

- 1. Planning the research, identifying sources of information, determining the subject of the research, and considering the ethical aspects of the proposed method.
- 2. Modeling the system, focusing on the relationship between the dependent and independent variables.
- 3. Summarizing a set of observations in order to bring out their common features.
- 4. Explaining what the numbers tell us about the world we are observing.
- 5. Documenting and presenting the results of the work.

**Measuring scales**

- Statistical data take the form of numbers, and there are four types of measurement scale for these numbers. In 1946 the American psychologist Stanley Stevens first proposed that data could lie on four different measurement scales: nominal, ordinal, interval, and ratio. Data from each scale carry different mathematical power, and the mathematical operations and the descriptive and inferential statistical analyses that can be applied to each differ accordingly.
- On the nominal scale, numbers are merely assigned to mutually exclusive categories; the assignment carries no mathematical properties of order, interval, or origin. Only very limited descriptive measures and inferential analyses can be applied to data of this kind.

- The ordinal scale, besides assigning numbers to mutually exclusive categories, also conveys the rank order of those categories. Simple arithmetic operations (*addition, subtraction, multiplication, or division*) on these numbers can give meaningless results, because the magnitudes of the differences between the numbers are not meaningful and the actual numbers may be replaced (*i.e., any monotonic transformation may be applied*).
- On the interval scale, the data values are genuinely numerical, and differences between them are meaningful under simple arithmetic. However, the origin of the scale (the value 0) is arbitrary. For example, temperature measurements lie on an interval scale: they may be given in degrees Celsius or in Fahrenheit, which have different origins.

- On the ratio scale, differences between measurements are meaningful and there is also a true zero point. Returning to the temperature example, the Kelvin scale is a ratio scale, because its origin, 0 K (*absolute zero, −273.15 °C*), is a true zero point and no temperature can fall below it.
- Variables measured on the nominal or ordinal scale are together called categorical variables, and data on the interval or ratio scale are called quantitative variables.
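A small sketch of how the four scales constrain which summaries make sense, using invented observations for each scale (Python's `statistics` module supplies the mode, median, and mean):

```python
import statistics

# Hypothetical observations on Stevens' four scales of measurement.
nominal = ["red", "blue", "red", "green", "red"]    # categories only
ordinal = [1, 3, 2, 2, 3]                           # ranked satisfaction: 1 < 2 < 3
interval = [21.5, 23.0, 19.0, 22.5]                 # temperature in °C (arbitrary zero)
ratio = [294.65, 296.15, 292.15, 295.65]            # same temperatures in K (true zero)

# Nominal data: only the mode (most frequent category) is meaningful.
print(statistics.mode(nominal))

# Ordinal data: the median respects order without assuming equal spacing.
print(statistics.median(ordinal))

# Interval data: differences are meaningful, so the mean is too...
print(statistics.mean(interval))

# ...but ratios are not: 20 °C is not "twice as warm" as 10 °C.
# Ratios of kelvin values ARE meaningful, because 0 K is a true zero.
print(statistics.mean(ratio))
```

Each scale inherits the operations of the weaker scales below it: ratio data also admit medians and modes, but nominal data admit nothing beyond counting categories.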

**Statistical research with computers**

- In the second half of the 20th century, the enormous increase in the computing power and speed of computers, and their widespread availability, had a great impact on the practical application of statistics and even on its theoretical development. Because practical statistical calculation used to be very laborious, data analysis long relied on linear models, which kept computation tractable. With the spread of very powerful computers, together with new numerical algorithms and statistical software, nonlinear models (such as nonlinear regression, generalized linear models, and multilevel models) have come into practical use.
- Computation-intensive techniques such as resampling, bootstrap methods, Gibbs sampling, and permutation tests have come into use. At the same time, research and practical data-analysis methods that rely on large computing power rather than on advanced mathematics, such as artificial neural networks and data mining, have developed alongside statistics.
- After the theoretical developments of the early 20th century, statistical science is becoming more empirical and practical. The inclusion of statistical methods in general-purpose computing software (*e.g., the statistical functions of spreadsheet programs*) and the availability of specially prepared statistical packages will undoubtedly play a major role in this.
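As a sketch of one such computation-intensive technique, the following snippet computes a bootstrap percentile confidence interval for a mean from invented data, using only the standard library:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Observed sample (hypothetical measurements).
data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4, 5.8, 4.7]

# Bootstrap: resample the data with replacement many times and
# recompute the statistic to approximate its sampling distribution.
boot_means = []
for _ in range(5000):
    resample = random.choices(data, k=len(data))
    boot_means.append(statistics.mean(resample))

boot_means.sort()
# A 95% percentile confidence interval for the mean.
lo, hi = boot_means[int(0.025 * 5000)], boot_means[int(0.975 * 5000)]
print(f"mean={statistics.mean(data):.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

The appeal of the bootstrap is exactly what the text describes: it trades mathematical derivation of the sampling distribution for raw computing power.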

**Misuse of statistical data**

- Incorrect use of statistical data can produce errors that are notoriously difficult to detect yet very serious, because they can lead to erroneous decisions. Social policy, medical practice, and the reliability of structures such as bridges all depend on the proper use of statistics.

- Even when statistics are applied correctly, the conclusions are hard to interpret for those with little knowledge or experience of the subject. The statistical significance of a trend in a data set (*that is, the extent to which the trend could be explained by random variation in the sample*) may or may not agree with an intuitive sense of its significance, so decisions based on intuition alone can easily go wrong. To handle the information they encounter in daily life appropriately (*and with due skepticism*), people need to overcome statistical ignorance and acquire at least basic statistical training and statistical literacy.
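One way to check how far a trend can be explained by random variation is a permutation test. The sketch below, on invented group data, repeatedly shuffles the group labels; the resulting p-value estimates how often chance alone produces a difference at least as large as the observed one:

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Two hypothetical groups with an apparent difference in means.
group_a = [5.1, 6.2, 5.8, 6.5, 5.9]
group_b = [5.5, 6.8, 6.1, 7.0, 6.4]
observed = statistics.mean(group_b) - statistics.mean(group_a)

# Permutation test: if the group labels were irrelevant, shuffling them
# should often produce a difference as large as the observed one.
pooled = group_a + group_b
count = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[5:]) - statistics.mean(pooled[:5])
    if diff >= observed:
        count += 1

p_value = count / trials
print(f"observed diff={observed:.2f}, p={p_value:.3f}")
```

Here the groups look visibly different, yet the p-value is not small: random relabeling produces a gap this large fairly often, which is exactly the kind of result that clashes with intuition.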

- There is a fairly widespread perception that statistical information is misused, and that such misuse is often conscious and intentional: a conclusion drawn from a flawed analysis may benefit the person presenting it. The accusation "*There are three kinds of lies: lies, damned lies, and statistics*," attributed to the 19th-century British prime minister Benjamin Disraeli, is used almost like a proverb. In 1909 Abbott Lawrence Lowell, president of Harvard University, remarked that statistics, like pies, are good only if you know who made them and are sure of the ingredients; this sharpens the same perception of deliberate misuse.