The ABCs of Statistics

Statistics is the part of mathematics that seeks to collect, describe, and analyze numbers. In virtually all parts of society, statistics is central to helping us see the big picture and make the right decisions.

The word statistics comes from “status,” the Latin term for “condition.” In the beginning, statistics was used to create an overview of population growth during the Roman expansion. Over time, the field developed dramatically and has become a sophisticated science that reaches all corners of society. Yet the essence of statistics has remained unchanged: the need to convert large volumes of data into knowledge, and to clarify the relationships within the data, is at least as timely today as it was when Romans were being counted.

Basically, we can divide the field of statistics into two areas: descriptive and inductive statistics. Just as words can describe the characteristics of a face, descriptive statistics can be used to describe the characteristics of data.
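As a rough illustration of descriptive statistics, the following Python sketch summarizes a small data set with a handful of figures. The ages are made up for the example, and the helper name describe is an assumption for illustration, not something defined in this book.

```python
import statistics

# Hypothetical data: ages of ten people (made up for illustration).
ages = [23, 35, 41, 29, 52, 35, 47, 31, 38, 29]

def describe(values):
    """Return a few descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The five figures describe the ten ages without listing them one by one, which is the job descriptive statistics does for much larger data sets.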

Inductive statistics attempts to generalize from a sample to an entire population. Inductive statistics is used when you make a random selection of, for example, 100 Danes, and from that sample try to say something about the entire population of Danes.

Fig 1.1

In statistics, we need to have data to work with. Data can either come from a population (N) or a sample (n).

A population can be defined however you want it to be. Depending on the purpose of your analysis, a population could be anything from a country’s population to the number of tires you have in stock. The population must be seen as the total number of possible items—for example, all Danes or all tires in stock.

A sample is a number of items drawn from the given population. Gathering data for an entire population is both time- and resource-intensive, so you almost always use a random sample. The purpose of a sample is to create a mini-population which can then be used to describe trends or characteristics for the entire population. This is what characterizes inductive statistics.

Before an election in Denmark, the media periodically use opinion polls based on samples to estimate which party will get the most votes. In this context, the population (N) equals 3 million voting Danes. The sample (n) is composed of randomly selected individuals from the population and represents just a small part of the overall population.
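The polling idea can be sketched in Python. The population size of 3 million comes from the text; the 30% support for a party called "Party A", the sample size of 1,000, and the function name run_poll are all assumptions chosen for illustration.

```python
import random

def run_poll(population_size, supporters, sample_size, seed=None):
    """Draw a simple random sample of voters and return the share backing Party A.

    Voters are numbered 0 .. population_size - 1, and (by assumption) the
    first `supporters` of them back the hypothetical Party A.
    """
    rng = random.Random(seed)
    # Sample without replacement: each voter can be picked at most once.
    sampled_voters = rng.sample(range(population_size), sample_size)
    backing = sum(1 for voter in sampled_voters if voter < supporters)
    return backing / sample_size

N = 3_000_000         # population size from the text
supporters = 900_000  # hypothetical: 30% of the population backs Party A
n = 1_000             # hypothetical sample size

population_share = supporters / N
sample_share = run_poll(N, supporters, n, seed=42)

print(f"Share in population (N = {N}): {population_share:.3f}")
print(f"Share in sample     (n = {n}): {sample_share:.3f}")
```

The sample share will usually land close to, but not exactly on, the population share, and a different seed gives a slightly different estimate. That gap between sample and population is what inductive statistics has to account for.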

Terms like population and sample are the building blocks of statistics, which is why we have taken a moment to define them. Terms such as mean and standard deviation are used for both populations and samples, but are indicated by different symbols.

Calculations made from population data are called population parameters and mainly use Greek letters. Calculations made from a sample are called point estimates and use letters from our own alphabet.
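To make the distinction concrete, here is a minimal Python sketch that computes the mean and standard deviation twice: once from a whole population and once from a sample drawn from it. The tire data are randomly generated for illustration, and the variable names mu, sigma, x_bar, and s follow the usual textbook convention, which is assumed here rather than taken from this chapter's table.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: tread depth (mm) of every tire in stock.
population = [round(rng.uniform(2.0, 8.0), 1) for _ in range(500)]

# A random sample of 30 tires drawn from that population.
sample = rng.sample(population, 30)

# Population parameters (Greek letters): computed from all N items.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population formula, divides by N

# Point estimates (Latin letters): computed from the n sampled items.
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)           # sample formula, divides by n - 1

print(f"Population: N = {len(population)}, mean = {mu:.2f}, std dev = {sigma:.2f}")
print(f"Sample:     n = {len(sample)}, mean = {x_bar:.2f}, std dev = {s:.2f}")
```

Python's statistics module keeps the two cases apart in the same way: pstdev is meant for a full population, and stdev for a sample.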

The most common terms are shown in the table below. Their importance is discussed in chapter 2. If you are slightly confused by the new terms, just remember that the essence of statistics is describing an entire population based on a sample.

Table 1.2
Formulas for these calculations can be found in the appendix to chapter 2.