# [Statistics 101] (1) Overview

The social sciences and economics deal with very large numbers of people, so they track statistical trends rather than exact figures for each individual. Even modern physics relies on statistics in its analysis.

I will present the basic concepts and formulas in this “Statistics 101” series.

## Basic Terminology

Variables

• Entities that can take more than one value (as opposed to a constant)
• Independent Variables / Dependent Variables

[Note]
For example, suppose a new treatment has been developed for AIDS. When the new treatment is tested, each patient in a sample is assigned to one of two groups: the new method or the traditional method. In this case, the independent variable is the “method of treatment,” and the dependent variable is the recovery rate under each method.

In general, researchers look at how the dependent variables vary when they change the independent variables.
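As a rough sketch of the treatment example, the snippet below groups patients by the independent variable (treatment method) and computes the dependent variable (recovery rate) for each group. The records and the `recovery_rate` helper are invented purely for illustration and are not real trial data.

```python
# Hypothetical trial records: (treatment method, recovered?).
# All values are invented for illustration only.
records = [
    ("new", True), ("new", True), ("new", False), ("new", True),
    ("traditional", True), ("traditional", False),
    ("traditional", False), ("traditional", True),
]


def recovery_rate(records, method):
    """Fraction of patients in the given treatment group who recovered."""
    outcomes = [recovered for m, recovered in records if m == method]
    return sum(outcomes) / len(outcomes)


# Independent variable: the method. Dependent variable: the recovery rate.
rates = {method: recovery_rate(records, method) for method in ("new", "traditional")}
print(rates)  # {'new': 0.75, 'traditional': 0.5}
```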

Data

• Data is the raw material for analysis.
• Numerical Data: can be expressed as a number; also called quantitative data or measurement data
• Nominal Data: the number is only a label and carries no quantitative meaning, e.g., the number on an athlete’s jersey
• Ordinal Data: the order of the numbers is meaningful.
• Interval Data: equal differences between numbers represent equal differences in the quantity being measured.
• Ratio Data: has a meaningful zero point (usually a total absence of the underlying property), so the ratio between two numbers is meaningful.
• Categorical Data: qualitative data (such as opinions)

[Note] Examples of numerical data

The decibel (dB) is a measure of sound level. 20 dB is louder than 10 dB, so the decibel is ordinal data (not nominal data). But 20 dB is not twice as loud as 10 dB: 20 dB corresponds to ten times the sound intensity of 10 dB, and 30 dB to a hundred times. Equal steps in decibels therefore do not correspond to equal differences in intensity, so the decibel is not interval data.
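The factors of ten and a hundred follow from the logarithmic definition of the decibel, in which a difference of 10 dB corresponds to a tenfold change in sound intensity. A minimal sketch, where `intensity_ratio` is a hypothetical helper name:

```python
def intensity_ratio(db_a, db_b):
    """How many times more intense a sound at db_a is than one at db_b.

    Uses the standard relation: level difference in dB = 10 * log10(I_a / I_b).
    """
    return 10 ** ((db_a - db_b) / 10)


print(intensity_ratio(20, 10))  # 10.0
print(intensity_ratio(30, 10))  # 100.0
```

The step from 10 dB to 20 dB and the step from 20 dB to 30 dB are both 10 dB, yet the second step adds far more intensity than the first.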

Fahrenheit (°F) is a unit of temperature. The difference between 100°F and 110°F is the same as the difference between 120°F and 130°F. But 100°F is not twice as hot as 50°F, and 0°F does not mean a total absence of heat. Fahrenheit is therefore ordinal and interval data, but not ratio data.
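One way to check this is to convert to kelvin, a temperature scale whose zero is absolute zero. The sketch below uses the standard Fahrenheit-to-kelvin conversion; the helper name `fahrenheit_to_kelvin` is my own.

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit temperature to kelvin (zero = absolute zero)."""
    return (f - 32) * 5 / 9 + 273.15


# Ratio of the raw Fahrenheit numbers.
print(100 / 50)  # 2.0

# Ratio of the same two temperatures on a scale with a true zero.
kelvin_ratio = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)
print(round(kelvin_ratio, 3))  # 1.098

# Differences survive the conversion: both 10-degree steps are the same size.
step_low = fahrenheit_to_kelvin(110) - fahrenheit_to_kelvin(100)
step_high = fahrenheit_to_kelvin(130) - fahrenheit_to_kelvin(120)
print(round(step_low, 3), round(step_high, 3))  # 5.556 5.556
```

The ratio changes with the choice of zero point, while equal differences stay equal, which is the distinction between interval and ratio data.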

The meter (m) is a unit of length, and 0 m means a complete absence of length. Also, 10 m is twice as long as 5 m. The meter is therefore ordinal, interval, and ratio data.

Population: the entire group of individuals that the researcher wishes to study

Sample: the individuals selected from the population who are actually studied

Data Set: the collection of all data taken from the sample

Outliers: values in the data set that are unusually large or small compared with the rest
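The definition above does not say how far from the rest a value must be to count as atypical. One common convention, used here as an assumption rather than as the only possible rule, flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The data values are made up for illustration.

```python
from statistics import quantiles

# Hypothetical data set with one suspicious value.
data = [10, 12, 11, 13, 12, 11, 95]

# Quartiles of the data (cut points that split it into four parts).
q1, _median, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Assumed convention: anything beyond 1.5 * IQR from the quartiles is flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [95]
```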

Bias: systematic favoritism in the data collection or analysis process that produces misleading results. It can come from how the sample is selected or from how the data are collected.
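To illustrate bias from sample selection, here is a small sketch that compares a random sample with a sample drawn only from one end of the population. The population is a made-up list of numbers standing in for one measurement per individual, and the fixed seed is there only to make the run repeatable.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: one invented measurement per individual.
population = list(range(1, 1001))

# Random selection: every individual has the same chance of being chosen.
random_sample = random.sample(population, k=100)

# Biased selection: only the first 100 individuals are ever considered.
biased_sample = population[:100]

print(mean(population))     # 500.5
print(mean(random_sample))  # varies with the seed, typically close to 500
print(mean(biased_sample))  # 50.5
```

The biased sample is the same size as the random one, but its mean is far from the population mean because of how its members were chosen.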