Look at the table below and then look at the graph. I don’t need to tell you which mode delivers the information more efficiently.
The main purpose of a data display is to organise and present data so that your point comes across clearly, effectively, and correctly. Graphs often give you a better feel for a variable and its distribution than the raw data or a frequency table does.
Scottish engineer and political economist William Playfair invented four types of diagrams: line graph, bar chart, pie chart, and circle graph. He is considered the founder of graphical methods of statistics.
In this section, we’ll learn about charts and graphs. We will study graphs for categorical variables, time charts and then graphs for quantitative variables.
Graphs for categorical variables
The two primary graphical displays for summarising a categorical variable are the pie chart and the bar graph.
A pie chart is a circle with one slice for each category, so it is used for categorical data. The slices should add up to 100%, or close to it once rounding is taken into account. For example, the above image shows each party's share of the total votes counted up to 18 May 2009. We can easily see that other state parties got 36.42% of the votes and INC got 28.55%, followed by BJP with 18.80%, and so on.
If we add up the percentages, they come to 100% or close to it. A slice labelled 'other' signals a lack of detail in the information gathered. It is also good practice to ask for the size of the data set, because a pie chart shows only the percentage in each group, not the number in each group.
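As a minimal sketch of the point about percentages and counts, the snippet below turns raw counts into pie-chart shares. The vote counts and the helper name `pie_shares` are hypothetical and invented purely for illustration; they are not the figures from the chart above.

```python
def pie_shares(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

# Hypothetical vote counts, invented purely for illustration.
votes = {"Party A": 450, "Party B": 300, "Other": 250}
shares = pie_shares(votes)
total_votes = sum(votes.values())

for category, pct in shares.items():
    # Report the count next to the percentage so the data size is visible.
    print(f"{category}: {pct:.1f}% ({votes[category]} of {total_votes})")

# The slices of a pie chart should add up to 100% (allowing for rounding).
print(f"Sum of shares: {sum(shares.values()):.1f}%")
```

Printing the count alongside each percentage answers the "how big is the data?" question that a pie chart on its own leaves open.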
The Bar Graph
A bar graph is also used to summarise categorical data. It displays a vertical bar for each category, breaking the data down by group and showing either how many individuals lie in each group or what percentage lies in each group. A bar graph with categories ordered by their frequency is called a Pareto chart, named after the Italian economist Vilfredo Pareto (1848–1923), who advocated its use.
For example, the above image shows the number of bikes sold in each month. 1000 bikes were sold in May, followed by 900 bikes in each of June and October. Instead of counts, we could also show the percentage of bikes sold in each month.
When evaluating a bar graph, check the units on the Y-axis and make sure the scale is evenly spaced. If the graph shows percentages instead of counts, it is also wise to ask for the total number of observations behind it.
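The Pareto ordering described above can be sketched in a few lines. The May, June and October values are the ones quoted in the text; the April value and the helper name `pareto_order` are assumptions added only to make the example complete.

```python
def pareto_order(counts):
    """Return (category, count) pairs sorted from most to least frequent."""
    # sorted() is stable, so categories with equal counts keep their input order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# May, June and October use the figures quoted in the text;
# the April value is hypothetical.
bikes_sold = {"April": 700, "May": 1000, "June": 900, "October": 900}
ordered = pareto_order(bikes_sold)
total = sum(bikes_sold.values())

for month, count in ordered:
    bar = "#" * (count // 100)      # one '#' per 100 bikes: an evenly spaced scale
    share = 100 * count / total     # percentage of all bikes in the table
    print(f"{month:<8} {bar:<10} {count:>5} ({share:.1f}% of {total})")
```

Each text bar uses the same scale, and each percentage is printed together with the total it was computed from, which are the two checks suggested for reading a bar graph.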
Time Charts
Look at the above time chart. It shows the revenue of the company over a period of 5 years. The amount at each point in time is shown as a dot, and the dots are connected by a line. In a time chart, the X-axis shows time (hours, days, months, years, etc.) and the Y-axis shows the quantity being measured over that period.
Sometimes time charts can be misleading. For example, if we count the number of crimes committed in a city each year, the count may appear to be increasing. But if we look instead at the crime rate, which adjusts for the growing population, we may find that it is actually decreasing. So it is important to understand which statistics are being presented and to examine them for fairness and appropriateness.
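A small worked example shows how a count and a rate can move in opposite directions. All numbers below are hypothetical, and the helper `rate_per_100k` is an illustrative name, not a standard function.

```python
def rate_per_100k(crimes, population):
    """Crimes per 100,000 residents."""
    return crimes * 100_000 / population

# Hypothetical figures for one city in two different years.
records = [
    {"year": "Year 1", "crimes": 5000, "population": 1_000_000},
    {"year": "Year 2", "crimes": 5500, "population": 1_250_000},
]

rates = [rate_per_100k(r["crimes"], r["population"]) for r in records]

for record, rate in zip(records, rates):
    print(f'{record["year"]}: {record["crimes"]} crimes, '
          f'{rate:.0f} per 100,000 residents')
```

In this made-up case the raw count rises from 5000 to 5500, while the rate falls from 500 to 440 per 100,000 residents, because the population grew faster than the number of crimes.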
Graphs For Quantitative Variables
In this part we will see how to summarise quantitative variables graphically and visualise their distribution. We will go through the histogram and the box plot.
A histogram is a versatile way to graph the data and picture its distribution. It uses bars to show the frequencies or the relative frequencies of the possible outcomes for a quantitative variable. It is essentially a bar graph for numerical data.
Consider the above histogram, which shows the distribution of the weights of students. To be sure each number falls into exactly one group, the bars on a histogram touch each other but don't overlap. On the X-axis, each bar is marked by the values at which its interval begins and ends. The height of the bar represents either the frequency or the relative frequency of that group.
In the above histogram, the most frequently occurring outcome lies between 120 and 130 pounds. Selecting the intervals is the crucial part of building a histogram. If you use too few intervals, the graph will be too crude and may consist mainly of a few tall bars. If you use too many intervals, the graph becomes irregular, with many very short bars. In either case, you can lose information about the shape of the distribution.
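To see the effect of interval choice, here is a minimal sketch that counts values per interval for three different interval widths. The weights are hypothetical (they are not the data behind the histogram above), and `bin_counts` is an illustrative helper name.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each interval [lower, lower + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)   # each value lands in exactly one bin
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical student weights in pounds.
weights = [102, 108, 115, 118, 121, 123, 124, 126, 127, 129,
           131, 134, 138, 142, 147]

too_few = bin_counts(weights, start=100, width=25, n_bins=2)
reasonable = bin_counts(weights, start=100, width=10, n_bins=5)
too_many = bin_counts(weights, start=100, width=2, n_bins=25)

print("2 intervals: ", too_few)
print("5 intervals: ", reasonable)
print("25 intervals:", too_many)
```

With two intervals the peak is hidden inside two tall bars, with five the 120 to 130 interval clearly stands out, and with twenty-five no bar is taller than 2, so the shape is hard to read.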
A box plot is a one-dimensional graph of numerical data based on the five-number summary, which consists of the minimum value, the 25th percentile (known as Q1), the median, the 75th percentile (Q3), and the maximum value. In essence, these five descriptive statistics divide the data set into four parts, each containing about 25% of the observations.
The box spans from Q1 to Q3, and a line inside the box marks the median. The lines extending from each side of the box are called whiskers.
Box plots are useful for identifying potential outliers.
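The five-number summary and a simple outlier check can be sketched as follows. Two assumptions go beyond the text: quartiles are computed with Python's `statistics.quantiles` using the "inclusive" method (other conventions give slightly different quartiles), and potential outliers are flagged with the common 1.5 × IQR rule of thumb. The data values are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # Assumption: "inclusive" quartiles; other conventions differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def potential_outliers(data):
    """Flag values more than 1.5 * IQR beyond the quartiles (a rule of thumb)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1                          # interquartile range: width of the box
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical data with one unusually large value.
values = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(values)
outliers = potential_outliers(values)

print("Five-number summary:", summary)
print("Potential outliers: ", outliers)
```

Here Q1 is 4 and Q3 is 8, so the IQR is 4 and the cut-offs are -2 and 14; only the value 30 falls outside them and is flagged as a potential outlier.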
In this article we looked at the main types of graphs and charts used to visualise data. Graphs are time-saving tools when we are dealing with large amounts of data. When we want to convey as much information as possible in a short time, we present the data in graphical form. Each type of graph or chart is chosen according to the need.