
Simple Statistics

  • Writer: Arturo Arriaga
  • Dec 1, 2021
  • 1 min read

In this app I do a bit of statistical analysis on the Cars93 dataset, available from the MASS package in R.


A barplot summarizes the data in a factor column by showing the number of occurrences of each factor level.
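As a minimal sketch of what a barplot computes, the counting step can be written with Python's standard library. The values below are hypothetical stand-ins for a factor column, not rows from the dataset, and the text bars are only a rough substitute for the app's plot.

```python
from collections import Counter

# Hypothetical values standing in for a factor column such as vehicle type.
vehicle_type = ["Small", "Midsize", "Small", "Van", "Midsize", "Small"]

# Count how often each level occurs; these counts are the bar heights.
counts = Counter(vehicle_type)

# Draw one text bar per level, in alphabetical order.
for level, n in sorted(counts.items()):
    print(f"{level:<8} {'#' * n} ({n})")
```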


Summarizing a numeric column with a barplot is usually a mistake, because nearly every value is distinct and gets its own bar. A histogram is more appropriate because it groups the data into “bins.” Use the Histogram button to get an overview of each numeric column.
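To illustrate the binning idea, here is a small sketch that splits a numeric column into equal-width bins and counts the values in each. The function name `histogram_counts`, the choice of four bins, and the prices are all assumptions made for illustration; plotting software typically picks the bin edges for you.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, so there is no range to bin")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

# Hypothetical prices in thousands of dollars, not taken from the dataset.
prices = [8, 10, 11, 15, 16, 18, 19, 20, 26, 40]
counts = histogram_counts(prices, 4)
print(counts)
```

Every value lands in exactly one bin, so the counts always add up to the number of observations.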


Barplot

Histogram



Invented by the great statistician John Tukey (1915-2000), a boxplot displays numeric data visually. In the center, a bold line shows the median; the bottom and top of the box are the first and third quartiles. The “fences” extend from the box to the most extreme observed values that lie within 1.5 times the height of the box (the interquartile range) of its edges, so they always coincide with observed values. Any points beyond the fences are drawn individually as outliers.
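The numbers behind a boxplot can be sketched as follows. This is an illustration under stated assumptions rather than the app's exact computation: the helper `boxplot_stats` is hypothetical, the horsepower values are made up, and the quartiles come from Python's `statistics.quantiles` with the inclusive method, which can differ slightly from the hinges R uses.

```python
import statistics

def boxplot_stats(values, k=1.5):
    """Return the quartiles, fence positions, and outliers for a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    # Fences stop at the most extreme observations inside the limits.
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    outliers = sorted(v for v in values if v < lower_limit or v > upper_limit)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": min(inside),
        "upper_fence": max(inside),
        "outliers": outliers,
    }

# Hypothetical horsepower values, not taken from the dataset.
horsepower = [55, 90, 100, 110, 130, 140, 150, 170, 300]
stats = boxplot_stats(horsepower)
print(stats)
```

In this toy sample the box spans 100 to 150, so the limits sit at 25 and 225; the fences end at the observed values 55 and 170, and 300 is reported as an outlier.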


Boxplot



Use the Boxplot button to get an idea of how a numeric variable depends on a factor variable. Often the medians of the groups are clearly far apart, but in other cases the difference looks too small to be meaningful.
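The comparison a grouped boxplot invites can be sketched by computing the median of the numeric variable within each factor level. The column names and values here are hypothetical, and comparing medians by eye is not a formal significance test.

```python
import statistics
from collections import defaultdict

# Hypothetical paired observations: a factor level and a numeric value.
origin = ["USA", "non-USA", "USA", "non-USA", "USA", "non-USA"]
mpg = [20, 28, 22, 30, 24, 26]

# Collect the numeric values belonging to each factor level.
groups = defaultdict(list)
for level, value in zip(origin, mpg):
    groups[level].append(value)

# One median per level; each would be the bold line in that level's box.
medians = {level: statistics.median(vals) for level, vals in groups.items()}
print(medians)
```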


To compare two factor variables, use a contingency table. Choose two factor variables and click the Table button.
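A contingency table is a count of every combination of levels from the two factors. The sketch below builds one from hypothetical paired observations; the variable names and values are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical paired observations of two factor columns.
vehicle_type = ["Small", "Small", "Midsize", "Van", "Midsize", "Small"]
origin = ["USA", "non-USA", "USA", "USA", "non-USA", "non-USA"]

# Count each (row level, column level) pair; missing pairs count as zero.
cells = Counter(zip(vehicle_type, origin))
rows = sorted(set(vehicle_type))
cols = sorted(set(origin))

# Print the table with one row per vehicle type and one column per origin.
print(f"{'':<10}" + "".join(f"{c:>10}" for c in cols))
for r in rows:
    print(f"{r:<10}" + "".join(f"{cells[(r, c)]:>10}" for c in cols))
```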





To compare two numeric variables, use a scatter plot, plotting one variable horizontally and the other vertically. The Regression Line button adds to the scatter plot the straight line that best fits the relationship between the two variables.
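A best-fit line is commonly found by ordinary least squares, and the sketch below assumes that method; the source does not say which fitting procedure the app uses. The function `fit_line` is a hypothetical helper, and the toy points are chosen to lie exactly on a line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on price = 0.15 * horsepower.
horsepower = [60, 100, 140, 180, 220]
price = [9, 15, 21, 27, 33]
slope, intercept = fit_line(horsepower, price)
print(f"price is approximately {slope:.2f} * horsepower + {intercept:.2f}")
```

With real data the points scatter around the line, and the slope describes the average change in the vertical variable per unit change in the horizontal one.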


A scatter plot comparing the horsepower of a vehicle to its price.

A scatter plot with a regression line comparing the MPG Avg. of a vehicle to its Min.Price.



 
 
 
