This is my first blog post. On this blog, I would like to share my understanding of some data analytics methods, along with some analyses of publicly available data.

Along the way, I would also like to share my experience of taking analytics-related courses on coursera.org and edx.org.

Hope you enjoy it!

Today I will start by sharing my understanding of a concept called bootstrapping.

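Bootstrapping is used to estimate characteristics of a population (for example its mean), and how uncertain those estimates are, when only a sample of the population is available. It works by repeatedly resampling, with replacement, from the sample itself.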

Example: Suppose we have a sample of 1000 numbers drawn from a much larger population of numbers. We don't know the characteristics of the population. However, using only these 1000 numbers, we try to estimate them.

I will use the R language to illustrate the concept.

First, I generate 1000 random numbers from a uniform distribution (min = 0, max = 100) using the code:

x<-runif(1000,0,100)

The histogram of the values can be plotted using:

hist(x)

The mean of the 1000 random numbers:

> mean(x)
[1] 48.36093

As the histogram shows, the numbers are spread roughly uniformly between 0 and 100, and their mean is close to 50, the true mean of this distribution.

Next, we initialize a matrix in which to store the samples. We call it sams. It has 20 rows and 50 columns, so that each column can hold one sample of 20 numbers.

sams<-matrix(0,nrow=20,ncol=50)

We then sample 20 numbers, with replacement, from the 1000 available numbers and repeat the process 50 times using the code below.

for(i in 1:50){sams[,i]<-sample(x,size = 20,replace=TRUE)}
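The same resampling step can also be written without an explicit loop. The sketch below is only one possible way of doing it, using base R's replicate(); the seed value is my own assumption, added so that the run can be repeated, and is not part of the original analysis.

```r
# Assumption: a fixed seed, added here only to make the run repeatable.
set.seed(123)

# The original sample of 1000 uniform random numbers.
x <- runif(1000, 0, 100)

# Draw 20 values with replacement, 50 times over.
# replicate() stores each draw as one column of the result.
sams <- replicate(50, sample(x, size = 20, replace = TRUE))

dim(sams)  # 20 rows (values per sample), 50 columns (samples)
```

Because every column is one sample, colMeans(sams) then gives one mean per sample.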

The histogram of the means of the 50 samples is shown below. As per the central limit theorem, the distribution of these means approaches a normal distribution.
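A histogram of this kind could be drawn roughly as follows. This is a minimal sketch, assuming each column of the matrix holds one sample of 20 values; the seed, the name sample_means, and the plot labels are my own choices for illustration.

```r
# Assumption: a fixed seed, added only so the plot can be reproduced.
set.seed(123)

x <- runif(1000, 0, 100)

# 50 samples of 20 values each, one sample per column.
sams <- replicate(50, sample(x, size = 20, replace = TRUE))

# One mean per sample (per column).
sample_means <- colMeans(sams)

# Histogram of the 50 sample means.
hist(sample_means,
     main = "Means of 50 samples of size 20",
     xlab = "Sample mean")

# Dashed vertical line at the mean of the original 1000 numbers.
abline(v = mean(x), lty = 2)
```

With only 50 means the histogram will look somewhat ragged; increasing the number of samples should make the bell shape easier to see.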



The mean of each column (each sample) is shown below:

> colMeans(sams)

 [1] 50.47116 45.97948 40.63419 44.79656 50.15254 39.42238 55.78218 44.02374 41.79187
[10] 55.73118 52.31857 55.29556 46.47345 46.73839 45.71264 46.18665 41.53983 45.33508
[19] 57.83905 55.19864 53.28645 54.10386 47.95268 47.29992 42.41796 40.76468 50.70600
[28] 50.63717 37.12649 49.82388 50.32402 55.91497 51.97249 43.38939 52.34219 42.28717
[37] 47.38942 35.62460 45.83817 46.96191 50.60833 52.40720 47.89722 49.47754 55.74592
[46] 56.92109 59.75697 48.75666 45.07243 49.81073
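One way to read these 50 means is to compare their centre and spread with what we would expect. The sketch below assumes the same setup as above (samples of 20 values, one per column); the seed and the name sample_means are my own additions. The comparison value sd(x) / sqrt(20) is the usual approximation for the standard deviation of a mean of 20 independent draws.

```r
# Assumption: a fixed seed, added only to make the numbers repeatable.
set.seed(123)

x <- runif(1000, 0, 100)
sams <- replicate(50, sample(x, size = 20, replace = TRUE))
sample_means <- colMeans(sams)

# Centre: the average of the sample means should sit close to mean(x).
mean(sample_means)
mean(x)

# Spread: the standard deviation of the sample means should be roughly
# the standard deviation of x divided by the square root of the sample size.
sd(sample_means)
sd(x) / sqrt(20)
```

The two spread values will not match exactly, since only 50 samples are used, but they should be of similar size.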

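This illustrates the concept of bootstrapping: by resampling from the one sample we have, we get a picture of how much an estimate such as the mean varies from sample to sample.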
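For comparison, the more common form of the bootstrap draws each resample as large as the original sample (1000 values here rather than 20) and uses many more resamples. The sketch below shows this variant under my own assumptions: the seed, the choice of 2000 resamples, and the names n_boot, boot_means, se_boot and ci_boot are all illustrative and not from the analysis above.

```r
# Assumption: a fixed seed, added only to make the run repeatable.
set.seed(42)

x <- runif(1000, 0, 100)

# Assumption: 2000 resamples, an arbitrary but commonly seen choice.
n_boot <- 2000

# For each resample: draw length(x) values with replacement, keep the mean.
boot_means <- replicate(n_boot,
                        mean(sample(x, size = length(x), replace = TRUE)))

# Bootstrap estimate of the standard error of the mean.
se_boot <- sd(boot_means)

# Simple 95% percentile interval for the population mean.
ci_boot <- quantile(boot_means, probs = c(0.025, 0.975))

se_boot
ci_boot

# Textbook standard error of the mean, for comparison.
sd(x) / sqrt(length(x))
```

For the mean, the bootstrap standard error should land close to the textbook formula. The appeal of the bootstrap is that the same recipe can be reused for statistics such as the median, where no such simple formula is at hand.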


