In this article we are going to take part in a data science challenge hosted by Kaggle. The challenge consists of analyzing data about the Titanic's passengers and predicting their fate on the tragic night of the accident.
One requirement is to have the R environment installed on your machine. I suggest you download RStudio, which is an IDE that greatly speeds things up and helps a lot, especially when inspecting variables and graphics. Another choice is to download only the basic R environment (mirrors found here: http://cran.r-project.org/mirrors.html).
There is another prerequisite we must meet before we begin analyzing the data: the data itself. Each Kaggle challenge has its own page; ours is www.kaggle.com/c/titanic-gettingStarted. Click on 'Data' and download the files 'train.csv' and 'test.csv'. You'll probably be asked to log in before the download starts.
The file 'train.csv' will be used as the training set. Random trees are supervised learning algorithms, i.e. they need to be fed the expected output along with the input data, so we'll be using the column 'Survived' inside that file as the output when training our model. Once our prediction model is done, 'test.csv' will be used to test it.
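To make "expected output along with the input data" a bit more concrete, here is a minimal sketch in R that separates the 'Survived' column from the remaining columns. The helper name split_features_label and the three sample rows are made up for illustration (they are not real Kaggle rows); only the file name 'train.csv' and the column 'Survived' come from the challenge itself.

```r
# Split a training data frame into input features and the expected output.
# Assumption: the label column is called "Survived", as in train.csv.
# split_features_label is a hypothetical helper, not part of any package.
split_features_label <- function(df, label = "Survived") {
  stopifnot(label %in% names(df))
  list(
    features = df[, setdiff(names(df), label), drop = FALSE],
    label    = df[[label]]
  )
}

# Tiny made-up sample standing in for train.csv (illustrative values only).
sample_csv <- "PassengerId,Survived,Sex,Age
1,0,male,22
2,1,female,38
3,1,female,26"
train <- read.csv(text = sample_csv)

# With the real file in your working directory you would use instead:
# train <- read.csv("train.csv")

parts <- split_features_label(train)
str(parts$features)  # the inputs the model learns from
parts$label          # the expected output for each passenger
```

The model is then trained on the features and the label together, while 'test.csv' provides only the inputs.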
Now roll up your sleeves and let's get to work!