
Henrique Lampert

Soon-to-be computer engineering graduate. Likes technology and loves programming.




Data engineering on the Titanic disaster

Data Science Tutorials

In this article we are going to take part in a data science challenge hosted by Kaggle. The challenge consists of analyzing data about the Titanic's passengers and building predictions about their fate on the tragic night of the disaster.

One requirement is having the R environment installed on your machine (mirrors can be found here: http://cran.r-project.org/mirrors.html). I also suggest downloading RStudio, an IDE that greatly speeds things up and helps a lot, especially when inspecting variables and plotting graphics. Alternatively, you can work with only the basic R environment.

There is another prerequisite we must meet before we begin analyzing the data: the data itself. Each Kaggle challenge has its own page; ours is www.kaggle.com/c/titanic-gettingStarted. Click on 'Data' and download the files 'train.csv' and 'test.csv'. You'll probably be asked to log in before the download starts.
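The tutorial itself works in R; purely as a language-neutral illustration, here is a minimal Python sketch (standard library only) of loading the two downloaded files. The function name `load_rows` is hypothetical, and the sketch assumes the files are plain comma-separated text with a header row.

```python
import csv
from typing import Dict, Iterable, List, Union


def load_rows(source: Union[str, Iterable[str]]) -> List[Dict[str, str]]:
    """Read a CSV with a header row into a list of dicts keyed by column name.

    `source` is either a file path (e.g. 'train.csv') or any iterable of
    text lines, such as an already-open file.
    """
    if isinstance(source, str):
        # The csv module documentation recommends newline="" when opening files.
        with open(source, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    return list(csv.DictReader(source))
```

For example, `train = load_rows('train.csv')` followed by `'Survived' in train[0]` is a quick way to confirm that the label column is present in the training file; every value comes back as a string, so numeric columns still need converting.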

The file 'train.csv' will be used as the training set. Random trees are supervised learning algorithms, i.e. they need to be fed the expected output along with the input data, so we'll use the 'Survived' column of that file as the output when training our model. Once our prediction model is built, 'test.csv' will be used to test it; that file has no 'Survived' column, because Kaggle keeps those outcomes and scores our submitted predictions against them.
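To make "fed the expected output along with the input data" concrete, here is a minimal Python sketch of the supervised setup: the 'Survived' column is split off as the label, and a model is fitted from input/label pairs. It deliberately swaps in the simplest tree-based learner, a depth-one decision tree (a decision stump) on a single categorical column, rather than a random-tree ensemble. The helper names are hypothetical, and the example assumes rows are dicts of strings, as produced by `csv.DictReader`, and that the column used (e.g. 'Sex') exists in the data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_xy(rows: List[Row], label: str = "Survived") -> Tuple[List[Row], List[int]]:
    """Separate the input columns from the expected output column."""
    features = [{k: v for k, v in r.items() if k != label} for r in rows]
    targets = [int(r[label]) for r in rows]
    return features, targets


def fit_stump(features: List[Row], targets: List[int], column: str) -> Dict[str, int]:
    """Train a decision stump: for each value of `column`, remember the
    majority outcome observed in the training data."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for x, y in zip(features, targets):
        counts[x[column]][y] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def predict(stump: Dict[str, int], features: List[Row], column: str, default: int = 0) -> List[int]:
    """Apply the stump; values never seen during training fall back to `default`."""
    return [stump.get(x[column], default) for x in features]
```

Training only ever touches `split_xy` output from the labeled file; `predict` needs nothing but input columns, which is why it can be applied to rows from a file that lacks 'Survived'. A random forest generalizes this idea by growing many deeper trees on random subsets of rows and columns and combining their votes.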

Now roll up your sleeves and let's get to work!
