To make a submission, you need to become a collaborator with the Data Challenge.
Store your predictions in a csv
file.
The following R
code illustrates how to do this:
require(DataChallenge.2017)
data(train)
# set the file name of your prediction
# please replace TEAM with your team name
# and SUBMISSION with your submission ID
outfile.base <- 'DataChallenge_TEAM_SUBMISSION_'
# extract the countries and years for which you need to make a prediction
# and make your prediction
sub <- subset(train, YEAR>=2015)
sub[, PREDICTION:= as.integer(runif(nrow(sub),0,1)>0.5)]
# submissions must have the following format: 5 columns with names
# ISO (country code as in sub above)
# YEAR (2015 and 2016 as in sub above)
# TEAM_ID (allocated to you)
# SUBMISSION_ID (unique to this submission)
# PREDICTION (real numbers are allowed).
# You need to make a prediction for each country and each year in `subset(train, YEAR>=2015)`.
sub <- unique(subset(sub, select=c(ISO,YEAR,PREDICTION)))
sub[, TEAM_ID:='olli0601']
sub[, SUBMISSION_ID:='random']
# check your submission, this must evaluate to TRUE.
dc.check.submission(sub)
# write submission to file
write.csv(sub, row.names=FALSE, file=paste0(outfile.base,'predictions.csv'))
Next, you need to upload your submission. This is easy once you accepted the github invitation to collaborate on the Data Challenge.
** First rule for today: submissions that do not pass dc.check.submission
will be ignored. **
** Second rule for today: your team can make at most 30 submissions. **
We will calculate the mean squared error between all of your submissions and the true number of epidemic outbreaks in 2015 and 2016 across all countries.
Submissions that do not pass the dc.check.submission
will be ignored. If more than 30 entries are submitted by your team, only the first 30 will be considered.
Beating our baseline prediction is not difficult, do not worry. We hope you enjoy working with real & messy data today, which is a bit different to what you typically see in your courses. =J