Eva Popílková · 2 min read · Archive 2019

The Third Year of the Data Science Olympics!

So, what do you think of the latest Olympics? In case anyone missed it, at the end of May, the third year of the Data Science Olympics took place! It is the largest machine learning competition in Europe, with over 1,000 scientists participating simultaneously in Paris and Berlin. All the "athletes" receive a task at the same moment and have two hours to come up with the most accurate predictive model. So, it's something like Kaggle, but it lasts only two hours instead of three months.

I don’t know about you, but I enjoy learning from the best, so I devoured the winner’s write-up (the full article is linked under Sources). In short, my favourite algorithm, LightGBM, won, fed with transformed categorical variables (label encoding + value counts + target encoding). The key was realising that the competition’s loss function itself needed to be optimised, because errors in the third category were penalised much more heavily than errors in the first. Genius!
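To make the three encodings and the idea of a class-weighted loss concrete, here is a minimal pure-Python sketch. It is not the champion's code: the column, the targets, the smoothing parameter `prior_weight` and the class weights are all made up for illustration, and the exact competition metric is assumed to be a log loss weighted by the true class.

```python
import math
from collections import Counter, defaultdict

def label_encode(values):
    """Map each distinct category to an integer id (sorted for determinism)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]

def value_count_encode(values):
    """Replace each category with the number of times it occurs in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]

def target_encode(values, targets, prior_weight=1.0):
    """Replace each category with a smoothed mean of the target for that category."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), Counter()
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [
        (sums[v] + prior_weight * global_mean) / (counts[v] + prior_weight)
        for v in values
    ]

def weighted_log_loss(y_true, probs, class_weights):
    """Multi-class log loss in which each row is weighted by its true class."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, probs):
        w = class_weights[y]
        total += -w * math.log(max(p[y], 1e-15))  # clip to avoid log(0)
        weight_sum += w
    return total / weight_sum

# Hypothetical toy column and binary target
colors = ["red", "blue", "red", "green", "red", "blue"]
target = [1, 0, 1, 0, 0, 1]
print(label_encode(colors))
print(value_count_encode(colors))
print(target_encode(colors, target))

# Hypothetical weights: mistakes on class 2 cost ten times more than on class 0
weights = {0: 1.0, 1: 1.0, 2: 10.0}
y = [0, 2]
loss_bad_on_first = weighted_log_loss(y, [[0.1, 0.45, 0.45], [0.05, 0.05, 0.9]], weights)
loss_bad_on_third = weighted_log_loss(y, [[0.9, 0.05, 0.05], [0.45, 0.45, 0.1]], weights)
print(loss_bad_on_first, loss_bad_on_third)
```

The same confident mistake costs far more when it lands on the heavily weighted class, which is why a model tuned for plain accuracy can rank poorly under such a metric. In practice, target encoding is usually computed out-of-fold so that a row's own target does not leak into its feature.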

Interestingly, in that short time the champion, Romain Ayres, also tried other algorithms that I would have expected to perform well.
Random forest was too weak compared with LightGBM. Neural networks were too slow for him: he would not have had time to tune the architecture, and he had no GPUs at his disposal (which is where neural networks might have shone). Surprisingly, his ensemble of several LightGBM models (trained with different random seeds) did not work.
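For readers unfamiliar with seed ensembling, the usual idea is to train the same model several times with different random seeds and blend the predictions. The post does not say how the champion combined his models, so the simple averaging below is an assumption, and the per-seed probabilities are hard-coded, hypothetical stand-ins for real model output.

```python
def average_probabilities(per_model_probs):
    """Average class probabilities row by row across several models."""
    n_models = len(per_model_probs)
    n_rows = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])
    return [
        [
            sum(model[r][c] for model in per_model_probs) / n_models
            for c in range(n_classes)
        ]
        for r in range(n_rows)
    ]

# Hypothetical predictions (2 rows, 3 classes) from three models
# that differ only in their random seed
seed_0 = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
seed_1 = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
seed_2 = [[0.8, 0.1, 0.1], [0.3, 0.1, 0.6]]

blended = average_probabilities([seed_0, seed_1, seed_2])
print(blended)
```

Averaging mainly reduces variance between runs, so when the individual models already agree closely, as they tend to with a stable gradient-boosting setup, the blend may gain little.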

When I tried the champion’s code, I found that it relies on scikit-learn’s LabelEncoder() behaving as it did in an older version; in the latest version, LabelEncoder() does not handle missing values. Clearly, the master was using outdated library versions. Romain also wore headphones and listened to music throughout the competition, so he did not notice that, halfway through, the organisers added more data that could have improved the results.
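One common workaround, sketched here under assumptions rather than taken from the champion's code, is to replace missing values with an explicit sentinel category before encoding. The sentinel name `__missing__` and the helper functions are hypothetical, and a small pure-Python encoder stands in for LabelEncoder() so the example runs without any dependencies.

```python
import math

MISSING = "__missing__"  # hypothetical sentinel label, chosen for illustration

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def fill_missing(values, sentinel=MISSING):
    """Replace missing entries with a sentinel so they form their own category."""
    return [sentinel if is_missing(v) else v for v in values]

def label_encode(values):
    """Stand-in for a label encoder: map sorted distinct values to integer ids."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]

cities = ["Paris", None, "Berlin", float("nan"), "Paris"]
cleaned = fill_missing(cities)
codes = label_encode(cleaned)
print(cleaned)
print(codes)
```

With scikit-learn, the same fill step would be applied to the column before passing it to LabelEncoder(), so the encoder only ever sees strings.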

Sources:
Olympics: https://www.datascience-olympics.com/
Four Tricks: https://medium.com/…/four-machine-learning-tricks-you-shoul…
How I Won the Olympics: https://medium.com/…/how-i-won-the-data-science-olympics-20…
Winner: https://medium.com/@romain.ayres

Original source: wordpress