Honza Tyl · 1 min read · Archive 2018

A New Challenge!

Recently, an interesting competition took place on Kaggle (https://www.kaggle.com/c/jigsaw-toxic-comment-classificatio…) to create a detector capable of recognising insults, toxic and obscene remarks, and so forth – the Toxic Comment Classification Challenge.

I found out about it late, but I still managed to write a deep neural network based on LSTM + FastText (the algorithm's performance would have earned a gold medal in the Kaggle rankings). A colleague from Alpha Industries translated the training dataset into Czech (70 megabytes of text!) and deployed it on an Amazon server, and you can now try it out here: www.detector.alphai.cz.
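The post doesn't include the model itself, but an LSTM classifier over FastText-style word vectors could be sketched roughly as follows. This is a minimal PyTorch sketch under stated assumptions, not the author's actual network: the layer sizes, the max-pooling step, and the randomly initialised embedding (which would be loaded from FastText vectors in practice) are all illustrative choices; the six sigmoid outputs follow the challenge's standard label set (toxic, severe toxic, obscene, threat, insult, identity hate).

```python
import torch
import torch.nn as nn

class ToxicityClassifier(nn.Module):
    """Bidirectional LSTM over word embeddings with one sigmoid
    output per toxicity label (multi-label classification)."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=64, num_labels=6):
        super().__init__()
        # In practice this matrix would be initialised from pre-trained
        # FastText vectors; here it is random, purely for illustration.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)      # (batch, seq, embed_dim)
        outputs, _ = self.lstm(embedded)          # (batch, seq, 2*hidden_dim)
        pooled = outputs.max(dim=1).values        # max-pool over the sequence
        return torch.sigmoid(self.head(pooled))   # per-label probabilities

model = ToxicityClassifier(vocab_size=50_000)
batch = torch.randint(0, 50_000, (2, 20))  # two comments, 20 tokens each
probs = model(batch)                       # shape (2, 6), values in [0, 1]
```

A network like this would be trained with binary cross-entropy, one loss term per label, since a single comment can be both obscene and an insult at the same time.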

The algorithm is not perfect; however, it works reasonably well in both Czech and English.

Here's a task for you – can you find a sentence, or even a longer text, that the algorithm evaluates as non-vulgar (non-toxic), yet is actually offensive?

Original source: wordpress
