Language Detection identifies the dominant language of a text. Detection is based on the franc library, which uses N-grams, so the longer the text, the more reliable the detection will be.
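To give an intuition for why N-gram detection improves with text length, here is a minimal sketch of the general technique in Python. It is not franc's implementation or API: the `trigrams` and `detect` functions, the `SAMPLES` sentences, and the overlap score are all hypothetical and chosen only for illustration. Real detectors rely on profiles built from large corpora and cover many more languages.

```python
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams in lower-cased, space-padded text."""
    cleaned = " " + " ".join(text.lower().split()) + " "
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


# Toy reference samples (assumption: stand-ins for real per-language profiles).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then it runs to the house of the farmer",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y luego corre a la casa del granjero",
}
PROFILES = {code: trigrams(sample) for code, sample in SAMPLES.items()}


def overlap(grams: Counter, profile: Counter) -> int:
    """Number of trigram occurrences shared by the text and a language profile."""
    return sum(min(count, profile[gram]) for gram, count in grams.items())


def detect(text: str) -> str:
    """Return the language code whose profile overlaps most with the text."""
    grams = trigrams(text)
    return max(PROFILES, key=lambda code: overlap(grams, PROFILES[code]))


print(detect("the dog runs over the house"))
print(detect("el perro corre a la casa"))
```

A very short input produces only a handful of trigrams, so the overlap scores of different languages end up close together and the result is less trustworthy. Longer inputs contribute more trigrams and separate the scores more clearly.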
These are the settings for this recipe:
The output dataset will have two new columns: "language_code", with the ISO 639-1 language code (e.g. "en"), and "language_name", with the name of the language (e.g. "English"). The following example uses the dataset used in this tutorial. The recipe is configured to obtain the language code.