Introduction · DSTK Data Science Toolkit
DSTK (Data Science Toolkit) 3 is a set of data and text mining software that follows the CRISP-DM model. DSTK offers data understanding through statistical and text analysis, data preparation through normalization and text processing, and modeling and evaluation for machine learning and statistical learning algorithms.
DSTK 3 will offer features such as Deep Neural Networks (Deep Learning), Text Link Analysis with visualizations, and KMeans Clustering. Some of these features may have been present in older versions, but the algorithms are being rewritten to reduce the use of external libraries such as Weka and so reduce file size, which means we need more time to develop them. DSTK Engine is still in beta, so there may be some bugs and inaccuracies.
DSTK 3 consists of DSTK Engine, DSTK ScriptWriter, DSTK Studio and DSTK Text Explorer. DSTK Engine is like a simplified R, focused on data mining. DSTK ScriptWriter offers a GUI for writing scripts for DSTK Engine. DSTK Studio offers an SPSS Statistics-like GUI for data mining, and DSTK Text Explorer offers a GUI for text mining.