Red Tide Level Prediction Based on RF Feature Selection and XGBoost Model

    • Abstract: Red tides arise from the combined action of many natural factors, including physical, chemical, and biological ones. To address the difficulty of selecting influencing factors and the insufficient accuracy of red tide prediction, this study proposes an extreme gradient boosting (XGBoost) red tide level prediction model based on random forest (RF) feature selection. Taking the Sansha Bay red tide monitoring area as the study area, we used data on red tide events that occurred in the bay from 2005 to 2019 as the model input. First, we combined the RF feature importance with Pearson correlation analysis to obtain the final feature ranking. Second, we determined the optimal number of features from the AUC values of the RF algorithm for each feature and, together with the feature importance, selected the optimal feature set required by the XGBoost model. Finally, we trained the XGBoost classification model on this optimal feature set. Experimental results show that the method achieves higher classification accuracy than other classification methods and offers a new approach to red tide level prediction in Sansha Bay.
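
The three steps in the abstract (feature ranking, choice of feature count by AUC, XGBoost training) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic and binary rather than the Sansha Bay records and their red tide levels, the rule that drops a feature when its absolute Pearson correlation with a higher-ranked feature exceeds 0.9 is a hypothetical way of combining the two rankings, "AUC for each feature" is read as cross-validated AUC on the top-k features, and all hyperparameters are placeholders. If the xgboost package is not installed, scikit-learn's GradientBoostingClassifier is used as a stand-in.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

try:
    from xgboost import XGBClassifier  # the model named in the abstract

    def make_booster():
        return XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
except ImportError:
    # Stand-in only: a different gradient boosting implementation.
    from sklearn.ensemble import GradientBoostingClassifier

    def make_booster():
        return GradientBoostingClassifier(
            n_estimators=100, max_depth=3, learning_rate=0.1, random_state=0
        )

# Synthetic placeholder data (assumption): 10 candidate factors, 2 classes.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: rank features by random forest importance, highest first.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
order = np.argsort(rf.feature_importances_)[::-1]

# Hypothetical combination rule: skip a feature that is strongly correlated
# (|Pearson r| > 0.9) with a feature already kept higher in the ranking.
corr = np.abs(np.corrcoef(X_train, rowvar=False))
ranked = []
for j in order:
    if all(corr[j, k] <= 0.9 for k in ranked):
        ranked.append(int(j))

# Step 2: pick the feature count k whose top-k subset gives the best
# cross-validated RF AUC (assumed reading of the abstract).
auc_by_k = {}
for k in range(1, len(ranked) + 1):
    scores = cross_val_score(
        RandomForestClassifier(n_estimators=50, random_state=0),
        X_train[:, ranked[:k]],
        y_train,
        cv=3,
        scoring="roc_auc",
    )
    auc_by_k[k] = float(scores.mean())
best_k = max(auc_by_k, key=auc_by_k.get)
best_features = ranked[:best_k]

# Step 3: train the boosted classifier on the selected features only.
model = make_booster().fit(X_train[:, best_features], y_train)
accuracy = float((model.predict(X_test[:, best_features]) == y_test).mean())

print("selected feature indices:", best_features)
print("held-out accuracy:", round(accuracy, 3))
```

For a multi-level target such as red tide grades, the scoring string would need a multiclass variant such as `"roc_auc_ovr"`, and the labels passed to XGBoost would need to be encoded as consecutive integers starting at 0.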

       
