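Lv Jujian, Zhao Huimin, Chen Rongjun, Li Jianhong. Unsupervised Active Learning Based on Adaptive Sparse Neighbors Reconstruction[J]. Computer Science, 2018, 45(6): 251-258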
Unsupervised Active Learning Based on Adaptive Sparse Neighbors Reconstruction
Received: 2017-01-11  Revised: 2017-03-18
DOI:10.11896/j.issn.1002-137X.2018.06.045
Keywords: Active learning, Sparse reconstruction, Optimal experimental design, Transductive experimental design, Local linear reconstruction
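Funding: This work was supported by the National Natural Science Foundation of China (61672008), the Key Project of the Natural Science Foundation of Guangdong Province (2016A030311013), the Major International Cooperation Project of Guangdong Provincial Universities (2015KGJHZ021), and the Natural Science Foundation of Guangdong Province (2016A030310335).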
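Author / Affiliation / E-mail
Lv Jujian (吕巨建) / Guangdong Polytechnic Normal University, Guangzhou 510665; Guangzhou Key Laboratory of Digital Content Processing and Security Technologies, Guangzhou 510665 / jujianlv@163.com
Zhao Huimin (赵慧民) / Guangdong Polytechnic Normal University, Guangzhou 510665; Guangzhou Key Laboratory of Digital Content Processing and Security Technologies, Guangzhou 510665
Chen Rongjun (陈荣军) / Guangdong Polytechnic Normal University, Guangzhou 510665
Li Jianhong (李键红) / Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510006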
Abstract:
      In many information processing tasks, large amounts of unlabeled data are easy to obtain, but labeling them is time-consuming and labor-intensive. As an important learning method in machine learning, active learning reduces the cost of manual labeling by selecting the most informative data points to label. However, most existing active learning algorithms are supervised methods built on a classifier, which makes them unsuitable for selecting samples when no label information is available. To address this problem, this paper draws on the idea of optimal experimental design and combines it with adaptive sparse neighbors reconstruction to propose an unsupervised active learning algorithm based on adaptive sparse neighbors reconstruction. The proposed algorithm adaptively selects the neighborhood scale according to the data distribution in each region of the dataset, searches for the neighbors and computes the reconstruction coefficients simultaneously, and can select the data points that best represent the distribution structure of the dataset without any label information. Experiments on synthetic and real-world datasets show that, under the same labeling cost, the proposed algorithm achieves high classification accuracy and robustness.
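The abstract does not state the objective function, so the following Python sketch only illustrates the general idea and is not the authors' implementation. It assumes that sparsity comes from an L1 penalty on the reconstruction coefficients (solved here with plain ISTA), so each point ends up with its own number of nonzero neighbors. For the selection step it assumes a greedy reconstruction-error criterion in the spirit of local linear reconstruction. The names sparse_weights and greedy_select and the parameters lam, mu, and n_iter are hypothetical.

```python
import numpy as np


def sparse_weights(X, lam=0.1, n_iter=200):
    """Reconstruct each row of X from the other rows under an L1 penalty.

    Assumption: sparsity is induced by L1 regularization, solved with ISTA.
    The nonzero entries of row i act as the neighbors of point i, so the
    neighborhood size is not fixed in advance.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.delete(np.arange(n), i)   # every point except i
        D = X[idx].T                       # (d, n-1): other points as columns
        step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz const.
        w = np.zeros(n - 1)
        for _ in range(n_iter):
            z = w - step * (D.T @ (D @ w - X[i]))          # gradient step
            w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrink
        W[i, idx] = w
    return W


def greedy_select(X, W, k, mu=0.1):
    """Greedily pick k points that best reconstruct the whole dataset.

    Assumed criterion: with a 0/1 selection vector s, reconstruct all points
    as Q = (diag(s) + mu * M)^(-1) diag(s) X, where M = (I - W)^T (I - W),
    and add the candidate that minimizes ||X - Q||_F^2.
    """
    n = X.shape[0]
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    selected = []
    for _ in range(k):
        best, best_err = None, np.inf
        for j in range(n):
            if j in selected:
                continue
            s = np.zeros(n)
            s[selected + [j]] = 1.0
            A = np.diag(s) + mu * M + 1e-8 * I   # small ridge for stability
            Q = np.linalg.solve(A, s[:, None] * X)
            err = np.sum((X - Q) ** 2)
            if err < best_err:
                best, best_err = j, err
        selected.append(best)
    return selected


# Toy usage on random 2-D data (no labels are used anywhere).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))
W_demo = sparse_weights(X_demo, lam=0.05, n_iter=100)
print("nonzero neighbors per point:", (np.abs(W_demo) > 1e-8).sum(axis=1))
print("selected indices:", greedy_select(X_demo, W_demo, k=3))
```

The greedy loop solves one linear system per candidate, which is only practical for small datasets; it is kept this way for readability rather than efficiency.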