Multi-label Patent Classification Based on Text and Historical Data
Abstract
Patent classification, which assigns multiple International Patent Classification (IPC) codes to a given patent, is a very important task in the field of patent data mining. In recent years, many studies on this task have focused on mining patent text to predict first- or second-level IPC codes. In real scenarios, however, a patent often has multiple IPC codes, which makes this a multi-label classification task. Apart from its text, each patent has a corresponding assignee, and the assignee's historical patent application behavior may reflect a certain business tendency. A preference representation of this behavior can effectively improve the precision of patent classification, yet previous methods fail to make full use of historical patent data. This paper proposes a model for automatic patent classification. The model works as follows: first, the patent text representation is initialized with the BERT pre-trained language model, and a Text-CNN model then captures local features, whose output is taken as the final patent text representation; second, a Bi-LSTM learns the preference representation by aggregating historical patent texts and labels through dual channels; finally, the text representation and the assignee's sequential preferences are fused for prediction. Experiments on a real data set and comparisons with different baselines show that the proposed classification algorithm, based on patent text and historical data, greatly improves precision.
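The last step of the pipeline (fusing the text representation with the assignee preference representation and predicting several IPC codes at once) can be illustrated with a small sketch. The abstract does not say how the fusion or the output layer is implemented, so everything here is an assumption for illustration: concatenation as the fusion, one independent sigmoid per label, the 0.5 threshold, the hypothetical names `predict_ipc_codes` and `precision`, and the toy vectors and weights, which stand in for the encoder outputs and learned parameters.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_ipc_codes(text_vec, pref_vec, weights, biases, labels, threshold=0.5):
    """Score every label independently on the fused vector (multi-label)."""
    # Assumed fusion: plain concatenation of the two representations.
    fused = list(text_vec) + list(pref_vec)
    predicted = []
    for label, w_row, b in zip(labels, weights, biases):
        logit = sum(w * x for w, x in zip(w_row, fused)) + b
        # Each label has its own sigmoid, so several codes can be selected.
        if sigmoid(logit) >= threshold:
            predicted.append(label)
    return predicted


def precision(predicted, true_labels):
    """Fraction of predicted labels that are correct."""
    if not predicted:
        return 0.0
    hits = sum(1 for label in predicted if label in true_labels)
    return hits / len(predicted)


# Toy values only: stand-ins for the text encoder and preference encoder outputs.
labels = ["G06F", "G06N", "H04L"]
text_vec = [1.0, -0.5]
pref_vec = [0.5, 2.0]
weights = [
    [2.0, 0.0, 1.0, 0.0],    # logit for G06F: 2.5
    [0.0, 1.0, 0.0, 1.0],    # logit for G06N: 1.5
    [-1.0, 0.0, 0.0, -1.0],  # logit for H04L: -3.0
]
biases = [0.0, 0.0, 0.0]

predicted = predict_ipc_codes(text_vec, pref_vec, weights, biases, labels)
print(predicted)                       # ['G06F', 'G06N']
print(precision(predicted, {"G06F"}))  # 0.5
```

Because each label is thresholded separately rather than competing in a softmax, a patent can receive zero, one, or several codes, which is what distinguishes the multi-label setting from single-label classification.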
Authors
XU Xuejie, WANG Baohui
Quick Access
- Year Published: 2024
- Source Database: DOAJ
- DOI: 10.11896/jsjkx.230200199
- Access: Open Access ✓