

Machine learning for financial transaction classification across companies using character-level word embeddings of text fields
Authors: Rasmus Kær Jørgensen, Christian Igel
Institutions: 1. PricewaterhouseCoopers (PwC), Strandvejen 44, Hellerup, DK-2900 Denmark; 2. Department of Computer Science, University of Copenhagen, Universitetsparken 1, Copenhagen Ø, DK-2100 Denmark
Abstract: An important initial step in accounting is mapping financial transfers to the corresponding accounts. We devised machine-learning-based systems that automate this process. They use word embeddings with character-level features to process transaction texts. When considering 473 companies independently, our approach achieved an average top-1 accuracy of 80.50%, outperforming baselines that exclude the transaction texts or rely on a lexical bag-of-words text representation. We extended the approach to generalize across companies and even across different corporate sectors. After standardization of the account structures and careful feature engineering, a single classifier trained on 44 companies from 28 sectors achieved a test accuracy of more than 80%. When trained on 43 companies and tested on the remaining one, the system achieved an average accuracy of 64.62%. This rate increased to nearly 70% when considering only the largest sector.
Keywords: accounting; finance; financial transactions; multiclass classification; random forest; word embedding
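The abstract does not specify the embedding model, so the following is only an illustrative sketch of the general idea behind character-level (subword) word embeddings in the style of fastText, not the authors' system. Each word is split into character n-grams, each n-gram is hashed to a vector, a word vector is the average of its n-gram vectors, and a transaction text is the average of its word vectors. Assumptions: the pseudo-random bucket vectors stand in for trained embeddings, `zlib.crc32` replaces fastText's own hash function, and `DIM`, `N_BUCKETS`, and the n-gram range are hypothetical values not taken from the paper.

```python
import math
import random
import zlib

# Hypothetical hyperparameters for illustration; not taken from the paper.
DIM = 64             # embedding dimension
N_BUCKETS = 2 ** 16  # number of hash buckets for n-grams
MIN_N, MAX_N = 3, 5  # character n-gram lengths (assumed)

_bucket_cache = {}


def char_ngrams(word, min_n=MIN_N, max_n=MAX_N):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(padded) - n + 1)]


def bucket_vector(gram):
    """Pseudo-random vector for the hash bucket of an n-gram.

    Stands in for a trained embedding row (assumption); crc32 is used
    instead of fastText's own hash so results are reproducible.
    """
    bucket = zlib.crc32(gram.encode("utf-8")) % N_BUCKETS
    if bucket not in _bucket_cache:
        rng = random.Random(bucket)  # seeded so each bucket is reproducible
        _bucket_cache[bucket] = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    return _bucket_cache[bucket]


def mean(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def word_vector(word):
    """Word vector as the average of its character n-gram vectors."""
    return mean([bucket_vector(g) for g in char_ngrams(word.lower())])


def text_vector(text):
    """Transaction-text vector as the average of its word vectors."""
    words = text.split()
    if not words:
        return [0.0] * DIM
    return mean([word_vector(w) for w in words])


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    print(len(text_vector("Invoice 4711 office supplies")))
    print(cosine(word_vector("invoice"), word_vector("invoices")))
    print(cosine(word_vector("invoice"), word_vector("zebra")))
```

Because "invoice" and "invoices" share most of their character n-grams, their vectors end up much closer than those of unrelated words, which is the property that makes subword features robust to inflections, abbreviations, and typos in free-text transaction fields. Fixed-length text vectors of this kind could then be concatenated with other transaction features and passed to a multiclass classifier such as a random forest (listed among the keywords); how the paper actually combines them is not stated in the abstract.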