Friday 23 October 2015

Disease Inference from Health-Related Questions via Sparse Deep Learning

Abstract
Automatic disease inference is important for bridging the gap between what online health seekers with unusual symptoms need and what busy human doctors with biased expertise can offer. However, accurately and efficiently inferring diseases is non-trivial, especially for community-based health services, due to the vocabulary gap, incomplete information, correlated medical concepts, and limited high-quality training samples. In this paper, we first report a user study on the information needs of health seekers in terms of questions and then select those that ask for possible diseases of their manifested symptoms for further analysis. We next propose a novel deep learning scheme to infer the possible diseases given the questions of health seekers. The proposed scheme comprises two key components. The first globally mines the discriminant medical signatures from raw features. The second treats the raw features and their signatures as input nodes in one layer and hidden nodes in the subsequent layer, respectively. Meanwhile, it learns the inter-relations between these two layers via pre-training with pseudo-labeled data. Following that, the hidden nodes serve as raw features for more abstract signature mining. By incrementally repeating these two components in alternation, our scheme builds a sparsely connected deep architecture with three hidden layers. Overall, it fits specific tasks well with fine-tuning. Extensive experiments on a real-world dataset labeled by online doctors show the significant performance gains of our scheme.
Aim
The main aim is to build a disease inference scheme that can automatically infer the possible diseases from the questions posted in community-based health services.
Scope
The scope is to report a user study on the information needs of health seekers and to propose a novel deep learning scheme to infer the possible diseases given the questions of health seekers.
Existing System
The greying of society, escalating costs of healthcare and burgeoning computer technologies are together driving more consumers to spend more time online exploring health information. One survey shows that 59% of U.S. adults used the internet as a diagnostic tool in 2012. Another survey reports that in 2013 the average U.S. consumer spent close to 52 hours online looking for wellness knowledge but visited a doctor only three times. These findings have heightened the importance of online health resources as springboards to facilitate patient-doctor communication. The currently prevailing online health resources fall roughly into two categories. One is the reputable portals run by official sectors, renowned organizations, or other professional health providers. They disseminate up-to-date health information by releasing accurate, well-structured, and formally presented health knowledge on various topics. WebMD and MedlinePlus are typical examples. The other category is the community-based health services, such as HealthTap and HaoDF. They offer interactive platforms where health seekers can anonymously ask health-oriented questions and doctors provide knowledgeable and trustworthy answers.
Disadvantages
However, the community-based health services have several intrinsic limitations.
·      First of all, it is very time-consuming for health seekers to get their posted questions resolved. The wait can vary from hours to days.

·      Second, doctors have to cope with an ever-expanding workload, which leads to decreased enthusiasm and efficiency.

·      Third, the quality of replies depends on the doctors’ expertise, experience and time, which may result in conflicting diagnoses among multiple doctors and low disease coverage by any individual doctor.
Proposed System
This project aims to build a disease inference scheme that can automatically infer the possible diseases from the questions posted in community-based health services. We first analyze and categorize the information needs of health seekers. Our scheme then builds a novel deep learning model comprising two components. The first globally mines the latent medical signatures. These are compact patterns of inter-dependent medical terminologies or raw features, which help compensate for incomplete information. The raw features and signatures serve as input nodes in one layer and hidden nodes in the subsequent layer, respectively. The second component learns the interrelations between these two layers via pre-training. Following that, the hidden nodes are viewed as raw features for more abstract signature mining. By incrementally repeating these two components in alternation, our scheme builds a sparsely connected deep learning architecture with three hidden layers. This model is generalizable and scalable. Fine-tuning with a small set of labeled disease samples fits the model to a specific disease inference task. Unlike conventional deep learning algorithms, the number of hidden nodes in each layer of our model is determined automatically and the connections between two adjacent layers are sparse, which makes it faster.
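The idea of a sparsely connected layer can be illustrated with a minimal sketch. It is not the implementation from the paper: the toy terms, the hand-picked signatures, the weights and the helper names `build_mask` and `sparse_forward` are all assumptions made for illustration. Each signature becomes one hidden node, so the number of hidden nodes follows from the number of mined signatures, and a hidden node is connected only to the raw features that belong to its signature.

```python
import math

def build_mask(signatures, n_inputs):
    """Return one row per signature (hidden node); an entry is 1.0
    where the raw feature belongs to that signature, else 0.0."""
    return [[1.0 if j in members else 0.0 for j in range(n_inputs)]
            for members in signatures]

def sparse_forward(x, weights, mask, bias):
    """Sigmoid layer in which masked-out connections contribute nothing."""
    out = []
    for w_row, m_row, b in zip(weights, mask, bias):
        z = b + sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out

# Hypothetical toy data: four raw medical terms and two signatures.
terms = ["fever", "cough", "headache", "nausea"]
signatures = [{0, 1}, {2, 3}]          # assumed groups of related terms
mask = build_mask(signatures, len(terms))

# Assumed weights; in the described scheme these would be learned
# by pre-training and then adjusted by fine-tuning.
weights = [[0.5] * len(terms) for _ in signatures]
bias = [0.0, 0.0]

x = [1.0, 1.0, 0.0, 0.0]               # a question mentioning fever and cough
hidden = sparse_forward(x, weights, mask, bias)
print(hidden)
```

In this sketch only the first hidden node responds to the input, because the second signature shares no terms with the question. Stacking such layers, with the hidden nodes of one layer acting as raw features for the next, would give the layered structure described above.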
Advantages
·      This project benefits from the volume of unstructured community generated data and it is capable of handling various kinds of diseases effectively.

·      It investigates and categorizes the information needs of health seekers in the community-based health services and mines the signatures of their generated data.

·      It builds a sparsely connected deep learning scheme that is able to infer the possible diseases given the questions of health seekers.

·      It permits unsupervised feature learning from a wide range of other disease types. Therefore, it is generalizable and scalable.

System Specifications

Hardware Requirements
  • Speed                  -    1.1 GHz
  • Processor              -    Pentium IV
  • RAM                    -    512 MB (min)
  • Hard Disk              -    40 GB
  • Keyboard               -    Standard Windows Keyboard
  • Mouse                  -    Two or Three Button Mouse
  • Monitor                -    LCD/LED
Software Requirements
  • Operating System       -    Windows 7
  • Front End              -    ASP.NET and C#
  • Database               -    MSSQL
  • Tool                   -    Microsoft Visual Studio
References
Nie, L.; Wang, M.; Zhang, L.; Yan, S., "Disease Inference from Health-Related Questions via Sparse Deep Learning", IEEE Transactions on Knowledge and Data Engineering, Volume 27, Issue 8, February 2015

