
Disease Prediction using machine learning



Abstract :

The rapid proliferation of Internet technology and handheld devices has opened up new avenues for online healthcare systems. There are instances where online medical help or healthcare advice is easier and faster to obtain than real-world help. People are often reluctant to visit a hospital or a physician for minor symptoms. However, in many cases, these minor symptoms may point to serious health problems. Because online health advice is easily accessible, it can give users a head start. At the same time, existing online healthcare systems suffer from a lack of reliability and accuracy. This system analyzes the symptoms provided by the user as input and outputs the predicted disease. Prediction is done using the Naive Bayes classifier.  

Problem statement :

The classical diagnosis method is a process in which the patient has to visit a doctor, undergo various medical tests, and only then arrive at a diagnosis. This process is very time-consuming. To reduce the time needed for the initial diagnosis of symptoms, this project proposes an automated disease prediction system that relies on user input. The system takes symptoms from the user and provides a list of probable diseases.

Block diagram :     

                Block diagram for disease prediction using AI

Module description :

The system predicts the disease from the symptoms given as input, using the Naive Bayes algorithm. According to our literature survey, this algorithm gave the highest accuracy among the approaches considered when the dataset is large. The dataset uses diseases as labels, and for each disease its symptoms are given. 70% of the dataset is used for training and the remaining 30% for testing. The model is trained and tested on these two parts to obtain the desired output.
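As a minimal sketch of the 70/30 split, assuming scikit-learn's `train_test_split` and a tiny hypothetical symptom matrix in place of the real dataset (the symptom columns, diseases and rows below are invented for illustration):

```python
# Sketch of a 70/30 train/test split followed by Naive Bayes training.
# Assumption: X and y are a tiny hypothetical stand-in for the real dataset.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical binary symptom vectors: columns = [fever, cough, headache, nausea]
X = [
    [1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1],
    [0, 0, 1, 1], [0, 0, 1, 0], [1, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],
]
y = ["flu"] * 5 + ["migraine"] * 5

# test_size=0.3 keeps 70% of the rows for training and 30% for testing;
# stratify=y keeps both diseases represented in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = MultinomialNB()
model.fit(X_train, y_train)

# Accuracy on the held-out 30%.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"train rows: {len(X_train)}, test rows: {len(X_test)}, accuracy: {accuracy:.2f}")
```

`stratify=y` is an assumption added here so that every disease appears in both parts; the post does not say whether its split was stratified.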

Naive Bayes algorithm :

This system accepts input from the user and predicts the most probable disease, using the dataset and a machine learning algorithm. The algorithm used here is Naive Bayes, which takes a probabilistic approach. We have used the scikit-learn library for its implementation, specifically Multinomial Naive Bayes (MultinomialNB), since the model takes multiple features, i.e. multiple symptoms, as input.
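The sketch below shows how MultinomialNB can be fitted on 0/1 symptom vectors and then used to rank diseases by probability with `predict_proba`, which matches the "list of probable diseases" from the problem statement. The symptom names, diseases, rows and the helper `rank_diseases` are hypothetical.

```python
# Sketch: Multinomial Naive Bayes on binary symptom vectors (scikit-learn).
# Assumption: symptoms, diseases and rows are hypothetical examples.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

symptoms = ["fever", "cough", "headache", "nausea", "rash"]
X = np.array([
    [1, 1, 0, 0, 0],  # flu
    [1, 1, 1, 0, 0],  # flu
    [0, 0, 1, 1, 0],  # migraine
    [0, 1, 1, 1, 0],  # migraine
    [1, 0, 0, 0, 1],  # measles
    [1, 0, 1, 0, 1],  # measles
])
y = np.array(["flu", "flu", "migraine", "migraine", "measles", "measles"])

model = MultinomialNB()  # default alpha=1.0 (Laplace smoothing)
model.fit(X, y)


def rank_diseases(selected, top_k=3):
    """Return the top_k (disease, probability) pairs for the selected symptoms."""
    vector = np.array([[1 if s in selected else 0 for s in symptoms]])
    probs = model.predict_proba(vector)[0]      # one probability per class
    order = np.argsort(probs)[::-1][:top_k]     # highest probability first
    return [(str(model.classes_[i]), float(probs[i])) for i in order]


print(rank_diseases({"fever", "cough"}))
```

MultinomialNB treats each 0/1 entry as a count. scikit-learn also provides `BernoulliNB`, which models binary features directly; the sketch keeps MultinomialNB to match the choice described above.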

Example:

Let's take the example of a phone. If a phone has features such as a touch screen, internet access and a good camera, these are features of a smartphone, so we can classify it as a smartphone.

Bayes' Theorem :

  • The purpose of Bayes' theorem here is to predict the class label, i.e. the disease in our project, for a given tuple.
  • Let X be a tuple containing symptoms and H be a hypothesis, such as that the data tuple X (symptoms) belongs to a specified class C (disease).
  • For classification problems, we are looking for the probability that tuple X belongs to class C, given that we know the attribute description of X, i.e. the posterior probability P(H|X).
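In symbols, with X = (x_1, ..., x_n) the symptom tuple and C a disease, Bayes' theorem together with the naive assumption that symptoms are conditionally independent given the disease gives the general Naive Bayes form below (a sketch of the standard formulation, not a derivation taken from the project itself):

```latex
P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}, \qquad
P(X \mid C) = \prod_{k=1}^{n} P(x_k \mid C)

\hat{C} = \arg\max_{C} \; P(C) \prod_{k=1}^{n} P(x_k \mid C)
```

P(X) is the same for every class, so it can be dropped when choosing the most probable disease.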

Dataset :

Dataset for disease prediction

The dataset was taken from a study conducted at Columbia University. It consists of 150 diseases, and each disease has an average of 8-10 symptoms. The 70% of the dataset used for training was built by considering all combinations of inputs. The symptoms present for the corresponding disease were marked as 1 and the remaining ones as 0. 
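A table in this 1/0 format could be built with pandas from a disease-to-symptom mapping, as sketched below. The mapping, the sorted column order and the `disease` label column are assumptions for illustration, not the actual Columbia dataset.

```python
# Sketch: build a binary symptom table (1 = present, 0 = absent) per disease.
# Assumption: this hypothetical mapping stands in for the real 150 diseases.
import pandas as pd

disease_symptoms = {
    "flu": ["fever", "cough", "fatigue"],
    "migraine": ["headache", "nausea"],
    "measles": ["fever", "rash"],
}

# Column order for the symptom features: every distinct symptom, sorted.
all_symptoms = sorted({s for syms in disease_symptoms.values() for s in syms})

rows = []
for disease, syms in disease_symptoms.items():
    # Mark each symptom of this disease as 1 and every other symptom as 0.
    row = {symptom: int(symptom in syms) for symptom in all_symptoms}
    row["disease"] = disease  # hypothetical label column name
    rows.append(row)

table = pd.DataFrame(rows, columns=all_symptoms + ["disease"])
print(table)
```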

The interface consists of five drop-down menus, each populated with the list of symptoms. The user can select up to five symptoms; on clicking the Predict button, the predicted disease is displayed in the text box. 
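The handling of the five drop-downs can be sketched without any widgets. In a Tkinter version, each drop-down would typically be an `OptionMenu` bound to a `StringVar`, and the Predict callback would collect their `.get()` values. The helper below (hypothetical, as are `SYMPTOMS` and the `"None"` placeholder for an unused menu) turns those values into a 0/1 vector and enforces the one-to-five limit.

```python
# Sketch: convert up to five drop-down selections into a 0/1 symptom vector.
# Assumptions: SYMPTOMS and the "None" placeholder are hypothetical.
SYMPTOMS = ["cough", "fatigue", "fever", "headache", "nausea", "rash"]
PLACEHOLDER = "None"  # value shown in a drop-down the user left unused


def selections_to_vector(selections, symptom_list):
    """Turn drop-down values into a 0/1 vector aligned with symptom_list."""
    if len(selections) > 5:
        raise ValueError("at most five symptoms can be selected")
    # Ignore unused menus; a set also removes a symptom picked twice.
    chosen = {s for s in selections if s != PLACEHOLDER}
    if not chosen:
        raise ValueError("select at least one symptom")
    return [1 if symptom in chosen else 0 for symptom in symptom_list]


# The user filled three of the five drop-downs, picking "fever" twice.
print(selections_to_vector(["fever", "rash", "None", "fever", "None"], SYMPTOMS))
# -> [0, 0, 1, 0, 0, 1]
```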

GUI : 

We have used the Tkinter package for the user interface. Tkinter is the standard GUI library for Python. Combined with Tkinter, Python provides a fast and easy way to create GUI applications. Tkinter provides a powerful object-oriented interface to the Tk GUI toolkit.

The developed GUI : 

Developed GUI before disease prediction

 
Developed GUI after disease prediction


Working:

1. Import all the necessary packages: Tkinter for the GUI, NumPy for numerical operations and pandas for reading the CSV files.

2. Create a list (L1) that contains all the symptoms, in the same order as the columns of the CSV file.

3. Create another list that contains the diseases.

4. Then, create a list (L2) to hold the user's input.

5. L1 and L2 have equal length: L2 holds one entry per symptom.

6. Initially, every entry of L2 is 0.

7. To perform training and testing, read the CSV files using pandas. In each row, the position of every symptom that is present holds 1, and all other positions are kept at 0.

8. Treat the symptom columns as the input features X and the disease column as the target Y.

9. When the user enters symptoms, the corresponding entries of L2 are set to 1, and L2 is passed to the classifier trained on the dataset.

10. Thus, the best-matching disease is predicted as output.
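The steps above can be sketched end to end as follows, assuming a small hypothetical CSV (read from a string so the example is self-contained) with one 0/1 column per symptom and a `disease` label column; the real file names and column names may differ. The list `l2` plays the role of the post's L2 input list, and `predict` is a hypothetical helper.

```python
# Sketch of the workflow: read CSV, build X/Y, train, fill L2, predict.
# Assumption: CSV_TEXT is a hypothetical stand-in for the project's CSV file.
import io

import pandas as pd
from sklearn.naive_bayes import MultinomialNB

CSV_TEXT = """fever,cough,headache,nausea,rash,disease
1,1,0,0,0,flu
1,1,1,0,0,flu
0,0,1,1,0,migraine
0,1,1,1,0,migraine
1,0,0,0,1,measles
1,0,1,0,1,measles
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

symptoms = list(df.columns[:-1])            # every symptom, in CSV column order
diseases = sorted(df["disease"].unique())   # every disease label
print("diseases:", diseases)

X = df[symptoms]      # features: symptom columns
y = df["disease"]     # target: disease column

model = MultinomialNB()
model.fit(X, y)


def predict(selected):
    """Set the L2 entries of the selected symptoms to 1 and predict a disease."""
    l2 = [0] * len(symptoms)            # one entry per symptom, all 0 at first
    for k, symptom in enumerate(symptoms):
        if symptom in selected:
            l2[k] = 1
    # Use the same column names as in training to avoid feature-name warnings.
    return model.predict(pd.DataFrame([l2], columns=symptoms))[0]


print(predict({"headache", "nausea"}))
```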

Result and conclusion :

The project is designed so that the system takes symptoms from the user as input and outputs the predicted disease. The user can select a minimum of one and a maximum of five symptoms. Accuracy is lower if only one symptom is entered; the more symptoms the user enters, the higher the accuracy.

Github link :


Ppt link :


Youtube link :



