We conducted a survey of radish consumers and recorded their votes for their favorite radish variety in a txt file. Our data is now ready for analysis.
The objective is to answer two questions:
- Which radish variety is the most popular?
- Did anybody vote twice (rigging!)?
iPython Notebook version 3.4.3
Step 1: Reading the txt file into the IPython Notebook using Python's built-in open() function.
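As a minimal sketch of this step: the sample lines and the "name - variety" layout below are assumptions made for illustration, and the example writes them to a temporary file so that it runs on its own. With the real survey file you would pass its path to open() directly.

```python
import os
import tempfile

# Made-up sample data; the real survey file and its exact layout are assumptions here.
sample = "Jin Li - White Icicle\nAnna Day - Champion\n"

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# open() returns a file object; readlines() gives a list with one string per line.
with open(path) as f:
    lines = f.readlines()

os.remove(path)  # clean up the temporary file
print(lines)
```

Using a with block closes the file automatically, even if an error occurs while reading.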
Step 2: Breaking each line (a string) into two parts, the name of the person who voted and the name of the radish variety, using string functions like strip() and split().
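For example, assuming each line looks like "name - variety" (the " - " separator is an assumption, since the file layout is not shown), the split could be sketched as:

```python
# One made-up line, including the trailing newline that reading a file leaves behind.
line = "Jin Li - White Icicle\n"

# strip() removes the newline and surrounding spaces; split(" - ") cuts at the separator.
parts = line.strip().split(" - ")
name = parts[0].strip()
variety = parts[1].strip()

print(name)
print(variety)
```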
Step 3: Now we will create a generic function to count how many people voted for a given variety. We shall use an if condition along with string functions to achieve this.
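A sketch of such a function might look like this. The name count_votes, the sample lines and the " - " separator are all hypothetical choices for illustration.

```python
def count_votes(lines, target):
    """Count how many lines contain a vote for the variety `target`."""
    count = 0
    for line in lines:
        # Assumed layout: "<name> - <variety>"
        name, variety = line.strip().split(" - ")
        if variety == target:
            count += 1
    return count

# Made-up sample votes
lines = [
    "Jin Li - White Icicle\n",
    "Anna Day - Champion\n",
    "Raj Rao - Champion\n",
]
print(count_votes(lines, "Champion"))
```

The function is generic because the variety is passed in as an argument, so the same code works for any variety name.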
Step 4: We shall count all the votes by creating a dictionary, one of the most powerful tools in the Python language. It makes our life easy: each unique radish variety name becomes a key, and its value is the number of customers who voted for that variety.
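A minimal sketch of the dictionary approach, again with made-up votes and an assumed " - " separator. The last vote is deliberately typed in lowercase to show why the raw result needs cleaning.

```python
# Made-up sample votes; note the inconsistent capitalisation in the last one.
lines = [
    "Jin Li - White Icicle\n",
    "Anna Day - Champion\n",
    "Raj Rao - champion\n",
]

counts = {}
for line in lines:
    name, variety = line.strip().split(" - ")
    if variety in counts:
        counts[variety] += 1   # variety seen before: add one vote
    else:
        counts[variety] = 1    # first vote for this variety

print(counts)
```

Here "Champion" and "champion" end up as two separate keys, because dictionary keys are compared exactly.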
Step 5: The output of the previous step is not yet what we want, so we have to clean it. We do that with string functions like capitalize() and strip(). This is mainly to fix typos in names, for example “ Aditya”, “aditya”, “A ditya”. The computer treats all of them as different strings. Note that strip() only removes leading and trailing whitespace, so a space inside a name, as in “A ditya”, needs extra handling.
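To illustrate, here is a small sketch of a cleaning helper (the name clean is hypothetical) applied to the three spellings above:

```python
def clean(text):
    """Remove surrounding whitespace, then upper-case the first letter
    and lower-case the rest."""
    return text.strip().capitalize()

print(clean(" Aditya"))   # leading space removed
print(clean("aditya"))    # first letter capitalised
print(clean("A ditya"))   # the inner space is still there
```

The first two spellings now match, while the third keeps its inner space. One possible fix is text.replace(" ", ""), but that would also squash legitimate multi-word names, so it is a judgment call.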
Step 6: The next-to-last step is to catch fraudsters who voted twice. We create a separate dictionary of people who have already voted, check each vote against it, print the names of those who voted twice as fraud, and do not count their duplicate votes.
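One way this check could be sketched, using made-up votes in which one hypothetical voter appears twice. The fraud list is an extra added here only to make the result easy to inspect.

```python
# Made-up sample votes; "Jin Li" votes twice.
lines = [
    "Jin Li - White Icicle\n",
    "Anna Day - Champion\n",
    "Jin Li - Champion\n",
]

voted = {}    # separate dictionary: names of people who have already voted
counts = {}   # votes per variety
fraud = []    # names caught voting more than once

for line in lines:
    name, variety = line.strip().split(" - ")
    if name in voted:
        print(name, "voted twice: fraud!")
        fraud.append(name)
        continue              # skip the duplicate vote
    voted[name] = True
    counts[variety] = counts.get(variety, 0) + 1

print(counts)
```

Only the first vote from each person is counted; later votes from the same name are reported and ignored.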
Step 7: Finally, we will create three functions: the first for cleaning the names of the voters and of the radish varieties, the second for checking whether anyone voted twice, and the third for counting the votes for each radish variety.
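Put together, the three functions might look roughly like the sketch below. The function names, the sample votes and the " - " separator are assumptions for illustration, not the exact code used in the analysis.

```python
def clean(text):
    """Normalise spacing and capitalisation of a name or variety."""
    return text.strip().capitalize()

def has_voted(name, voted):
    """Return True if `name` is already recorded; otherwise record it."""
    if name in voted:
        return True
    voted[name] = True
    return False

def count_votes(lines):
    """Count one vote per person for each cleaned variety name."""
    counts = {}
    voted = {}
    for line in lines:
        name, variety = line.split(" - ")
        name = clean(name)
        variety = clean(variety)
        if has_voted(name, voted):
            print(name, "voted twice: fraud!")
            continue
        counts[variety] = counts.get(variety, 0) + 1
    return counts

# Made-up votes with messy spacing, mixed case and one repeat voter
lines = [
    "Jin Li - White Icicle\n",
    "anna day - champion\n",
    "Anna Day -  Champion\n",
    "Raj Rao - Champion \n",
]
counts = count_votes(lines)
print(counts)
```

Cleaning the voter's name before the duplicate check matters: otherwise "anna day" and "Anna Day" would be treated as two different people.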
Final Step: Time to announce the winner. After our analysis of the text file, we will use a basic for loop and an if condition to decide the winner. Here is the code snippet for the same.
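A minimal sketch of picking the winner, assuming the vote counts are already in a dictionary. The variety names and numbers below are made up for illustration and are not the survey results.

```python
# Made-up counts, NOT the real survey results.
counts = {"Red king": 5, "White icicle": 3, "Plum purple": 2}

winner = None
most = 0
for variety in counts:
    # Keep track of the variety with the highest count seen so far.
    if counts[variety] > most:
        most = counts[variety]
        winner = variety

print("Winner:", winner, "with", most, "votes")
```

With this comparison, a tie goes to whichever variety the loop reaches first.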
This completes our analysis of the most popular radish variety among the sampled consumers. Out of a total of 11 varieties, “The Champion” emerged as the winner. With this, we have answered the two questions that were the objective of this analysis.
Let me know your thoughts about the code snippets used here.