This program should take two arguments: the name of a file consisting of text emails strung together, and the name of a file containing the learned parameters. Using the statistics learned by the naive Bayes learner, it should output to stdout the estimated class of each input email: 1 for spam and 0 for legitimate. Each classification should be on a separate line. If the name of your classifier executable is spamNBClassifier, then the command to run your classifier on the email messages in the file test using the params learned above would be: spamNBClassifier test spam_params.out. Please specify the name of the classifier executable in the readme header. The expected output should be something like. You are to test your classifier on each email of a set of unlabeled test emails (provided June 6). You should store the results of your classifier in the file results. Part of your grade will be based on the classification error of your algorithm on this test set.
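The classification step can be sketched as follows. This is only an illustration under stated assumptions: features are binary (0/1), and the learned parameters are held in a dict with "prior" and "p_feature" entries keyed by the class labels "0" and "1". That layout and the function name classify are hypothetical, not something the handout prescribes.

```python
import math

def classify(features, params):
    """Return 1 (spam) or 0 (legitimate) for one binary feature vector.

    Assumed (hypothetical) parameter layout:
      params["prior"][label]      -> estimated P(class = label)
      params["p_feature"][label]  -> list of estimated P(feature_j = 1 | class)
    All probabilities are assumed to lie strictly between 0 and 1.
    """
    scores = {}
    for label in ("0", "1"):
        # Work in log space so products of many small numbers do not underflow.
        score = math.log(params["prior"][label])
        for x, p in zip(features, params["p_feature"][label]):
            score += math.log(p if x else 1.0 - p)
        scores[label] = score
    # Ties go to "legitimate"; this tie-break is an arbitrary choice.
    return 1 if scores["1"] > scores["0"] else 0
```

A full classifier program would read the two file names from the command line, split and preprocess the emails, and print the result of classify for each message on its own line.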
This program should input the set of labeled training text emails (as opposed to feature vectors) and estimate the necessary statistics. It should then output these statistics to a file to be used by the naive Bayes classifier. Use whatever input and output interfaces you like. In the readme header, specify the exact command needed to run your learner on the spam and legit training emails. For instance, if the command to run your learner is: spamnblearner legit spam spam_params.out, please specify this in the readme header. Implement the naive Bayes classifier. You must write a program that implements the naive Bayes classifier algorithm.
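A minimal sketch of the learning step, assuming the emails have already been converted to binary feature vectors. The add-one smoothing, the JSON file format, and the names learn_nb and save_params are assumptions made for illustration; the handout leaves these choices open.

```python
import json

def learn_nb(legit_vectors, spam_vectors):
    """Estimate naive Bayes statistics from binary feature vectors.

    Returns a dict with the class priors and, per class, the estimated
    probability that each feature equals 1. Both inputs must be non-empty.
    """
    params = {"prior": {}, "p_feature": {}}
    total = len(legit_vectors) + len(spam_vectors)
    for label, vectors in (("0", legit_vectors), ("1", spam_vectors)):
        n = len(vectors)
        params["prior"][label] = n / total
        n_features = len(vectors[0])
        # Add-one smoothing (an assumption) keeps every estimate strictly
        # between 0 and 1, so a feature never seen in one class does not
        # force a zero probability.
        params["p_feature"][label] = [
            (sum(v[j] for v in vectors) + 1) / (n + 2)
            for j in range(n_features)
        ]
    return params

def save_params(params, path):
    """Write the learned statistics to a file (JSON is an assumption)."""
    with open(path, "w") as f:
        json.dump(params, f)
```

Keeping the statistics in a plain dict makes it straightforward for the classifier to load the same file back with json.load.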
The email files are given as long text files with emails concatenated together (namely an mbox created from a bunch. Part of the task of the preprocessor is to separate this text file into separate emails; this doesn't need to be output, but is obviously critical for tagging individual emails. Most binary attachments have been stripped. The headers have been abridged to suppress labelings by spam filters. For non-spam, names and emails have been changed to protect the innocent. Files are obtainable here. Implement the naive Bayes learner. You must write a program that implements the naive Bayes learning algorithm.
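A minimal sketch of the splitting step, assuming the files follow the common mbox convention in which every message begins with a line starting with "From " (note the trailing space). The handout does not spell out the separator, so this should be checked against the official message counts. The function name split_mbox is hypothetical.

```python
def split_mbox(text):
    """Split concatenated mbox text into a list of message strings.

    Assumption: each message starts with a line beginning with "From ".
    Body lines that happen to start with "From " would be treated as a
    new message unless the file escapes them (often as ">From ").
    Any text before the first "From " line is returned as its own item.
    """
    messages = []
    current = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages
```

Comparing len(split_mbox(text)) with the official number of messages in each file is a quick sanity check on the separator assumption.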
Half of the points are based on the quality of the write-up, testing, and code. The other half are based on the performance of your code on a test set of problems and against other students' submissions. Bonus points may be awarded at the TA's discretion for going above and beyond the homework's description. Points will be deducted for not following these instructions. Note that no late programming assignments will be accepted! You must write a preprocessor function that inputs text emails, computes the values of chosen features for these emails, and outputs the resulting vectors of feature values. If you like, the preprocessor may be written in a scripting language such as Python or Perl.
This function will be used by both the learning and classifying algorithms. You should not have to replicate this code in both programs. In designing the preprocessor, you must choose which features to use. In your report, describe and justify your method for choosing features. You may use any interface you like, but be sure to describe this interface in your readme.
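One way to share the preprocessor is to keep it in a single module that both the learner and the classifier import. The sketch below is illustrative only: the five-word vocabulary, the "sent at night" feature, and the name extract_features are all hypothetical, and a real submission needs at least 100 features chosen and justified by you.

```python
import re

# Hypothetical vocabulary for illustration; far too small for a submission.
VOCAB = ["free", "money", "meeting", "winner", "project"]

def extract_features(message, vocab=VOCAB):
    """Return a list of 0/1 feature values for one email message."""
    # Word-presence features: 1 if the word occurs anywhere in the message.
    words = set(re.findall(r"[a-z']+", message.lower()))
    features = [1 if w in words else 0 for w in vocab]
    # Non-word feature: was the message sent between 00:00 and 05:59?
    # Assumption: a "Date:" header containing an HH:MM:SS time is present;
    # if it is missing, the feature is set to 0.
    m = re.search(r"^Date:.*?(\d{1,2}):\d{2}:\d{2}", message, re.MULTILINE)
    hour = int(m.group(1)) if m else -1
    features.append(1 if 0 <= hour <= 5 else 0)
    return features
```

Using only the hour, rather than the full date, follows the handout's advice to avoid features that are predictive on the training data but useless on the test data.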
You will have to divide each file into its separate email messages; this is not difficult and you should make sure your counts match the official numbers. You must also write a script to preprocess each message, to compute the value for it of each feature that you use. On Wednesday, June 6, we will publish a test set of mixed spam and legitimate email messages. You will run your classifier to generate a file of predicted labels: 1 for spam and 0 for legitimate. We will compute the percentage of correct labels that you submit, and part of your score will be based on this percentage.
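Before the test set is published, the same percentage can be estimated by holding out part of the labeled training data. The helper below is a hypothetical sketch of that calculation, not the official grading script.

```python
def accuracy(predicted, actual):
    """Percentage of labels in `predicted` that match `actual`."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    if not actual:
        raise ValueError("no labels to score")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)
```

Because both kinds of mistake count equally in this assignment, a single percentage like this is a reasonable summary; no separate weighting of false positives and false negatives is needed.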
Note that for this assignment, both types of mistake (classifying spam as legitimate, and vice versa) are equally bad. Logistics: you can work alone or in groups of at most. Code must be submitted (with comments) by the due date. A copy of your write-up as a PDF should be included. Turn in the code using the turnin script. Hard copies of write-ups are due no later than the start of the final, printed and stapled. Code should be included in the printout. Write-ups should include a brief description of the approach taken, answers to any questions posed, and sample output on some examples. See below for more details on the content of the write-up.
The task here is to build a spam filter using Bayesian learning. You should implement the naive Bayes learning algorithm (NB) described in class, and apply it to learn a classifier that distinguishes spam from legitimate email messages as accurately as possible. You must write a preprocessor that converts an email message into a vector of feature values. Use your creativity in choosing features that you think are likely to be predictive, i.e., likely to be different for the two classes of messages. Use at least 100 different features, and make sure your NB implementation can handle at least 200 features and 2000 training examples. Most of your features are likely to be words, but also use other features such as the time of day of the message. Use human intelligence to avoid features that may be highly predictive for the training data, but that will not be useful for test data, for example the date of the email message. There is a training set of about 600 spam messages and a training set of about 300 legitimate messages.
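For reference, the naive Bayes decision rule as it is usually stated is given below; the version described in class may differ in details such as smoothing. The symbols are assumptions introduced here: c is the class (1 for spam, 0 for legitimate), x_1, ..., x_n are the feature values of a message, and the probabilities are the estimates produced by the learner.

```latex
\hat{c} \;=\; \arg\max_{c \in \{0, 1\}}
  \left[ \log P(c) \;+\; \sum_{j=1}^{n} \log P(x_j \mid c) \right]
```

The sum of logarithms is equivalent to the product of the probabilities, and is preferred in practice because a product over 100 or more features can underflow.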
CSE 150 Programming Assignment 4. Date assigned: May 25, 2007. Date due: 11:59:59, June 7, 2007.