<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-9078680868976670676</id><updated>2011-07-08T09:26:19.437-07:00</updated><category term='SWIG'/><category term='biopython'/><category term='tools'/><category term='python'/><category term='encoding'/><category term='source code'/><category term='svms'/><category term='libsvm'/><category term='active learning'/><category term='machine learning'/><category term='bioinformatics'/><category term='pubmed'/><category term='text retrieval'/><title type='text'>machine learning in biomedicine</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://mlbiomedicine.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9078680868976670676/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://mlbiomedicine.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Byron Wallace</name><uri>http://www.blogger.com/profile/13484739729083118904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>5</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-9078680868976670676.post-5547213873040286588</id><published>2009-12-13T18:38:00.000-08:00</published><updated>2009-12-13T18:51:35.532-08:00</updated><title type='text'>Meta-Analyst</title><content type='html'>&lt;span style="font-family:verdana;"&gt;Not machine learning related, but pertinent to software and biomedicine, &lt;a href="http://tuftscaes.org/meta_analyst/"&gt;meta-analyst&lt;/a&gt;, our software for conducting meta-analyses, is publicly available. See our publication &lt;a href="http://www.biomedcentral.com/1471-2288/9/80/abstract"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9078680868976670676-5547213873040286588?l=mlbiomedicine.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mlbiomedicine.blogspot.com/feeds/5547213873040286588/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mlbiomedicine.blogspot.com/2009/12/meta-analyst.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9078680868976670676/posts/default/5547213873040286588'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9078680868976670676/posts/default/5547213873040286588'/><link rel='alternate' type='text/html' href='http://mlbiomedicine.blogspot.com/2009/12/meta-analyst.html' title='Meta-Analyst'/><author><name>Byron Wallace</name><uri>http://www.blogger.com/profile/13484739729083118904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9078680868976670676.post-1516947241100590868</id><published>2009-07-27T15:21:00.000-07:00</published><updated>2009-08-08T08:04:24.613-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='encoding'/><category scheme='http://www.blogger.com/atom/ns#' term='text retrieval'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>pubmedpy: a simple module for fetching and tf-idf encoding biomedical texts</title><content type='html'>&lt;span style="font-family:verdana;"&gt;Recently, I've been working on biomedical text classification. &lt;/span&gt;&lt;span style="font-family:verdana;"&gt;A necessary but tedious task in biomedical text classification is encoding data in a format suitable for classification algorithms. This often means I have the ids for some curated set of pubmed documents, and need to:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;Fetch these documents from pubmed&lt;/li&gt;&lt;li&gt;Encode them, e.g., in Term Frequency / Inverse Document Frequency format, for a classification library&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family:verdana;"&gt;In the past, I had various scripts to complete these steps. Here, I've aggregated these into a coherent library. You can hand pubmedpy a dictionary mapping pubmed ids to labels. The library will pull all of the abstracts and titles (by default; additional fields can also be retrieved) for these documents, clean them (stripping stop words such as "the", etc.) and then TF-IDF encode them. 
(Alternatively, binary encoding is possible.)&lt;/span&gt;&lt;br /&gt;&lt;p&gt;&lt;span style="font-family:verdana;"&gt;For example, let's use the example_usage module to demonstrate the functionality.&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;blockquote&gt;&lt;pre&gt;import pubmedpy&lt;br /&gt;pubmedpy.set_email("your@email.com")&lt;br /&gt;lbl_dict = get_pmid_to_label_dict()&lt;br /&gt;pubmedpy.fetch_and_encode(lbl_dict.keys(), "output", lbl_dict = lbl_dict)&lt;span style="font-family:Georgia,serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;span style="font-family:verdana;"&gt;The example_usage module builds a dictionary mapping pubmed ids to labels (what these mean is immaterial here). This dictionary is passed to the fetch_and_encode method in pubmedpy. That's it. Encoding takes a while, but if all goes well, under the "output" directory, there will be "AB" and "TI" directories (abstracts and titles, respectively). Under these, there should be "cleaned" and "encoded" folders. The "encoded" folder should contain a single file with your data encoded in the sparse "libsvm" format, shown below:&lt;br /&gt;&lt;blockquote&gt;&lt;pre&gt;&lt;br /&gt;1 163:0.0847798892434 297:0.263181728 ..&lt;/pre&gt;&lt;/blockquote&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;This is one abstract per line. So above, the abstract has label '1', and the 163rd word -- which, incidentally, is 'positively', as we can see by looking at the "words_index.txt" file generated in the "cleaned" directory -- has a tf-idf value of 0.085 or so. 
Note that other files can be generated, e.g., Weka's ARFF, with very little tweaking of the code (indeed, there's already a method in the tfidf2 library to dump to ARFF).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;The library requires &lt;/span&gt;&lt;a style="font-family: verdana;" href="http://biopython.org/"&gt;BioPython&lt;/a&gt;&lt;span style="font-family:verdana;"&gt; and can be fetched at Github: http://github.com/bwallace/pubmedpy/&lt;/span&gt;&lt;br /&gt;&lt;p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9078680868976670676-1516947241100590868?l=mlbiomedicine.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mlbiomedicine.blogspot.com/feeds/1516947241100590868/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mlbiomedicine.blogspot.com/2009/07/pubmedpy-simple-module-for-fetching-and.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9078680868976670676/posts/default/1516947241100590868'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9078680868976670676/posts/default/1516947241100590868'/><link rel='alternate' type='text/html' href='http://mlbiomedicine.blogspot.com/2009/07/pubmedpy-simple-module-for-fetching-and.html' title='pubmedpy: a simple module for fetching and tf-idf encoding biomedical texts'/><author><name>Byron Wallace</name><uri>http://www.blogger.com/profile/13484739729083118904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9078680868976670676.post-1434947964116100199</id><published>2009-04-06T16:56:00.000-07:00</published><updated>2009-04-08T07:58:48.680-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='libsvm'/><category scheme='http://www.blogger.com/atom/ns#' term='active learning'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><title type='text'>active learning with python &amp; libsvm, part 2</title><content type='html'>&lt;span style="font-family:verdana;"&gt;Active learning is a hugely useful framework for efficiently exploiting the resources of the 'expert' during training (i.e., for reducing the amount of hand labeling that must be done by a human in order to train a classifier). Active learning works by allowing the model, or learner, to select instances for the expert to label from a pool of unlabeled data examples, rather than having him or her categorize instances at random. This is advantageous because unlabeled data is often copious and cheaply available, but the time of a human expert is an immensely valuable resource. Burr Settles has written an excellent &lt;a href="http://pages.cs.wisc.edu/%7Ebsettles/active-learning"&gt;survey of the active learning&lt;/a&gt;&lt;a href="http://pages.cs.wisc.edu/%7Ebsettles/active-learning"&gt; literature&lt;/a&gt;&lt;a href="http://pages.cs.wisc.edu/%7Ebsettles/active-learning"&gt;.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In a &lt;a href="http://mlbiomedicine.blogspot.com/2009/03/python-libsvm-or-on-hacking-libsvm.html"&gt;previous post&lt;/a&gt; I explored&lt;span style="font-family:verdana;"&gt; hacking the popular &lt;a href="http://www.csie.ntu.edu.tw/%7Ecjlin/libsvm/"&gt;libsvm&lt;/a&gt; C++ library to expose additional fields to the python interface. 
&lt;/span&gt;In particular, I made |w|, the norm of the vector orthogonal to the &lt;a href="http://en.wikipedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png"&gt;separating hyperplane&lt;/a&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;&lt;span style="font-family:verdana;"&gt; found over training data, available from the python side of things. The upshot is that access to this field makes the most popular variant of uncertainty sampling for SVMs, Tong and Koller's &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;&lt;span style="font-family:verdana;"&gt;&lt;a href="http://www.google.com/url?sa=t&amp;amp;source=web&amp;amp;ct=res&amp;amp;cd=1&amp;amp;url=http%3A%2F%2Fjmlr.csail.mit.edu%2Fpapers%2Fvolume2%2Ftong01a%2Ftong01a.pdf&amp;amp;ei=5bzaSbWcPKLflQep3PGKCA&amp;amp;usg=AFQjCNG-xy8DuIAKnw_3Tb35oqmbg_vpgw&amp;amp;sig2=zzib3RG3l76ssgjKi5pFEw"&gt;SIMPLE&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;&lt;span style="font-family:verdana;"&gt;, a breeze to implement in python.&lt;br /&gt;&lt;br /&gt;I have updated &lt;a href="http://code.google.com/p/libsvm288fork/"&gt;the repository&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;&lt;span style="font-family:verdana;"&gt; with an implementation (please note that those not on OS X will likely need to recompile). 
In particular, I have created a &lt;a href="http://code.google.com/p/libsvm288fork/source/browse/trunk/python/learner.py"&gt;learner&lt;/a&gt; module, wherein the "active_learn" method accepts a &lt;span style="font-style: italic;"&gt;query_function&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; &lt;span style="font-family:verdana;"&gt;&lt;span style="font-family:verdana;"&gt; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;&lt;span style="font-family:verdana;"&gt;parameter, which is assumed to be a function that returns instance numbers to "label" (labeling is simulated here, as the true labels are assumed to be known). By default, the learning strategy is SIMPLE, but other query strategies could easily be "plugged in" to the framework.&lt;br /&gt;&lt;br /&gt;Sample use:&lt;br /&gt;&lt;blockquote&gt;&lt;span style=";font-family:georgia;font-size:85%;"  &gt;        &lt;span style="font-weight: bold;"&gt;1&lt;/span&gt; datasets = [dataset.build_dataset_from_file("my_data")]&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;2&lt;/span&gt; active_learner = learner.learner(datasets)&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;3&lt;/span&gt; active_learner.pick_initial_training_set(200)&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;4&lt;/span&gt; active_learner.rebuild_models(undersample_first=True)&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;5&lt;/span&gt; active_learner.active_learn(40, num_to_label_at_each_iteration=1)&lt;/span&gt;&lt;/blockquote&gt;&lt;span style=";font-family:georgia;font-size:85%;"  &gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;&lt;span style="font-family:verdana;"&gt;In &lt;span style="font-weight: bold;"&gt;1&lt;/span&gt;, it is assumed "my_data" contains a sparse libsvm-formatted file. The learner wants a list of datasets so that ensembles can be built -- i.e., different classifiers built using different types of data. 
In &lt;span style="font-weight: bold;"&gt;5&lt;/span&gt;, the first argument is the total number of examples to be labeled; the second is the number to label at each iteration in the active learning (i.e., the 'batch' size). I hope this is useful to anyone&lt;/span&gt;&lt;/span&gt; &lt;span style="font-family:verdana;"&gt;&lt;span style="font-family:verdana;"&gt;interested in playing with active learning. &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;&lt;span style="font-family:verdana;"&gt;&lt;span style=";font-family:georgia;font-size:85%;"  &gt; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9078680868976670676-1434947964116100199?l=mlbiomedicine.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mlbiomedicine.blogspot.com/feeds/1434947964116100199/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mlbiomedicine.blogspot.com/2009/04/active-learning-with-python-libsvm-part.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9078680868976670676/posts/default/1434947964116100199'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9078680868976670676/posts/default/1434947964116100199'/><link rel='alternate' type='text/html' href='http://mlbiomedicine.blogspot.com/2009/04/active-learning-with-python-libsvm-part.html' title='active learning with python &amp; libsvm, part 2'/><author><name>Byron Wallace</name><uri>http://www.blogger.com/profile/13484739729083118904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9078680868976670676.post-1814624824699263627</id><published>2009-03-29T05:58:00.001-07:00</published><updated>2009-03-29T06:26:34.644-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='biopython'/><category scheme='http://www.blogger.com/atom/ns#' term='pubmed'/><title type='text'>retrieving pubmed articles with BioPython</title><content type='html'>&lt;span style="font-family:verdana;"&gt;I needed to retrieve the abstract text and titles for all of the articles returned by a &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/"&gt;pubmed&lt;/a&gt; query. Fortunately, this proved even easier than I had expected, thanks to the wonderful &lt;a href="http://biopython.org/wiki/Main_Page"&gt;BioPython&lt;/a&gt; toolkit. Indeed, I basically needed only to borrow from the examples in &lt;a href="http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc70"&gt;this tutorial&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;A sample use of the &lt;a href="http://code.google.com/p/pubmedfetchr/"&gt;completed script&lt;/a&gt; is as follows:&lt;br /&gt;&lt;blockquote  style="font-family:arial;"&gt;&lt;span style="font-size:85%;"&gt;python pubmed_fetchr.py -e youremail@gmail.com -s "phylogenetic trees"&lt;/span&gt;&lt;/blockquote&gt;Note that the email argument is to let NCBI know who is querying the system, which is good practice. This will create a parent directory with the query name ("phylogenetic trees", in this case) and two subdirectories underneath, corresponding to the abstract and title texts, respectively. 
In the abstracts directory, there will be &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; plain text files (where &lt;span style="font-style: italic;"&gt;n &lt;/span&gt;is the number of articles returned by the query) with names corresponding to their PubMed IDs, each containing the text of the corresponding article abstract. Likewise, article title text will be saved in files named after the corresponding PubMed IDs in the titles directory.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://code.google.com/p/pubmedfetchr/"&gt;Here is the code&lt;/a&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;.&lt;/span&gt;&lt;span style="font-family:verdana;"&gt; It would be straightforward to write out additional fields (&lt;span style="font-style: italic;"&gt;i&lt;/span&gt;.&lt;span style="font-style: italic;"&gt;e&lt;/span&gt;., fields other than title and abstract).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9078680868976670676-1814624824699263627?l=mlbiomedicine.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mlbiomedicine.blogspot.com/feeds/1814624824699263627/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mlbiomedicine.blogspot.com/2009/03/retrieval-of-pubmed-articles-with.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9078680868976670676/posts/default/1814624824699263627'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9078680868976670676/posts/default/1814624824699263627'/><link rel='alternate' type='text/html' href='http://mlbiomedicine.blogspot.com/2009/03/retrieval-of-pubmed-articles-with.html' title='retrieving pubmed articles with BioPython'/><author><name>Byron 
Wallace</name><uri>http://www.blogger.com/profile/13484739729083118904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9078680868976670676.post-8922570948370016523</id><published>2009-03-13T08:01:00.000-07:00</published><updated>2009-03-14T08:10:46.332-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='libsvm'/><category scheme='http://www.blogger.com/atom/ns#' term='SWIG'/><category scheme='http://www.blogger.com/atom/ns#' term='svms'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='source code'/><title type='text'>active learning with python &amp; libsvm (or; on hacking libsvm)</title><content type='html'>&lt;div style="text-align: justify;"&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="font-family:verdana;"&gt;Arguably the most popular &lt;a href="http://en.wikipedia.org/wiki/Support_vector_machine"&gt;Support Vector Machine&lt;/a&gt; (SVM) library is &lt;a href="http://www.csie.ntu.edu.tw/%7Ecjlin/libsvm"&gt;libsvm&lt;/a&gt;. It is widely used, cross-platform and open-source. Moreover, while it is written in C++, there are myriad interfaces that provide access to the library in every language from Python to Matlab to &lt;a href="http://en.wikipedia.org/wiki/Brainfuck"&gt;brainf*ck&lt;/a&gt;. OK, not brainf*ck.&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family:verdana;"&gt;Here I'll focus on the bridge from Python. In particular, I was interested in using &lt;a style="font-style: italic;" href="http://hunch.net/?p=49"&gt;active learning&lt;/a&gt;, a useful framework for interactively training classifiers. 
In active learning, an expert provides labeled training examples for a model (e.g., an SVM), and the model iteratively selects new points from the remaining unlabeled data for labeling at each step. This differs from the canonical learning framework wherein you train your classifier on a random set of labeled data. The intuition is that the classifier can select informative examples for the expert to label at each iteration in the training process (in other words, active learning better exploits the expert). The upshot is that the most popular active learning strategy (i.e., method of selecting maximally informative examples for labeling) for SVMs is known as SIMPLE, which picks the (unlabeled) point closest to the separating hyperplane for labeling at each step. This requires access to the norm of &lt;/span&gt;&lt;span style="font-family:lucida grande;"&gt;&lt;span style="font-weight: bold;"&gt;w&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;, &lt;a href="http://en.wikipedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png"&gt;the vector orthogonal&lt;/a&gt; to the separating hyperplane.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="font-family:verdana;"&gt;Unfortunately, libsvm doesn't automatically compute this. The libsvm folks do, however, &lt;a href="http://www.csie.ntu.edu.tw/%7Ecjlin/libsvm/faq.html#f4151"&gt;outline&lt;/a&gt; how one might go about this in their FAQ:&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;span style="font-size:85%;"&gt;The distance is |decision_value| / |w|.  We have |w|^2 = w^Tw = alpha^T Q alpha = 2*(dual_obj + sum alpha_i).  
Thus in svm.cpp please find the place  where we calculate the dual objective value (i.e., the subroutine Solve()) and add a statement to print w^Tw.&lt;/span&gt;&lt;/blockquote&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="font-family:verdana;"&gt;Thus the following steps were required to extract |&lt;span style="font-weight: bold;"&gt;w&lt;/span&gt;| from the library: (1) hack the C++ code as outlined above and (2) provide access to the computed value via Python. The first step was simple enough. After some modifications to various structs (e.g., svm_model) to keep hold of the computed |&lt;span style="font-weight: bold;"&gt;w&lt;/span&gt;| value, I ultimately added the following method to svm.cpp:&lt;/span&gt;&lt;/div&gt;&lt;span style="font-family:verdana;"&gt;&lt;pre name="code"&gt;&lt;blockquote&gt;// get |w|^2&lt;br /&gt;double svm_get_model_w2(struct svm_model  *model)&lt;br /&gt;{&lt;br /&gt; return model-&gt;w_2;&lt;br /&gt;}&lt;/blockquote&gt;&lt;/pre&gt;&lt;/span&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="font-family:verdana;"&gt;Next, the header (svm.h) needed to be updated to reflect this change. libsvm then needs to be rebuilt. On *nix (including OS X) this can be done with make. 
On Windows, I managed to do it with Visual Studio by first running the vcvars32.bat file to set up the build environment:&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family:verdana;"&gt;&lt;blockquote  style="font-family:lucida grande;"&gt;&lt;span style="font-family:verdana;"&gt;&lt;pre name="code" class="python"&gt;&gt;"C:\Program Files\Microsoft Visual Studio 8\VC\bin\vcvars32.bat"&lt;/pre&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;/span&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="font-family:verdana;"&gt;Then:&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family:verdana;"&gt;&lt;blockquote&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:lucida grande;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;&lt;pre name="code" class="python"&gt;&lt;span style="font-size:100%;"&gt;&gt; nmake -f Makefile.win clean&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&gt; nmake -f Makefile.win &lt;span style="font-family:lucida grande;"&gt;all&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:lucida grande;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;/span&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="font-family:verdana;"&gt;&lt;span style=";font-family:verdana;font-size:100%;"  &gt;Next we need to modify the Python interface. 
In the "Python" directory, add the method header to the svmc.i interface file:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family:verdana;"&gt;&lt;pre name="code" class="python"&gt;&lt;blockquote&gt;&lt;span style="font-size:100%;"&gt;double svm_get_model_w2(struct svm_model *model);&lt;/span&gt;&lt;/blockquote&gt;&lt;/pre&gt;&lt;/span&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="font-family:verdana;"&gt;Finally, rebuild the interface:&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family:verdana;"&gt;&lt;pre name="code" class="Cpp"&gt;&lt;blockquote&gt;&gt;swig -python svmc.i&lt;br /&gt;&gt;python setup.py build build_ext --inplace&lt;/blockquote&gt;&lt;/pre&gt;&lt;/span&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="font-family:verdana;"&gt;&lt;span style="font-family:verdana;"&gt;Now the added method will be available through svm.svmc.svm_get_model_w2. If you're interested, the source is available @ &lt;a href="http://code.google.com/p/libsvm288fork/"&gt;http://code.google.com/p/libsvm288fork/ &lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family:verdana;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9078680868976670676-8922570948370016523?l=mlbiomedicine.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mlbiomedicine.blogspot.com/feeds/8922570948370016523/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mlbiomedicine.blogspot.com/2009/03/python-libsvm-or-on-hacking-libsvm.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9078680868976670676/posts/default/8922570948370016523'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/9078680868976670676/posts/default/8922570948370016523'/><link rel='alternate' type='text/html' href='http://mlbiomedicine.blogspot.com/2009/03/python-libsvm-or-on-hacking-libsvm.html' title='active learning with python &amp; libsvm (or; on hacking libsvm)'/><author><name>Byron Wallace</name><uri>http://www.blogger.com/profile/13484739729083118904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
