Friday, December 16, 2011

In defense of "Open Submission" in scientific publishing

The much-trumpeted "Open Access" movement in scientific publishing promises to revolutionize scientific accessibility, so that anyone can freely obtain the latest scientific research publications.  In this brave new world, the evil profit-driven publishers no longer prey upon the scientific community and taxpayers are no longer unjustly kept from reading about the work their taxes pay for.

Unfortunately, the reality of open access has become virtually synonymous with the "author-pay" publishing model.   In the name of making scientific publishing more open, author-pay publishing raises a whole new barrier.  Instead of requiring the reader to pay for access, the author (i.e., the scientist) now must pay to have his article published.  So we are gaining freedom of access in exchange for giving up freedom of submission.  Does that make any sense?

Consider the following quote from one of the great promoters of Open Access, Michael Nielsen:

Einstein’s proposals were astounding, yet his arguments were so compelling that his work was published in one of the leading physics journals of his day, and was rapidly accepted by most leading physicists. How remarkable that an outsider, a virtual unknown, could come in to challenge many of our most fundamental beliefs about how the universe works. And, in no time at all, the community of physicists essentially said, “Yeah, you’re right.”

(Reinventing Discovery: The New Era of Networked Science)

The reason Einstein's ideas gained rapid acceptance is that anyone -- even a patent clerk with no grant money! -- could submit their work to a leading journal and have it refereed by the experts.  What would have happened if the Open Access movement had transformed scientific publishing before he came along?

Perhaps our children will one day launch an "Open Submission" campaign, crying that grad students and scientists from third world countries must no longer be barred from publishing just because they can't pay.  Let's make sure they don't have to.  Open submission, always taken for granted until now, must be one of the fundamental tenets of scientific publishing. Say no to author-pay journals.  Don't submit work to them, don't referee for them, and don't serve as editors for them.

There is a better road to open access.  Put all your papers on the arXiv.  Publish only in journals that allow you to post a final version on your website (or institutional server).

 

Addendum:

The many potential evils of the author-pay model are explained in more detail in two articles published in the Notices of the AMS:

http://www.ams.org/notices/201109/rtx110901294p.pdf

http://www.ams.org/notices/200803/tx080300381p.pdf

Wednesday, November 30, 2011

Support the stackexchange for computational science

There is a new site on stackexchange for discussing computational science!  Please come participate -- there are already some great discussions.  The site is now in public beta, so you don't need an invitation.

In case you don't know about stackexchange, you may also be interested in the following:

Tuesday, November 15, 2011

Do you know what your colleagues are reading?

Up until Google's recent (catastrophic) changes to Reader, I used it to share and discuss interesting journal articles.  It was a near-perfect platform for this, and I'm hopeful that we'll have a replacement soon.

Its great utility came from the fact that my colleagues are very good at discerning which articles may be of interest to others in our circle.  This is no surprise, since we have similar research interests.  For most journal RSS feeds that I check, the fraction of articles that actually interest me is 1-5%, which means I spend a lot of time scanning article titles.  In contrast, the fraction of papers shared by my colleagues that I find interesting is probably closer to 50%!

I've found a nice way to display a public RSS feed of papers that I read*, via Mendeley (it's shown here on the right).  Now, ideally, Mendeley would allow me to publish a feed that includes all papers in my Mendeley library as I add them.  They don't, but they do something almost as good: they provide a public RSS feed showing all papers for any public Mendeley group, as they are added.  So here's what I did:

1. I created a public Mendeley group for my own library.

2. Whenever I import a new reference to Mendeley, I also add it to the group (you can do this via the dropdown menu in the popup that appears whenever you use the 'Import to Mendeley' bookmarklet).

3. I got the address for the feed from Mendeley (log in, click the 'Groups' tab, click 'Papers' on the left, and look for the RSS feed icon on the top right) and added a widget here on my blog, as well as on my professional webpage.

That's it.  If you want to subscribe to this RSS feed, here it is:

feed://www.mendeley.com/groups/1194891/david-ketcheson-s-library/feed/rss/
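
If you'd rather read the feed from a script than from a widget, here's a minimal sketch using the third-party feedparser package (the field names assume a standard RSS feed; I haven't tested this against Mendeley's server):

import feedparser  # available via easy_install feedparser

# Mendeley group feeds are ordinary RSS, so any feed library should cope.
url = 'http://www.mendeley.com/groups/1194891/david-ketcheson-s-library/feed/rss/'
feed = feedparser.parse(url)
for entry in feed.entries[:10]:  # the ten most recently added papers
    print(entry.title + ' -- ' + entry.link)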

 

[*] Note that 'read' here means 'read at least the abstract.'

Thursday, November 10, 2011

Book Review: Reinventing Discovery

I believe that the process of science—how discoveries are made—will change more in the next twenty years than it has in the past 300 years. --Michael Nielsen, Reinventing Discovery
I appreciate an author who's not afraid to make bold claims, and Michael Nielsen certainly fits that description.  He even goes on to say that
To historians looking back a hundred years from now, there will be two eras of science: pre-network science, and networked science.  We are living in the time of transition to the second era of science.
I grew up feeling that the golden age of science was the first half of the twentieth century, which gave us marvelous advances like relativity and quantum mechanics.  According to Nielsen, though, I'm witnessing the most transformative period of scientific development since the invention of the scholarly journal in the 1700's.  Although I'm a firm believer in the power of the internet to accelerate scientific advances, I was skeptical.
I downloaded Michael Nielsen's Reinventing Discovery on Tuesday and read it in less than 48 hours (between shopping trips while on vacation in Dubai).  Although I was familiar with much of the material in the book, it was an engaging and highly thought-provoking read that I think both scientists and laypersons will enjoy.  I'll focus here on the ideas that struck me as especially insightful.
Nielsen gives several examples to illustrate the beginnings of his foretold revolution; some are scientific (the Polymath project, GalaxyZoo, FoldIt) while others simply illustrate the power of our new networked world (Kasparov versus the World, InnoCentive).  These examples are used extensively and lend a convincing empiricism to a book that claims to predict the future.  They also allow Nielsen to dive into actual science, adding to the fun.
Many scientific advances are the result of combinations of knowledge from different fields, communities, or traditions that are brought together by fortuitous encounters among different people.  In a well-networked world, these encounters can be made to happen by giving individuals enough accessible information and communication. Nielsen refers to this as "designed serendipity".
The reason designed serendipity is important is because in creative work, most of us...spend much of our time blocked by problems that would be routine, if only we could find the right expert to help us. As recently as 20 years ago, finding that right expert was likely to be difficult. But, as examples such as InnoCentive and Kasparov versus the World show, we can now design systems that make it routine.
Offline, it can take months to track down a new collaborator with expertise that complements your own in just the right way. But that changes when you can ask a question in an online forum and get a response ten minutes later from one of the world’s leading experts on the topic you asked about.
The trouble is, of course, that the forum in question doesn't exist -- and if it did, who would have time to read all the messages?  Nielsen delves into this question, discussing how to design an "architecture of attention" that allows individuals to focus on the bits most relevant to them, so that large groups of people can work on a single problem in a way that allows each of them to exercise his particular expertise.  Taking the idea of designed serendipity to its logical yet astounding conclusion, Nielsen presents a science fiction (pun intended) portrayal of a future network that connects all researchers across disciplines to the collaborations they are most aptly suited for.  I found this imaginary future world both fascinating and believable.
The second part of the book explores the powers that are being unleashed as torrents of data are made accessible and analyzable.  Here Nielsen draws examples from Medline, Google Flu Trends, and GalaxyZoo.  While the importance of "data science" is already widely recognized, Nielsen expresses it nicely:
Confronted by such a wealth of data, in many ways we are not so much knowledge-limited as we are question-limited...the questions you can answer are actually an emergent property of complex systems of knowledge: the number of questions you can answer grows much faster than your knowledge.
In my opinion, he gets a bit carried away, suggesting that huge, complex models generated by analyzing mountains of data "might...contain more truth than our conventional theories" and arguing that "in the history of science the distinction between models and explanations is blurred to the point of non-existence", using Planck's study of thermal radiation as an example.  Planck's "model" was trying to explain a tiny amount of data and came up with terse mathematical equations to do so.  The suggestion that such a model is similar to linguistic models based on fitting terabytes (or more!) of data, and that the latter hold some kind of "truth", surprised me -- I suspect rather that models informed by so much data are accurate because they never need to do more than interpolate between nearby known values.  Nevertheless, it was interesting to see Nielsen's different and audacious perspective well-defended.
A question of more practical importance is how to get all those terabytes of data out in the open, and Nielsen brings an interesting point of view to this discussion as well, comparing the current situation to that of the pre-journal scientific era, when figures like Galileo and Newton communicated their discoveries by anagrams, in order to ensure the discoverer could claim credit later but also that his competitors couldn't read the discovery until then.  The solution then was imposed top-down: wealthy patrons demanded that the discoveries they funded be published openly, which meant that one had to publish in order to get and maintain a job.
The logical conclusion is that policies (from governments and granting agencies) should now be used to urge researchers to release their data and code publicly.  Employment decisions should give preference to researchers who follow this approach.  At present, the current of incentives rather discourages such "open science", but like Nielsen I am hopeful that the tide will soon turn.  I was left pondering what I could do to help; Nielsen provides numerous suggestions.  I'll conclude with some of the most relevant for computational scientists like myself.
...a lot of scientific knowledge is far better expressed as code than in the form of a scientific paper. But today, that knowledge often either remains hidden, or else is shoehorned into papers, because there’s no incentive to do otherwise. But if we got a citation-measurement-reward cycle going for code, then writing and sharing code would start to help rather than hurt scientists’ careers. This would have many positive consequences, but it would have one particularly crucial consequence: it would give scientists a strong motivation to create new tools for doing science.
...
Work in cahoots with your scientist programmer friends to establish shared norms for citation, and for sharing of code. And then work together to gradually ratchet up the pressure on other scientists to follow those norms. Don’t just promote your own work, but also insist more broadly on the value of code as a scientific contribution in its own right, every bit as valuable as more traditional forms.

Thursday, November 3, 2011

Collaborative scientific reading

I often feel that the deluge of mathematical publications, fueled by the ever-increasing number of researchers and mounting pressure to publish, threatens to overwhelm my ability to keep up with advances.  I don't think this is peculiar to applied mathematics.  No matter how adept you are at sifting the chaff and finding the most relevant work in your field, you can't possibly have time to read every paper that is germane to your research, let alone those of tangential interest that might provide new research avenues.  For my part, although I take time to read new papers every week, I've resigned myself to the fact that I won't see more than the abstract of most of the papers I'd like to read, because I need to conduct new research, teach, write, and so forth.

Reading and digesting a mathematical paper takes time and concentration.  Nevertheless, I find that perhaps 80% of the value I get out of reading most papers can be summed up in a paragraph or two that is easy to read and understand.  We all have practice producing those terse paragraphs because we regularly referee papers and provide a concise summary for the editor.  This summary includes things like "what's really new in this work" or "how this relates to previous work", as well as an evaluation of its merit.  Unfortunately, those referee reports are kept secret and unavailable to our colleagues.  I mentally create a similar report for most papers that I read in depth, although I don't usually write my evaluation down and I certainly don't send it to anyone.  What if every reader of a paper had access to the summaries and evaluations made by all the other readers?  I think we could all learn a lot more, a lot faster, about what our colleagues are accomplishing.

Recently, Fields medalist Timothy Gowers proposed an approach to accomplishing just that. The idea is to bring the functionality of StackOverflow to the arXiv, creating a place where everyone can publish and everyone can openly referee or comment.  The StackOverflow system of reputation and up-/down-voting would be used to help the best papers and best comments float to the top.  As Gowers admits, there are plenty of obstacles, but I'm hopeful that people with his level of clout in the mathematical community could really bring this to pass.  His interest seems mostly based on issues with the current journal publication system, but I see it primarily as a way to "collaboratively read" the literature.  Indeed, it might be best if the site had no implications for decisions on hiring or tenure, to avoid any motivation to game the system.  The site would also be a great place for expository writing that can't be published in a journal.

It's encouraging to see that some things are already moving in this direction.  A new website named PaperCritic has just been launched to accomplish something roughly along these lines.  It doesn't involve the StackOverflow system, but has Mendeley integration and allows you to post a public review of any paper.  Meanwhile, an increasing number of scientists are including paper reviews in their blog posts -- something I would like to do here.

I think Mendeley could accomplish something useful in this direction if they would give users the option to make their library and notes public.  Then when I find a paper on Mendeley that says "20 Readers", I could find out who they are, see what they've written about that paper, and see what else they're reading.

Note: I know that we already have Mathematical Reviews, but in my opinion it doesn't accomplish the goals mentioned above, mainly because the reviewer of a paper is often not sufficiently knowledgeable about the paper to say anything more insightful than what's in the abstract.  I find that Mathematical Reviews gives me papers to review that I would never have read otherwise.  What I'd like to see are reviews from the people who read the paper because it's germane to their own work.

I discovered while writing this post that until very recently there was a successful site of this kind, scirate.com, used by quantum computing researchers.  Perhaps we should focus on helping this guy get the site back up and start using it for math too.

Edit: Another brand-new open review system: http://open-review.org/

 

Wednesday, November 2, 2011

A better way to do multiple Gmail signatures: canned responses

I have both my personal and professional e-mail forwarded to a single Gmail account for convenience. One complication this causes is the need to use different signatures for correspondence from a single account. In the past, I've used the Blank Canvas Gmail Signatures extension in Firefox, but that has two drawbacks:

1. It has to be installed and the signatures configured separately on each computer I use.

2. It only works in Firefox.

Credit goes to an entry at thenextweb.com for pointing out a better way.  Just use the Gmail Labs feature "canned responses".  Save each of your signatures as a canned response, and then you can add it automatically when composing messages.  This works in every browser and only needs to be set up once.  Contrary to what it says on thenextweb.com, you can include HTML in your signatures when using this method.

Something to watch out for: canned responses are actually saved as messages in your drafts folder.  They are hidden in the usual Gmail web view, but are visible in basic HTML mode or if you access mail through your phone.  Don't delete them.

Friday, October 28, 2011

Managing publication lists in HTML

As an academic, it's a good idea to maintain a professional website with a list of your publications.  Ideally, this list should include links to where visitors can download the papers (PDFs) and any related code.  In my case, I also maintain a website for my research group that has another publication list. Of course, you need to maintain local reference files with the citation info for your publications (for inclusion in later publications), as well as your CV.

Maintaining all these separate lists can become very tedious, which is probably why most academics' sites are usually out-of-date.  Here's how I automate much of it.  First, some CSS to style each publication entry:

#pub {padding: 5px; border-width: 2px; border-style: none; background-color: #eee4b5; font-size: 1.2em; margin-top: 20px; margin-bottom: 20px;}
#pub a{font-weight: bold; color: #09434e;}
#pub name{font-weight: bold; color: #09434e;}
#pub journal{font-style: italic;}

 

The workflow for adding a new paper to an html bibliography is:

  1. Add the paper in Mendeley.
  2. Export bibtex from Mendeley.
  3. Run Python scripts (a sketch is given below).
  4. Paste resulting HTML into the appropriate file.

Again, it would be simpler if I could use Bibbase (cutting out steps 2-4).  It's still fairly painless, and it's easy to generate new bibliographic lists or customize the look of existing ones.
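
For the curious, here's a deliberately naive sketch of what step 3 can look like (an illustration, not my actual scripts): it assumes the exported BibTeX has one brace-delimited field per line, and it emits markup matching the CSS above.

import re

def bibtex_entries(text):
    # Extremely naive BibTeX parsing: split on entry headers like '@article{key,'
    # and collect 'field = {value},' pairs, assuming one per line, no nested braces.
    entries = []
    for chunk in re.split(r'@\w+\s*{', text)[1:]:
        fields = dict(re.findall(r'(\w+)\s*=\s*{(.+?)},?\s*$', chunk, re.MULTILINE))
        entries.append(fields)
    return entries

def entry_to_html(e):
    # Custom <name> and <journal> tags, so the CSS above applies.
    return ('<div id="pub"><name>%s</name>, "%s," <journal>%s</journal>, %s.</div>'
            % (e.get('author', ''), e.get('title', ''),
               e.get('journal', ''), e.get('year', '')))

with open('publications.bib') as f:
    print('\n'.join(entry_to_html(e) for e in bibtex_entries(f.read())))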

Thursday, October 27, 2011

Searching the scientific literature

Many of the fundamental skills of a scientist are seldom taught. Instead, one is expected to pick them up through intuition, informal conversations, or trial-and-error. One of these essential skills is how to search the literature for journal articles related to a particular topic.

This is a challenging task with severe consequences for failure. Just ask any Ph.D. student who discovered that his thesis was focused on a problem that had already been solved. Or anybody who left grad school because of the overwhelming task of grasping and keeping up with the scientific literature related to his thesis topic.

 

Why

The purpose of a literature search is not merely to become aware of what results are already known. Rather, a good literature search provides a map of the scientific terrain, indicating the general layout of a research area:

 

  • What are the main goals of research in this area?
  • What kind of advances are considered significant, and why?
  • What are the recognized open questions, and what impact would their answers have?
  • What other research areas are most closely connected to this one?
  • Are there other research areas with connections to this one that have not been recognized?
  • How is this research area viewed by those who focus on related, competing topics?

How

With these goals in mind, how does one conduct an effective literature search? Here are some techniques that have served me well:

  • Ask for help. You have a network of collaborators (or at least an advisor!) who each know some part of the literature much better than you. If you're starting research in a new topic where they have expertise, ask them for the most significant work on that topic. Ask their opinion of new papers that seem significant to you. Ask them for the right keywords, authors, and review articles to start with. Because they can make connections that a search engine never would, they are your most valuable resource.
  • Use Google Scholar. Yes, there are countless databases and search tools out there for looking at articles from a particular discipline or publisher. But I have yet to find one as effective as Scholar. I'm convinced that its coverage is much broader than that of any of the commercial academic databases available. For instance, few other databases cover the arXiv, which is an essential source in some fields.
  • Link forward through the literature. Every paper has a list of references to the works that it cites. But since you're mostly interested in learning about the state-of-the-art, it's usually more helpful to obtain a list of papers that cite the one you have. This is another major advantage of Google Scholar, which allows you to do so easily. Each search result includes a link to a list of all the articles that cite it.
  • Learn how to do effective keyword searches. This skill has become incredibly valuable in the internet age, and nowhere more so than in searching for journal articles. When learning about a new topic, it can be hard to know which keywords to search for, and you should ask for help (see above). Once you know the right words, it can matter a great deal whether you search for A and B, A or B, A since year X, B authored by Y, and so forth. Learn how to refine your searches in this way (a few example queries follow this list).
  • Learn to rapidly evaluate article titles and abstracts. You can't hope to read all the articles, or even all the abstracts published in your field. Your ability to find the most relevant ones is directly proportional to how quickly you can eliminate the irrelevant. I'm convinced that this skill can only be obtained by experience, but you can accelerate it by noticing articles that you thought would be useful but turned out not to be, as well as becoming aware of who the key authors are in an area.
  • Check for articles in review journals. Most fields have some journals that publish only review articles. Such articles provide a broad overview of a topic along with a detailed bibliography; they are invaluable when starting research on a new topic. In my field, the most relevant are Acta Numerica and SIAM Review. Review articles tend to rank high in search engines because they are heavily cited, but it can be worth searching for them specifically or even browsing review journals that publish a low volume (like the two just mentioned).
  • Check the websites of key authors. You can often find their preprints there long before the published article becomes available. Of course, you don't have time to do this on a large scale, so you have to be selective.
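
To make the keyword-search point concrete, here are a few hypothetical Scholar queries (operator syntax from memory -- double-check against Scholar's advanced search form):

"strong stability preserving" Runge-Kutta     (exact phrase plus a keyword)
positivity OR monotonicity                    (either term)
author:kraaijevanger contractivity            (restrict to one author)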

I'm planning a future post that will discuss what to do with all the relevant and significant articles you find.

Friday, October 21, 2011

Springer denies scientist access to her own research

The modern scientific method goes something like this:
  1. Obtain grants to fund your research.
  2. Conduct research.
  3. Write up results of your research.
  4. Submit your written work to a scientific journal.
  5. Sign a copyright transfer giving up all rights to your work.
The last step may sound crazy if you're not an academic, but we usually don't think twice about it. After all, the publisher you're giving the rights to would always give you access to your own work if you needed it.

Right?
Wrong.
The following letter from Dianne O'Leary, Professor of Computer Science at the University of Maryland, is reproduced here with permission.

From: Dianne O'Leary
Date: Tue, 11 Oct 2011 13:42:18 -0400
Subject: Rejected Springer reprint request

On September 9, I wrote to Springer asking for a pdf file of one of my
papers:
http://dx.doi.org/10.1023/A:1016614603137
Wang and O'Leary, Adaptive use of iterative methods in
predictor-corrector interior point methods for linear programming
Numerical Algorithms, 25 (2000) 387-406.

It took until October 8 for them to answer my request, and they
decided that I was not entitled to the pdf file of my own paper.

This doesn't seem to be the way to maintain the good will of the
community. They might have the legal right to make this decision, but
it seems to me that it is bad logic and bad business, since they rely
on us to provide, without financial compensation, the content for
their journals and the refereeing of other manuscripts.

My university does not subscribe to this journal -- too expensive --
so I was wondering if anyone had an idea of how I can obtain this pdf
file.

Thanks much.

Dianne O'Leary

One more reason to be careful about the journals you submit to.  SIAM, for instance, grants the author not only unlimited use for personal purposes, but also the right to post the final version of the article on his/her institutional webpage.
By the way, Prof. O'Leary now has over 200 copies of her article -- so there's no need to inundate her inbox with more.  And apparently someone from Springer has now (on Oct. 19) given in and provided her official access to her article.

Update from Prof. O'Leary on Nov. 1:



In response to my posting of trouble getting a pdf file of one of my
Springer-published papers, I received over 200 messages of support and
advice.  It is a great community!

M.J.D. (Mike) Powell was inspired to contact Springer, and in
response, I very promptly received the pdf file (which I have learned
that Springer is willing to supply to every author) and legal
permission to post it on my website (which Springer does not
ordinarily give).  This gives me exactly what I wanted, and I am
grateful.

I had sent my original request to Springer from the website of the
article, clicking the "permissions and reprints" button at
http://www.springerlink.com/content/p158q2276n7u0173/.  Apparently,
this gives the wrong outcome if you are the author.  The people who
processed my request did not forward it to the appropriate person, the
editor, found using the "contact" button on the journal's homepage.

A week after my posting, Claude Brezinski, editor-in-chief of
Numerical Algorithms, wrote me saying that my message might be
interpreted as criticism of him and the editorial board of the
journal.  I meant no such criticism.

Elizabeth Loew of Springer has been very helpful in trying to solve
the problems and clarify the issues.  It is in the current Springer
copyright agreement that authors cannot post the journal pdf files to
their own websites.  Authors are allowed to email the pdf to
colleagues.

As Steve Vavasis noted last week, authors who care about making their
articles more available need to look into mechanisms such as the SPARC
copyright addendum: http://www.arl.org/sparc/author/addendum.shtml

See also the SHERPA/RoMEO site that provides the copyright policy for
many journals: http://www.sherpa.ac.uk/romeo

Friday, October 14, 2011

What journals do you read?

As a scientist, one is defined by the kind of problems one works on, the conferences one attends, the journals one publishes in, and the journals one reads. All of these except the last are more or less publicly available information.

Only you know precisely which journals you choose to read, yet they're an essential part of your scientific identity. They determine the kind of new advances you're likely to be aware of and where your research may turn in the future.

I've made my Mendeley library public, so anyone can see in great detail not only what journals I read but which articles I read. But most of you are probably not interested in quite that level of detail, so here's a list of the journals I follow closely. I collect their RSS feeds (with Google Reader) and read at least the title of every article they publish. I've grouped them into 3 main categories, but otherwise they're in no particular order. Those listed in bold are journals where I have published; they also tend to be the journals most heavily represented in my Mendeley library.

Numerical Analysis and scientific computing:

  • BIT Numerical Mathematics
  • Journal of Scientific Computing
  • SIAM Journal on Scientific Computing (SISC)
  • SIAM Journal on Numerical Analysis (SINUM)
  • Mathematics of Computation
  • Numerische Mathematik
  • Applied Numerical Mathematics (APNUM)
  • Computational Science and Discovery
  • Journal of Computational and Applied Mathematics
  • Journal of Computational Physics (JCP)
  • Computing in Science and Engineering (CiSE)
  • International Journal of Numerical Methods in Fluids (IJNMF)
  • IMA Journal of Numerical Analysis
  • ACM Transactions on Mathematical Software (TOMS)
  • Computer Physics Communications
  • Acta Numerica
  • math.NA on arXiv

 

I would add Communications in Computational Physics and Computational Methods in Applied Mathematics, but as far as I know they have no RSS feed.

Nonlinear waves:

  • Physica D: Nonlinear Phenomena
  • Nonlinearity

 

Here I would add Communications in the Mathematical Sciences, which also has no RSS feed.

General applied math:

  • SIAM Review
  • SIAM Journal on Applied Math
  • IMA Journal of Applied Mathematics

 

Although the arXiv isn't a journal, I've included it here. Indeed, I find useful articles in that feed much more often than in the feeds of most of the listed journals.

What does your list look like?

Sunday, August 28, 2011

Academic Work-Life Balance at KAUST

One of the great things about living at KAUST is that there's not much to do.

No, really. I mean it.

For me, one of the most pleasant parts of life on this tiny campus has been the lack of constant activity and busy-ness that surrounded me in Seattle. Like any small town, KAUST takes a slower, more casual approach to life.
That said, there's a danger to being an academic in a place like this. I love my work. Without the thousand constant distractions one finds elsewhere, it's easy to get completely lost in research -- 24 hours a day, 7 days a week. And I believe that, except perhaps for short stints, and despite the fact that sometimes inspiration does strike at 2 AM, such obsession is not a good thing.
This tendency of mine isn't uncommon in academia; for a comical (un)celebration of it, see Uri Alon's Sundays at the lab. Indeed, many people seem to get the impression that you can't have a high-caliber academic research career without sacrificing your personal life, relationships, or sanity.
Though I'm still learning this dance, I can say that for me it's about balance: work hard, play hard, relax and sleep enough. How does one maintain that balance in the microcosm that is KAUST?
The work hard part comes easily enough. Outside of work, my two years here have been a journey toward doing fewer, but better, things with my time. Some of the things that have kept me on an even keel:
  • Spending time with my family. This is the best part of the reduced busy-ness of KAUST. We all have more time for each other.
  • Developing wonderful friendships. Living in a small community forces you to get to know people better. Surviving the initial chaos of living and working inside a construction site was a unique experience that forged some bonds of friendship that will last a lifetime.
  • Starting to play music again. It's great to have a creative outlet that doesn't require the same kind of mental concentration as mathematics.
  • Scuba diving, windsurfing, and yoga. You caught me -- that part about having nothing to do at KAUST wasn't exactly accurate. I grew up a thousand miles from the ocean and I'm discovering for the first time the benefits of living near the (warm!) sea.
Last but not least, I love to take quiet walks on the beach. These are a great way to reconnect with myself and what really matters.
Or to think up better numerical algorithms.
After all, a mathematician's mind is never that far from, well, mathematics.

Thursday, August 18, 2011

The one thing I can't stand about Blogger (and Google in general!)

Why, oh why does Google try to figure out what language I speak based on my IP rather than using my browser's language setting or my Google account language? Any time I delete cookies and then go to Blogger, I'm greeted by this:


The same goes for searching the web with Google -- it's always trying to send me to google.com.sa and give me search results in Arabic. Even if I type 'www.google.com' into the browser bar, it redirects me!
This happens even though I'm logged into my Google account, which specifies English as my native language:

It's maddening. And there are plenty of pages like this one on Google's own websites where people have pointed out the problem, but to no avail.

A tip for others who have this problem: I go to encrypted.google.com to search. That one's always in English (which also doesn't really make sense). I haven't found a good solution for Blogger, except to try all the available links until I get the right one. If this blog ever disappears, it's probably because I clicked 'flag this blog as abusive' by accident too many times.

Monday, June 27, 2011

How to edit all files containing a particular string

It's often useful to be able to automatically open all files containing a particular string (say, to rename a variable throughout your code). To open all files containing my_string in vim, using ack, just type:

vim $(ack -l my_string)

It can be accomplished in a similar way with grep instead of ack; something like vim $(grep -rl my_string .) should behave identically.


Edit: If you just want to search and replace a particular string in all files under some directory recursively, use

grep -rl matchstring somedir/ | xargs sed -i "" 's/search string1/search string2/'

It took me a while to find that the double quotes after -i are necessary on Mac OS X (GNU sed, by contrast, expects just -i with no argument). And be aware that the single quotes above usually get mangled into backticks when copying and pasting.

Edit 2: Don't do this in the root directory of a git repository! It will corrupt the repository.

Saturday, June 25, 2011

More KAUST Beacon photos

The shot of the Beacon in my last post was taken with my iPhone, so last night I went back and got some "real" shots with a tripod and DSLR. Click to see them large; they're better that way.


It's quite a striking structure up close, especially now that it's fully lit.

Thursday, June 23, 2011

The KAUST Beacon is fully lit

I believe today is the first time:


As a new "House of Wisdom," the University shall be a beacon for peace, hope, and reconciliation...

...this university will become a House of Wisdom to all its peers around the world, a beacon of tolerance.

--King Abdullah

Saturday, June 4, 2011

Mendeley's PDF import has improved dramatically

Looks like the folks at Mendeley have been quietly making some big improvements.

When I first started using Mendeley, I decided to drag a folder of one hundred or so PDFs into it, since it could extract the bibliographic metadata from PDFs. The consequences were disastrous, with a dozen documents named "Society" and all published in "Science". Overall, less than half were correctly imported, and many entries were unrecognizable.

Since then, I haven't dragged PDFs into Mendeley. This means that bringing in a new document requires a few steps: get the metadata from a journal webpage (automatically, using a bookmarklet), download the PDF, and finally associate the PDF with the document.

Today, by accident, I happened to drag a PDF into Mendeley. To my surprise, it was imported perfectly, with all bibliographic data correct. I decided to try another. And a few more. All came in perfectly. I also found that dragging a PDF of a paper already in my library DID NOT create a duplicate.

Thanks, Mendeley developers. Please keep it up.

Tuesday, May 31, 2011

The positivity pipe dream fulfilled?

This post is about my recently accepted SINUM paper. It's an attempt to provide broad context for the paper and related work in a less formal way than a journal publication allows.

In 1979, Bolley and Crouzeix published a fairly astonishing result (in French here). Namely, they showed that, for linear differential equations whose solution is always positive, it is impossible to design numerical methods that always yield a positive solution under arbitrarily large step sizes, even if one considers the very broad class of ODE solvers known as general linear methods. The only exception to this statement is the backward Euler method, which has many nice properties but is, unfortunately, too inaccurate for most applications.
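
To get a feel for why backward Euler is special, consider the scalar toy problem u' = -u with u(0) = u_0 > 0 (my illustration, not theirs). Forward Euler gives

u^{n+1} = (1 - h) u^n,

which stays positive only if the step size satisfies h <= 1, whereas backward Euler gives

u^{n+1} = u^n / (1 + h),

which is positive for every h > 0.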

This result stands in stark contrast to corresponding results on, say, stability in inner-product norms, where use of implicit methods can get you unconditional stability even for nonlinear problems.

I'm speaking roughly here, and won't attempt to be more precise (go read the paper if you want details). But the motivation cited by Bolley and Crouzeix for looking at this question was the idea that one might be able to take large time steps and maintain positivity in the solution of hyperbolic problems. Unfortunately, their result showed that this was not possible unless one was willing to settle for first-order accuracy.

What this result didn't indicate is how large the positivity-preserving step size could be for an implicit method. This question was partially answered within a decade, by Lenferink and by van de Griend & Kraaijevanger, for linear multistep methods and for Runge-Kutta methods, respectively. In both cases they found that the largest positivity-preserving step size was no more than two times the step size allowed by using the forward Euler method.

Of course, an implicit solve costs significantly more than an explicit solve, so gaining only a factor of two in the step size just isn't worth it. I don't know of any further work on this problem between 1991 and fairly recently, when it was taken up in two of my own papers. The results there are consistent with the apparent barrier of a 2X step size, although see Note 1 at the end.

In one attempt to circumvent this bound, Gottlieb, Macdonald & Ruuth investigated the class of diagonally split Runge-Kutta methods (which aren't general linear methods, so escape the implications of Bolley & Crouzeix). They did find higher order methods with unconditional strong stability properties, but the accuracy of these methods invariably reduced to first order when used with large timesteps! It seemed that any effort to find unconditionally positive methods would be thwarted one way or another.

This sets the stage for my recently accepted SINUM paper. In this paper I found that by using both upwind-biased and downwind-biased discretizations (an idea that goes all the way back to Shu's original paper on "TVD time discretizations") in implicit Runge-Kutta methods, one can obtain second-order accurate methods that preserve positivity under arbitrarily large step sizes -- and they have this property even when applied to nonlinear problems. Remarkably, the methods appear to give quite accurate results when applied to problems with shocks.

It seems that we have gotten around the 2X barrier at last! But important issues remain, most notably the efficient implementation of these "implicit downwind Runge-Kutta schemes" in combination with high order hyperbolic PDE discretizations (like WENO).

I must reiterate that I've glossed over several important technical points here. For more details, go read the paper.

Note 1: In my Ph.D. thesis, I did find a method that breaks the 2X barrier, but only just barely and only for linear problems.

Note 2: I've referred rather carelessly in this post to literature on positivity preservation, contractivity, and strong stability preservation (monotonicity) without distinguishing among the three, simply because the conditions on the method turn out to be the same. The articles mentioned generally focus on one property or another, but the results almost always carry over to all three.

Sunday, May 29, 2011

How do shock waves behave in inhomogeneous materials?

It's well known that solutions of genuinely nonlinear hyperbolic PDEs develop shock singularities in finite time, under very weak assumptions on the initial data. However, proofs of this statement invariably assume uniformity of the PDE coefficients in space and time. What if the coefficients are allowed to vary, as would be the case for waves in many real materials, whose properties may be random or periodic?

Surprisingly little is known about the answer to this question; a first attempt at a partial answer is made in my recently submitted manuscript "Shock dynamics in layered periodic media". Among the "shocking" findings:

  • For certain media and relatively general initial conditions, shock formation seems not to occur even after extremely long times.
  • Shocks that would be stable in a homogeneous medium are frequently not stable in a heterogeneous medium.
  • The asymptotic behavior of solutions in heterogeneous media is generally different: rather than consisting of N-waves, the solutions may be composed of solitary waves, for instance.

To get an idea of what's going on, take a look at some movies showing animations of the remarkable behavior of the solution.

Tuesday, May 24, 2011

My favorite new tool: ack

If you're like me, about once a week you type something like

>> grep words

and then wait a few seconds before remembering that you really meant

>> grep words *

or perhaps

>> grep -r words *

If so, do yourself a favor and install ack right now. You'll be glad you did.

Note: ack tries to intelligently search only filetypes that it believes to be text. Unfortunately, its built-in list of extensions for such files is incomplete. For instance, it does not include .rst (ReStructured Text) files. To add more extensions, just create a file ".ackrc" in your home directory and put the following line in it:

--type-add=TYPE=.ext

Here TYPE is the name of the filetype (anything you want) and .ext is the extension of that filetype. For example:

--type-add=RST=.rst

(in the line above, "type" is preceded by two minus characters. Unfortunately, they look strange in my blog's font).

Monday, May 23, 2011

Why I won't be a Mendeley university advisor (for now)

Mendeley, my tool of choice for bibliographic reference management and sharing, just announced their "University Advisor" program. Basically, they're asking academics to advocate for them on campus in exchange for free premium accounts.

I already advocate for Mendeley unofficially, because I think it is useful and its usefulness will grow in proportion to the number of people who adopt it. But I'd rather not be officially associated with Mendeley, mostly because it's not yet a sufficiently polished product.

Mendeley is a useful product with a lot of potential, but it still has a lot of serious problems. Some of these seem like telltale signs that the basic software and data infrastructure on which Mendeley is built has fundamental and dangerous flaws.

For instance, my Mendeley collection has 516 documents according to both the Desktop app and the web app. But the "my library stats" page says I have only 328 articles. This doesn't cause any real problems, but the idea that somehow a count of these items is being maintained in a way that it can get permanently out of sync is disturbing. Furthermore, I had an e-mail exchange with Mendeley support about this a couple of months ago, and they admitted it was a problem but they've been unable to solve it.

Duplicates. This is a huge problem with Mendeley. If I import a document twice, even from the same source, I get duplicates. If I drag a document from my library to a "group" twice, I get duplicates. The latter behavior is really inexcusable, since this operation occurs entirely within Mendeley. It also makes it very painful to use groups (so painful that I've stopped using them). I have a group called "Runge-Kutta stability regions", and I'd like to keep all papers in my library with the tag of the same name in that group. This would be easy if I could just periodically select the tag and drag all papers to the group, but that's a recipe for disaster since I'll end up with duplicates. CiteULike sync is also a disaster, as it generates many duplicates.

Bibtex. Mendeley just doesn't seem to pay enough attention to bibtex-centric user needs. In the desktop app, citation keys are not shown by default. When one right-clicks on a document and selects "copy citation" (with citation style set to bibtex), the citation generated is different than what one gets from "export citation" (the cite keys don't agree). Mendeley mangles bibtex fields on import. When articles are removed from Mendeley, they may persist in the auto-synced bibtex file. Combined with the duplicates problem, this can be a nightmare.

Sharing. Mendeley is supposed to facilitate sharing, but unfortunately it also restricts sharing in one very important way: my library is not publicly accessible, and I cannot make it so. Why not? Do people have something to hide in their library of scholarly references?

I could go on, but you get the idea. Mendeley has a lot of things going for it, not the least of which is a very active development team, and I hope it succeeds in achieving what its developers intend. It's my tool of choice, but as its name indicates, it's still beta.

Thursday, May 19, 2011

What is science?

Today I received an e-mail from a collaborator of mine stating

...our community doesn't reward engineering effort but instead scientific advances.

The context was a discussion of what is publishable in scientific software development. This got me thinking about what the difference really is between science and engineering, and between publishable advances and the rest of the work that scientists do.

After some thought I've concluded that this difference is in many respects artificial and purely a function of one's discipline (or even sub-discipline). To take algorithmic complexity as an example:

-To a complexity theorist, a reduction in complexity from 1000N^3 to 3N^2 would be considered "engineering effort": after all, both are polynomial, right?

-To an applied mathematician, the above result would be considered a fabulous scientific advance, but a further reduction from 3N^2 to 2N^2 might not be deemed publishable.

-To a chemist, say, who needs to run the algorithm with 10000 different parameter sets, even a 10% improvement might be considered a valuable scientific advance.

Of course, to a pure mathematician any algorithm for solving the problem is an engineering detail; the only scientific aspect is proving the existence of a unique solution.

All of these advances may be important steps leading from an abstract idea to a technological breakthrough that benefits non-scientists. I think it's okay that all the above researchers are only interested in one particular step in this chain of advances. What's detrimental, however, is the inability to recognize that the other steps in this chain are valuable. This kind of bigotry is, in my experience, rather common among scientists.

Edit: Ironically, the quote at the beginning came from a person whose title is "Professor of Engineering".

Monday, February 7, 2011

Python code for making a histogram of your e-mail volume

Here is the source code for the example in my last post. I haven't had time to clean it up, and some parts are not very elegant. But if you want to try it out with your own inbox, all you need to do is change the e-mail address and run it.

Three caveats:
1. If you need to download a large number of e-mail headers, it will take some time (maybe several minutes).
2. It sometimes gets the dates wrong. However, this seems to occur only in a statistically insignificant fraction of cases.
3. Running this will mark all the messages it accesses as read. I'm sure there's a way to avoid this, but haven't had time to track it down.
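
The embedded listing doesn't seem to have survived here, so below is a minimal sketch of the approach (a reconstruction, not the original script -- substitute your own address and password, and expect to adapt details):

import imaplib
from email.utils import parsedate_tz
from collections import Counter
import matplotlib.pyplot as plt

M = imaplib.IMAP4_SSL('imap.gmail.com')
M.login('your.address@gmail.com', 'your-password')
M.select('INBOX')

# Fetch only the Date header of each message.
# (Using BODY.PEEK[...] instead of BODY[...] should avoid caveat 3 above.)
typ, data = M.search(None, 'ALL')
counts = Counter()
for num in data[0].split():
    typ, msg_data = M.fetch(num, '(BODY[HEADER.FIELDS (DATE)])')
    header = msg_data[0][1].decode('utf-8', 'replace')
    parsed = parsedate_tz(header.replace('Date:', '').strip())
    if parsed:  # skip messages whose dates can't be parsed (caveat 2)
        counts[(parsed[0], parsed[1])] += 1  # tally by (year, month)
M.logout()

# Plot the number of messages per month.
months = sorted(counts)
plt.bar(range(len(months)), [counts[m] for m in months])
plt.xticks(range(len(months)), ['%d-%02d' % m for m in months], rotation=90)
plt.ylabel('messages per month')
plt.show()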

Sunday, February 6, 2011

Visualizing my inbox load

The other day I happened to notice that I had received well over 100 e-mails in one day.  While that may or may not seem high to you, in my case this meant that I spent most of the day handling e-mails, since the majority of these actually required a response or some other action on my part (I'm organizing two workshops right now, which accounts for much of the traffic).

I thought back to grad school days when I might or might not receive any e-mail on a given day.  When did it all get so crazy?  I decided it would be fun to find out.  A bit of searching turned up Python's imaplib module, which allowed me to download headers for all messages (ever!) from my Gmail account.  Then it was just a matter of extracting and reformatting the dates and plotting up a histogram with matplotlib.  Here's the result:


Can you tell when I graduated and started working for KAUST?  In the last few months prior to starting at KAUST, I got an average of about 250 messages a month.  Within 2 months of starting at KAUST, that average was well over 1000, with some months substantially higher.  Ah, the joys of being a professor...

Tuesday, January 25, 2011

I have a book!

I just got notification that my first book, co-authored with Sigal Gottlieb and Chi-Wang Shu, is finally available for pre-order and will ship by the end of the month.  The book is on strong stability preserving methods (the only existing book on the subject) and is, I think, a nice introduction to the subject.  You can read more about it here.  Apparently it has already sold almost 100 copies.  If you do order a copy, you can get a 20% discount until March with the code "DCL032011".

Thursday, January 13, 2011

nodepy 0.3 available via easy_install

To facilitate my research and perhaps help someone else out there, I develop a Python package built around numerical ODE solvers (Runge-Kutta methods, multistep methods, etc.) as objects. The package is called nodepy, and its functionality is somewhat limited. However, it contains a very nice implementation of rooted trees, including the ability to compute everything necessary for deriving order conditions of general linear methods. It also has a lot of nice functionality for Runge-Kutta methods, including low-storage methods and embedded methods.

As of today, the package is finally available on the PyPI server, and therefore can be installed using

easy_install nodepy

Hopefully this will encourage interested parties to try it out (or better yet, to contribute!).
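
As a quick taste, a session might look something like the following (names from memory -- consult the nodepy source if they've drifted):

from nodepy import runge_kutta_method as rk

rk44 = rk.loadRKM('RK44')  # the classical fourth-order Runge-Kutta method
print(rk44)                # displays the Butcher tableau
print(rk44.order())        # order of accuracy, computed from the order conditions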

Thursday, January 6, 2011

Using pylint to clean up Python code

I just recently discovered a very useful package for anyone who writes Python code: pylint. It took a little tweaking to get it to do what I wanted. Besides looking for outright errors, it checks all the recommended Python coding style conventions. Since I don't abide by many of those, pylint initially gave my nodepy code a rating of -4.5/10.0 (yes, that's a NEGATIVE rating). More importantly, I couldn't find the real errors among the thousands of style complaints. To run pylint without checking all the style conventions, just type

pylint -d C xxxx

where "xxxx" is the name of a python package or module. It will still make a lot of subjective judgments about your code (like suggesting that no function should have more than 5 arguments), but to me it's a tolerable level (and sometimes the suggestions really are helpful). More information about pylint's output messages can be found here: http://www.logilab.org/card/pylintfeatures. I was able to uncover several previously unnoticed issues in my package in this way.