Word cloud figure from LaTeX index entries

I created the word cloud on the cover of “Learn R as a Language” with an R script that takes as input the book's index file, as generated when building the PDF from the LaTeX source files. This input file contained quite a lot of additional information, such as font changes and page numbers, that had to be stripped out to obtain a clean list of words. Only later did I realize that it would have been easier to produce a cleaner word list to start with. So, I first present the code revised to work with a simpler word list; I have tested this revised code with the book's files. If you want to do something similar for your own book, follow the revised code in the first section below. If you want to see the “hacked-up” code I actually used for the cover as it appears in the book, it is in the second section below.
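To give an idea of the kind of clean-up involved, here is a minimal sketch in base R. It is not the script used for the cover. It assumes the index file follows the usual makeindex `\indexentry{...}{page}` layout, and the three example lines and all object names (`idx_lines`, `entries`, `words`, `word_freq`) are made up for illustration. Quoted special characters and nested font commands are not handled.

```r
# Hypothetical lines as they might appear in a makeindex .idx file
idx_lines <- c(
  "\\indexentry{functions!print()@\\texttt{print()}|hyperpage}{12}",
  "\\indexentry{data frame|textbf}{45}",
  "\\indexentry{data frame}{47}"
)

# Drop the leading \indexentry{ and the trailing }{page number}
entries <- sub("^\\\\indexentry\\{", "", idx_lines)
entries <- sub("\\}\\{[^{}]*\\}$", "", entries)

# Drop the page-number formatting command that follows "|", if any
entries <- sub("\\|[^|]*$", "", entries)

# Split multi-level entries ("main!sub") into separate words
words <- unlist(strsplit(entries, "!", fixed = TRUE))

# Keep only the printed form that follows the "@" sort key, if present
words <- sub("^.*@", "", words)

# Strip simple font changes such as \texttt{...}, keeping their content
words <- gsub("\\\\[A-Za-z]+\\{([^{}]*)\\}", "\\1", words)

# Count how often each word occurs; these counts can set the word sizes
word_freq <- sort(table(words), decreasing = TRUE)
print(word_freq)
```

The resulting frequency table is the sort of input a word cloud function expects, whichever plotting package is used to draw it.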
