2019
Look, I read two of the latest newletters by Sacha Chua and I already learned
about two new Org features: org-reverse-datetree and org-bib-template. Moreover,
I didn’t know that there were such thing as meta repository for ESS users. #emacs
I think this is the first time this site is referenced in Sacha Chua excellent Emacs newsletter.
Mathematica implementations of machine learning algorithms used for prediction and personalization.
This open source project is for Mathematica implementations of statistical and machine learning algorithms that can be used for data analysis, prediction, and recommendation systems.
Note that the Github repository also includes Lua, Java and R code. The companion website is Mathematica for prediction algorithms.
A few days ago, I read a thread on Biostars (which I haven’t consulted in a while) on the use of Wolfram mathematica in bioinformatics, and I wondered why people are so critical of this software. The same applies to Stata (if you see the recent flame on Twitter, you know what I mean), albeit in this case there’s not even this man behind it.
Long time no see. I have been compiling several pieces of bioinformatics software lately. No issues whatsoever, except for a few glitch with boost libraries.
I just added permalinks in this section (here, a small hash symbol near the date). I was missing a way to link to previous micro-posts.
I’m almost done with Occupied. I initially thought I would be able to finish the last two episodes of the first season this evening, but I’m so tired (I’m up since 4am) that I’m afraid I won’t be able to stand up for long.
I am still unsure how best to use org-journal. I already use a “diary” file
where I bookmark important stages of my working day. This way, I get a nice
summary with org-agenda. Obviously, I could do exactly the same using
org-journal, but I was thinking that it could also be used to record my posts on
the main site: (1) I would be writing using Org mode directly, (2) I would get a
searchable archive from Emacs directly (and more convenient than deft), and (3)
that would be just cool. #emacs
How to delete empty lines in a file by Emacs? Useful to clean up an HTML page
with lot of extra blank lines. #emacs
M-x flush-lines RET ^[[:space:]]*$ RET
This moment when you realize that you are stuck with Java 8 on your OS… Two
options: use Homebrew (brew cask install java) or proceed manually. I think I
will love bioinformatics tools.
Ecological causes of uneven diversification and richness in the mammal tree of
life. (via @rlmcelreath) #biorxiv
It’s been a while since I haven’t run any ML model using caret, especially since
Max Kuhn engaged in the RStudio team to develop a brand new ML pipeline in the
name of the tidy new wave: tidymodels, then parsnip (slides near here). Anyway,
here is a good tutorial if you want to get started with caret. (via
@R_Programming) #rstats
And we are finally done with The 100. Looking forward to looking to The Expanse during winter holidays.
When you insist on your CLI-based workflow (reproducibility, text-based, etc.
you know…) and you realize that Stata 13 does not recognize graph export
with a PDF backend (while Stata 15 does) from a Terminal. Back to Encapsulated
PostScript then, like in the 90s! #stata
While I appreciate that there are so useful Docker images available, I think I
will need to build a more lightweight one if I want to stay on CircleCI free
plan. Hopefully, it looks like someone already had the same idea. #rstats
The first edition of Interpretable Machine Learning is out. (via @ChristophMolnar)
Emacs build-status: a nice package that allows to monitor build on Travis or
CircleCI. #emacs
Yet another org-powered website. This makes me think that I added a little
org-capture template to write those micro-posts without having to open my
micro.org file. #org
("b" "Blog post" entry (file+headline "~/org/micro.org" "Micro")
"** TODO %?\n:PROPERTIES:\n:EXPORT_FILE_NAME:\n:END:\n%^g\n"
:empty-lines 1)
Stephen Wolfram reflecting on his “productive” and digital life. What a man!
Despite the useful utility under the “File” menu, my attempt at installing a
Mathematica package properly failed miserably this morning. I ended up
copying/pasting the wole archive into ~/Library/Mathematica/Applications.
Anyway, this worked and I am now able to plot phylogenetic trees!
Didn’t know there was such a thing: MacJournal (via Jack Baty). Whether you are interested in this app or not, the author provides a nice discussion of the pros and cons of keeping a diary vs. a journal, and on the importance of meta data.
Merlin Mann et Marie Kondō sont dans une d’emails, by Bastien Guerry. Nice summary of the situation regarding emails. I already deleted 30k+ mails in one pass so I know what batch processing is.
Discrete Stochastic Processes. It’s amazing how many excellent tutorials can be found on the MIT OpenCourseWare.
I disabled Dropbox syncing on my Mac for a long time now, but I realized yesterday that Transmit allows to connect to Dropbox very easily now. Even if I no longer use Dropbox these days, that may be a very good option for the future.
After jupyter-book, there is now jupytext (via @marcwouts). Looks like we now
have a serious competitor to RStudio. #python
Nick Cave & The Bad Seeds, Nocturama. I’m often lazy when it comes to changing a CD.
Two handy org commands: org-journal-new-scheduled-entry can be used to schedule
future entries in org-journal (see discussion here); org-tree-to-indirect-buffer
is a good alternative to org-narrow-to-subtree sometimes. #org
Diving into computational molecular biology. It’s a fun world after all,
especially compared to medical statistics. I am trying to devise a reliable
workflow for taking notes and using a live notebook, mostly inspired from my old
setup, but basically it’s all about Org files with tags and “TODO items”,
including a diary and helm-bibtex for managing my bibliography. Nothing fancy,
but it just has to do the job right after all.
Pretty Magit - Integrating commit leaders. I have been using Git leaders for almost two years, but now I realize that I completely forgot about them.
Today was my first day at my new lab. Everything went fine, despite a very bad
night. At least I have been able to go back home without too much dizziness or
paresthesia in the legs (I don’t know where this one comes from). Guess what:
For the first time in 10 years, I am able to connect my Macbook on the network!
#self
I am reading the Racket guide again, this time using Dash only. It’s amazing how
convenient this application is, especially for navigating between text and
function definitions, which by default are all hyperlinked thanks to the
Scribble documentation system. #scheme
I am about to exceed the 150th micro-posts in my Org file. (Other posts are
published from the terminal directly.) I added a little cookie to keep track of
the number of entries, although a little harder path would be to write some
elisp code. #org
I don’t have any big needs in terms of image processing, and I am generally happy with ImageMagick. However, Acorn and Retrobatch (h/t Brett Terpstra) look pretty nice.
Just cleanup a little bit more my Dropbox (6 Go of data, reports and papers accumulated along 8 years!).
Machine learning in Clojure with XGBoost. Note that there are bindings for the
awesome xgboost in various other languages (Python, Julia, R), not just the JVM.
#clojure
Python didn’t become the leader in the field because it’s inherently better or more performant, but because of scikit-learn, pandas and so on. While as Clojurists we don’t really need pandas (dataframes) or similar stuff (everything is just a map, or if you care more about memory and performance a record) we don’t have something like scikit-learn that makes really easy to train many kind of machine learning models and somewhat easier to deploy them.
merlin - a unified framework for data-analysis, and many other interesting
packages by the same author or other coworker. #stata
Again, I’m slowly updating stata-sk. It took me a while to reset the publishing
system to use Stata 13 MP instead of Stata 15 since I no longer get a free
license for it. This will probably be my last textbook on Stata. #stata
Look. Even Racket has some support for statistical data structure like data
frames. In addition, here is an essential read if you want to get started with
common data structures: An Overview of Common Racket Data Structures. #scheme
An analysis of lossless data compression programs: Large Text Compression Benchmark. (via SO–it looks it is the very first question on the beta site)
The amount of genomic sequence data being generated and made available through public databases continues to increase at an ever-expanding rate. Downloading, copying, sharing and manipulating these large datasets are becoming difficult and time consuming for researchers. We need to consider using advanced compression techniques as part of a standard data format for genomic data. The inherent structure of genome data allows for more efficient lossless compression than can be obtained through the use of generic compression programs. We apply a series of techniques to James Watson’s genome that in combination reduce it to a mere 4MB, small enough to be sent as an email attachment. – Human genomes as email attachments