I would like to set up a bot to retrieve email addresses from journal articles of interest. The process would be:
- Query for all articles in the last 7 days that mention my subject of interest (soe) e.g. genome wide association studies.
- Pull out all authors and their email addresses, if provided.
- For missing author address, query for N additional articles by the author, and find those where the author is first or last author, a position most likely to provide and email address. Update the database with the address.
- Send a custom email to each author advertising my product.
For this project I will be using Guile and its http-client
method to screen scrape data. Set up an environment with:
1 | mbc@xps:~/projects/conman$ guix environment --network --expose=/etc/ssl/certs/ --manifest=manifest.scm |
The manifest looks like:
1 | (specifications->manifest |
In one method I obtained an error I had great difficulty debugging. Error is presented in bytecode. Eventually I decided to convert the bytecode to text. Error message:
1 | scheme@(guile-user)> |
To decode:
1 | (use-modules (ice-9 iconv)) |
see https://www.ncbi.nlm.nih.gov/books/NBK25497/ for a discussion of API keys