
Searching Tobacco Archives: Sports and Chewing Tobacco

~This post courtesy Allen Smoot, UCSF Archives Intern.

Image from “The case against smokeless tobacco: five facts for the health professional to consider,” September 1980, page 4.


As an intern for the UCSF Archives, I’ve been working on digitized state medical society journals and tobacco control collections. At UCSF, the Archives and the Industry Documents Library both house immense collections of tobacco-related material. The Industry Documents Library alone holds millions of documents from tobacco companies about their manufacturing, marketing, and scientific research. I focused on chewing tobacco and how it became popular in the sporting world.

Introducing Garrett Morton

We’re pleased to introduce our other fellow, working this summer on ArchiveSpark and our full-text search tool.

  • Who are you?

My name is Garrett Morton, and I just finished my Master of Science in Information at the University of Michigan School of Information.

  • What’s your background?

As an undergraduate, I majored in history, which is how I got into archives in the first place.  Before going back to school to pursue my master’s degree in information, I worked variously in archival processing, records management, and bookstores, all of which have contributed to who I am now in ways both intuitive and unexpected.  I learned that you don’t have to be a historian to contribute to our collective understanding of our past, but also that helping people ends up being the most fulfilling aspect of anything I do.  During my graduate degree, I have had opportunities to write finding aids for the William L. Clements Library at U-M, teach consulting and contextual inquiry to master’s students, perform program evaluation research, and work on a cross-disciplinary platform design team at Harvard Library.  Over the course of my graduate education, I also realized that I have a strong interest in the technical side of archival and library resources, especially relating to metadata and the computer systems by which we access material.  I devoured all the coursework I could find, and some besides, that allowed me to learn more about programming, systems administration, and metadata creation and maintenance.  I also found classroom and non-classroom opportunities to try my hand at user experience research, turning patron experiences into the actionable building blocks that guide us in making it easier for patrons to achieve their goals.

  • Why are you interested?

Because I came to my technical interests without particularly strong prior technical knowledge or experience, it was easy for me to view systems both from the perspective of the researchers and patrons who use them and from that of the technical and information professionals who create and maintain them. I find the problem-solving challenge of designing an app or system rewarding and enjoyable, but it is particularly fulfilling to see its tangible benefits for patrons and researchers. I found this opportunity with the MHL particularly exciting because it offered me a chance to bring many of my sometimes-divergent interests together. Working with these valuable historical collections, I could facilitate research while using and growing my technical skills, and apply my knowledge of user experience research to bring these aspects together. It feels rare, as a current or recent graduate student, to find an opportunity where you can have such a direct impact on the individual people using an institution’s collection, but that’s exactly what this position at the MHL offers.

  • What do you hope to do?

Over the remaining weeks of this fellowship, I will conduct further interviews with researchers which will help me gain a concrete, empirical understanding of ways in which current tools fall short of researcher needs.  I then hope to build those observable needs into a prototype for a new advanced search tool for the MHL collections.

A Contemporary Take

If you’ve been following medical humanities publishing news lately, you’ve probably noticed the discussion around the forthcoming biography of James Barry by EJ Levy. If you haven’t, you can get a sense of the conversation in this piece from the Guardian.

The MHL doesn’t have much material on Barry, but does have a brief mention in an 1884 publication, Doctors Out of Practice from the Royal College of Surgeons of England.

Fellowships @ the MHL!

We’re looking for one or more fellows to join us working on projects during the summer of 2019. Please share this post widely!

MHLonArchiveSpark Development for the Digital Humanities

DESCRIPTION:

Hosted by one of our member institutions in New York, Boston, New Haven, Philadelphia, or San Francisco, the fellow will develop a user-friendly web interface and author supporting workflows to make MHLonArchiveSpark functionality more broadly accessible to researchers and better facilitate: 1) using the MHL’s Advanced Search Tool to identify a set of texts meeting user criteria and retrieve all of them from the Internet Archive, and 2) using ArchiveSpark to extract the full text of a results set (including metadata) in order to perform additional queries against that set. ArchiveSpark is an open-source Apache Spark framework for data processing, extraction, and derivation for web archives and archival collections, developed by the Internet Archive and the L3S Research Center.

Additional products of this project could include creating a number of canned recipes for searching content using MHLonArchiveSpark and considering new approaches to making extraction and analysis easier.

For more information about ArchiveSpark, visit the following:

The fellowship is paid and may be taken for course credit.

DUTIES AND RESPONSIBILITIES:

  • Based on the input of MHL members and others, assess user needs and propose possible solutions to enhance MHLonArchiveSpark functionality; implement the chosen approach.
  • Create a number of canned recipes for searching content with ArchiveSpark.
  • Create user-friendly documentation for the purposes of increasing the use and reach of MHLonArchiveSpark.

QUALIFICATIONS AND EXPERIENCE:

This position is open to all qualified graduate students with a strong interest in the digital humanities and computer science, including API development, with additional interests in library/information science or education. Strong communication and collaboration skills a must. Fellows are expected to learn quickly and work independently. 

Education and Outreach Fellow, Medical Heritage Library, Inc.

DUTIES AND RESPONSIBILITIES:

  • Based on the input of MHL members and others, work on the creation of curated sets of materials drawn from MHL collections.
  • Develop educational materials tied to K-12 and/or university-level curricula.
  • Enrich MHL metadata to highlight underrepresented topics in our Internet Archive collections.
  • Regularly create blog posts and other social media content for posting to MHL accounts.
  • Other duties as assigned.

QUALIFICATIONS AND EXPERIENCE:

This position is open to all qualified graduate students with a strong interest in medical or health history, with additional interests in library/information science or education. Strong communication and collaboration skills a must. Fellows are expected to learn quickly and work independently. 

THE DETAILS

FELLOWSHIP DURATION:

The fellowship will take place at any time between the end of May 2019 and mid-August 2019.

HOURS:

20 hours per week, over 12 weeks.

SALARY:

$20/hour

To apply, please provide the following:

    Cover letter documenting interest in position

    Curriculum Vitae

    2 References

Please submit your application materials by April 1st, 2019 to:

Attn: Fellowship committee

medicalheritage@gmail.com

Governance

Two images of a human skull, one from the front, one from the side.

The Medical Heritage Library, Inc. is a collaborative digitization and discovery organization committed to providing open access resources in the history of healthcare and the health sciences.

Board

Medical Heritage Library, Inc. is overseen by its board of directors, duly elected by the membership. For more information, please see the full bylaws of the corporation.


Board of Directors 2023-2024

  • President – Beth Lander 
  • Vice-President – Polina Ilieva
  • Treasurer – Melissa Grafe
  • Secretary – Mary Yearl

Current Institutional Members

  1. The Cushing/Whitney Medical Library at Yale University
  2. The Francis A. Countway Library of Medicine at Harvard University
  3. National Library of Medicine
  4. Osler Library of the History of Medicine
  5. UCSF Library
  6. Wellcome Library
Front page of Nicholas Culpeper's "Herbal."

Staff

  • Project Co-ordinator (2011-2023) – Hanna Clutterbuck-Cook

Fellows

2018

  • Emma Brennan-Wydra (social media and outreach)

2019

  • Kelly H. Jones (resource sets)
  • Garrett Morton (ArchiveSpark and full-text search tool)

2020

  • Kim Adams (outreach)

2021

  • Rachael Gillibrand (Jaipreet Virdi Fellow in Disability Studies)
  • Aja Lans (education resources)
  • Anthea Skinner (disability studies)

2022

  • Lorna Ebner (LGBTQIA resource set)
  • Genie Yoo (climate, health, and empire resource set)

2023

  • Savannah Flanagan (mental health resource set)

Create

If you are planning a novel research project that could make use of our data, please contact the MHL!

Get Data

You can always extract data from the Medical Heritage Library collection using the Internet Archive’s own advanced search tool. Just make sure you’re searching using the collection tag “medicalheritagelibrary” along with your other search terms.
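As a minimal sketch, the Python below builds a query URL for the Internet Archive’s advanced search endpoint (`advancedsearch.php`, with its `q`, `fl[]`, `rows`, `page`, and `output` parameters) so that every query carries the `collection:medicalheritagelibrary` tag. The helper names (`build_mhl_query`, `advanced_search_url`, `fetch_identifiers`), the default field list, and the page size are illustrative assumptions, not an official MHL tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Internet Archive advanced search endpoint and the MHL collection tag.
ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"
MHL_COLLECTION = "medicalheritagelibrary"


def build_mhl_query(terms: str) -> str:
    """Scope the user's search terms to the MHL collection (hypothetical helper)."""
    terms = terms.strip()
    scope = f"collection:{MHL_COLLECTION}"
    # Parentheses keep an OR inside the user's terms from escaping the filter.
    return f"{scope} AND ({terms})" if terms else scope


def advanced_search_url(terms: str,
                        fields=("identifier", "title", "date"),
                        rows: int = 50,
                        page: int = 1) -> str:
    """Build an advanced search URL that asks for JSON results."""
    params = [("q", build_mhl_query(terms))]
    # One fl[] parameter per metadata field to return.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


def fetch_identifiers(url: str) -> list[str]:
    """Fetch one page of results and return item identifiers (needs network access)."""
    with urlopen(url) as response:
        data = json.load(response)
    return [doc["identifier"] for doc in data["response"]["docs"]]


if __name__ == "__main__":
    print(advanced_search_url("title:(cholera)"))
```

Larger result sets can be walked by incrementing `page`; the identifiers returned are what a researcher would then hand on to ArchiveSpark or download directly.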

ArchiveSpark

In partnership with Helge Holzmann and Vinay Goel of the L3S Research Center and the Internet Archive, the MHL has been working on a tool that lets researchers take advantage of ArchiveSpark, an Apache Spark framework that enables easy data extraction and derivation.

The MHLonArchiveSpark tool on GitHub includes all the elements required to work with MHL collections via this framework. Users will need some familiarity with the command line; familiarity with Scala or with Apache Spark commands is recommended.

Currently, users must either have access to a computer cluster or server, or use a Docker container, in order to run ArchiveSpark. ArchiveSpark can run in a Docker environment on an ordinary laptop or desktop computer, and using Docker automatically sets up a Jupyter Notebook in which work can be done with the MHL collections.

For those already familiar with Apache Spark environments, an example of the MHL project can be found here. We are working to make ‘recipes’ for searches available for users.

Future Projects

The MHL is interested in working with researchers and educators to:

  • Gather user stories that illustrate how the MHL corpus (including the UKMHL) has been used to support scholarship and to share those stories on its website
  • Share how MHL content is being used in the classroom and enable students to share how they’ve engaged with MHL content
  • Promote tools created or employed to work with the MHL corpus to create new knowledge
  • Have people experiment with ArchiveSpark and provide feedback to us and the developers, as we’d like to see it become a tool that less tech-savvy users can use to create custom datasets
  • Share their ideas about what tools/services/functionality would improve access to (and the use of) MHL content

Additionally, we have started to reach out to Library Science and other programs with prospective ideas for hackathons and data challenges:

Make ArchiveSpark with MHL more intuitive by developing a user-friendly interface (or other mechanism) for making ArchiveSpark functionality more broadly accessible. MHL constituencies come from a variety of academic disciplines and have varying levels of comfort and familiarity with utilizing tools like ArchiveSpark (https://github.com/helgeho/MHLonArchiveSpark). In a nutshell, this project seeks to make ArchiveSpark workflows broadly accessible to the public, which typically require users to:

  1. Go to the MHL’s Advanced Search Tool to identify a set of texts meeting their criteria and retrieve all of them from the Internet Archive.
  2. Use ArchiveSpark to extract the full text of a results set (including metadata) and then perform additional queries against that set.

Products of this project could include creating a number of canned recipes for searching content with ArchiveSpark and considering new approaches to searching the dataset that make extraction and analysis easier for researchers. An online tutorial will be available in advance of the hackathon.
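The second step of that workflow, querying an extracted result set, can be illustrated with a few “canned recipes.” The sketch below is plain Python, not ArchiveSpark’s Scala API: it assumes each extracted item has already been reduced to a hypothetical `Record` holding an identifier, title, year, and full text, and the recipe names (`mentions`, `by_decade`, `keyword_in_context`) and sample records are made up for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    """One extracted item: metadata plus full text (hypothetical shape)."""
    identifier: str
    title: str
    year: int
    text: str


def _word_pattern(term: str) -> re.Pattern:
    """Case-insensitive whole-word pattern for a search term."""
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)


def mentions(records: list[Record], term: str) -> list[Record]:
    """Recipe 1: keep records whose full text contains the term as a whole word."""
    pattern = _word_pattern(term)
    return [r for r in records if pattern.search(r.text)]


def by_decade(records: list[Record]) -> Counter:
    """Recipe 2: count records per decade, e.g. to chart a term over time."""
    return Counter((r.year // 10) * 10 for r in records)


def keyword_in_context(record: Record, term: str, width: int = 30) -> list[str]:
    """Recipe 3: return each match with up to `width` characters on either side."""
    snippets = []
    for match in _word_pattern(term).finditer(record.text):
        start = max(match.start() - width, 0)
        snippets.append(record.text[start:match.end() + width])
    return snippets


if __name__ == "__main__":
    sample = [  # made-up records, for illustration only
        Record("item-a", "On Cholera", 1849, "Cholera spread along the river."),
        Record("item-b", "Fevers", 1866, "A note on typhus and cholera."),
        Record("item-c", "Diet", 1871, "Nothing relevant here."),
    ]
    hits = mentions(sample, "cholera")
    print([r.identifier for r in hits])
    print(by_decade(hits))
    print(keyword_in_context(sample[1], "typhus", width=5))
```

In ArchiveSpark itself, filters like these could be expressed as Spark transformations over the whole dataset; the in-memory list here only shows the shape of each query.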

Connect Index Cat to journal articles that have been digitized by the MHL. This challenge involves matching Index Cat entries with full-text articles residing in the Medical Heritage Library.

The Index-Catalogue of the Library of the Surgeon-General’s Office (Index-Catalogue) is a multi-part printed bibliography or list of items in the Library of the Surgeon-General’s Office, U.S. Army. It contains material dated from the 1400s through 1950 and is an important resource for researchers in the history of medicine and the history of science, and for clinical research. The Index-Catalogue was published in five (5) series in sixty-one (61) volumes from 1880 to 1961. Since it is a list of holdings for a specific library, it does not claim to be an index of all material published in medicine. By 1895, however, the Surgeon-General’s Library was the world’s largest medical library, so its catalog became a major source for accessing medical literature. The scope of the Index-Catalogue extends beyond medicine and includes, for example, the basic sciences, scientific research, military medicine, public health, and hospital administration. Language coverage is international, with citations in European and Slavic languages, Greek script, and Romanized Chinese and Japanese titles, some with English translations. The catalogue covers a wide assortment of materials, including books, journal articles, dissertations, pamphlets, reports, newspaper clippings, case studies, obituary notices, letters, and portraits, as well as rare books and manuscripts (see https://www.nlm.nih.gov/hmd/indexcat/).

The software platform for IndexCat™ is IBM InfoSphere Data Explorer (DE). As installed by NLM, it permits simultaneous searching across all collections in the IndexCat™ database. XML data is available from the IndexCat™ database; it reflects both the Index-Catalogue and the eTK/eVK2 collections. The data are available to everyone, both within and outside the United States, and there is no charge for obtaining the files.
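A first pass at the Index Cat matching challenge might compare normalized titles using Python’s standard `difflib`. The dictionary fields (`title`, `year`), the 0.85 threshold, and the function names are assumptions for illustration; the actual IndexCat XML and MHL metadata would first need to be mapped into this shape.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Keep only Latin letters and digits; non-Latin scripts are dropped here.
    return " ".join(re.findall(r"[a-z0-9]+", unaccented.lower()))


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(citation: dict, candidates: list[dict], threshold: float = 0.85):
    """Return the candidate most similar to a citation, or None below threshold.

    Candidates are skipped when both records have a year and the years differ.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        if citation.get("year") and cand.get("year") and citation["year"] != cand["year"]:
            continue
        score = similarity(citation["title"], cand["title"])
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None
```

Title similarity is only a starting point: journal articles often share generic titles, so a fuller approach would probably also compare journal names, volumes, and pages. The `[a-z0-9]` filter also discards Greek and other non-Latin scripts, which the Index-Catalogue does include.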

Create an index of archaic medical terminology using medical dictionaries found in the Medical Heritage Library, map those terms to contemporary medical terminology (such as the Unified Medical Language System, or UMLS), and index the Medical Heritage Library corpus to facilitate the discovery of published content from the perspective of contemporary medicine. Research using history of medicine primary resources requires a highly specialized vocabulary of medical terms. At the outset of a research project, humanities scholars, behavioral scientists, students of the history of medicine, and others may not possess the full assemblage of biological and medical terminology needed to uncover a comprehensive body of primary source material. Even for a researcher who is knowledgeable about archaic medical terminology, the specificity of contemporary medical terms and the increasing degree of specialization within medicine present barriers to analyzing an idea or process over time and its impact on society. By applying semantic web technology and the lexical tools of the UMLS, we can enable a more lexically open discovery process that supports multidisciplinary approaches to history of medicine sources. Technical documentation for the UMLS API is available here: https://documentation.uts.nlm.nih.gov/. Alternatively, the scope of the project could be limited to MeSH subject headings.
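One way to picture the terminology index is as query expansion: a contemporary term is expanded to the archaic synonyms a nineteenth-century text might use. The tiny mapping below is hand-written and approximate, not drawn from the UMLS or MeSH, and the names `ARCHAIC_TO_MODERN`, `invert`, and `expand_query` are hypothetical.

```python
from collections import defaultdict

# Hand-written, approximate pairs for illustration only; a real index would draw
# archaic terms from MHL medical dictionaries and map them to UMLS or MeSH concepts.
ARCHAIC_TO_MODERN = {
    "consumption": "tuberculosis",
    "phthisis": "tuberculosis",
    "dropsy": "edema",
    "apoplexy": "stroke",
}


def invert(mapping: dict[str, str]) -> dict[str, set[str]]:
    """Build a modern-term -> set-of-archaic-terms index."""
    index: dict[str, set[str]] = defaultdict(set)
    for archaic, modern in mapping.items():
        index[modern].add(archaic)
    return dict(index)


MODERN_TO_ARCHAIC = invert(ARCHAIC_TO_MODERN)


def expand_query(modern_term: str) -> str:
    """Return the term plus any known archaic equivalents as an OR query."""
    term = modern_term.strip().lower()
    variants = {term} | MODERN_TO_ARCHAIC.get(term, set())
    return " OR ".join(sorted(variants))


if __name__ == "__main__":
    print(expand_query("Tuberculosis"))
```

Expansion raises recall at the cost of precision: “consumption,” for example, also has everyday, non-medical meanings, which is one reason the challenge proposes the UMLS lexical tools rather than a flat word list.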

Digital Highlights: Healthful Travelling

Title page of "Change of Air."

A trip to “recover one’s health” seems to have been something of a hobby in the nineteenth century. In the United Kingdom, Europe, and America, the health retreat to a spa, a seaside resort, the mountains, or the beach was a reasonably regular occurrence — for those who could afford it, anyway.

In 1831, “physician extraordinary to the King” James Johnson wrote Change of Air, or the Pursuit of Health to reflect not only on the need for such trips but also on an excursion he had himself taken, and to offer “…remarks and speculations on the moral, physical, and medicinal influence of foreign, especially of an Italian climate and residence, in sickness and in health.” (i)