Using AI and ML to optimize information discovery in under-utilized, Holocaust-related records

AI & SOCIETY - Volume 37 - Pages 837-858 - 2022
Kirsten Strigel Carter1, Abby Gondek2, William Underwood3, Teddy Randby4, Richard Marciano5
1Franklin D. Roosevelt Presidential Library and Museum, Hyde Park, USA
2Roosevelt Institute and Franklin D. Roosevelt Presidential Library and Museum, Hyde Park, USA
3College of Information Studies, University of Maryland, College Park, USA
4Advanced Information Collaboratory (AIC), University of Maryland, Durham, USA
5Advanced Information Collaboratory (AIC), University of Maryland, College Park, USA

Abstract

Digital cultural assets are often thought to exist in separate spheres based on their two principal points of origin: digitized and born digital. Increasingly, advances in digital curation are blurring this dichotomy by introducing so-called “collections as data,” which, regardless of their origin, make cultural assets more amenable to new computational tools and methodologies. This paper brings together archivists, scholars, and technologists to demonstrate computational treatments of digital cultural assets using Artificial Intelligence (AI) and Machine Learning (ML) techniques that can help unlock hard-to-reach archival content. It describes an extended, iterative study applied to digitized and datafied WWII-era records housed at the FDR Presidential Library, rich content that is regrettably under-utilized by scholars examining American responses to the Holocaust. We detail the benefits of interdisciplinary collaboration for evaluating user needs, identifying and applying tools and methodologies (including ML through object detection and AI through Named Entity Recognition, or NER), and reaching the real-world outcome of public access to augmented data. We also discuss issues of digital representation, relational context, and interface design to enable new modes of public and scholarly access. Although it is based on a case study, we believe this work contributes substantially to revealing the strengths and weaknesses of using AI/ML systems in cultural organizations. We give particular care to lessons learned and generalize the approach across broad classes of collections, with a focus on responsive iterations, reproducibility, and the relevance of data and its structures to users.
