MATERIAL

Machine Translation for English Retrieval of Information in Any Language

Intelligence Value

The MATERIAL program aims to revolutionize the way the Intelligence Community (IC) consumes foreign language information by turning multilingual text and speech media into usable intelligence for analysts, regardless of their language expertise.

Summary

A large portion of the ever-increasing amount of text, audio, and video data produced in today’s world is generated by populations of emerging importance in lower-resource languages. This rich source of data is of little value if the information cannot be effectively searched. Launched in October 2017, the MATERIAL program is a 47-month venture that seeks to address this challenge by building robust, automated language capabilities with limited linguistic resources, expertise, and tools.

MATERIAL’s ultimate goal is to build a Cross-Language Information Retrieval (CLIR) system that finds speech and text content in diverse lower-resource languages using English search queries. This system will allow analysts to submit queries in English and receive short English summaries of the relevant foreign language items, with each summary making clear how the item is relevant to their information needs.

Success is measured by a novel end-to-end retrieval metric that will assess the system’s ability to retrieve all relevant documents while producing few false alarms.

The MATERIAL program will provide:

  • State-of-the-art Automatic Speech Recognition (ASR) and Machine Translation (MT) systems and models for Tagalog, Swahili, Somali, Lithuanian, Bulgarian, Pashto, Farsi, Kazakh, and Georgian
  • Highly competitive models optimized for informal and formal speech and text
  • Novel end-to-end CLIR systems available with dockerized, exchangeable component technologies
  • Innovative ways of utilizing large amounts of diverse multilingual unstructured text and speech data to improve model performance
  • Text processing tools to address morphology and divergent spelling
  • Text, audio, and video data crawlers operationalized for U.S. government use
  • Annotated, reusable datasets in multiple languages for CLIR, ASR, and MT research

Publicly Available Program Artifacts

A listing of the publicly available resources released by MATERIAL performers is available as an 11x17 PDF.

Related Publications

To access MATERIAL program-related publications, please visit Google Scholar.

Contact Information

Program Manager

Main Office

Related Program(s)

Broad Agency Announcement (BAA)

Link(s) to BAA

IARPA-BAA-16-11

Solicitation Status

CLOSED

Proposers' Day Date

September 27, 2016

BAA Release Date

January 19, 2017

BAA Question Period

January 19, 2017 — February 20, 2017

Proposal Due Date

March 20, 2017

Program Summary

Testing and Evaluation Partners

  • Massachusetts Institute of Technology Lincoln Laboratory
  • National Institute of Standards and Technology
  • University of Maryland Applied Research Laboratory for Intelligence and Security
  • Tarragon Consulting Corporation

Prime Performers

  • Johns Hopkins University
  • Raytheon BBN Technologies
  • Columbia University
  • University of Southern California Information Sciences Institute