Summarizing Large Texts – a Deep Dive into NLP with Bidirectional Encoders
Tuesday, 17 November 2020
Text summarization is a highly useful tool for extracting key information from text, and it can help businesses speed up their processes dramatically. With bidirectional encoders such as BERT and RoBERTa, and with BART, which pairs a bidirectional encoder with an autoregressive decoder, automatically producing human-like summaries has become easier than ever. However, because self-attention scales quadratically with input length, these models accept only a fixed number of tokens (512 for BERT and RoBERTa, 1,024 for BART), which constrains the processing of long text sequences. In her talk, Nataliia will show how to produce high-quality summaries of long inputs, guiding you through the data preparation process and suggesting heuristics for dealing with the token limit. Using consumer complaint data as an example, she will compare the performance of different state-of-the-art bidirectional encoders on long text sequences and show which architectures produce the best summaries.
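One widely used heuristic for working around a token limit is to split the input into overlapping chunks that each fit the model's window, summarize each chunk, and join the partial summaries. The sketch below illustrates that general idea only; it is not necessarily one of the heuristics presented in the talk. The function names `chunk_tokens` and `summarize_long_text` are hypothetical, whitespace splitting stands in for a real subword tokenizer (so token counts will differ from an actual model's), and `summarize` is a placeholder callable where a BERT-, RoBERTa- or BART-based summarizer would go. The 512-token default simply mirrors BERT's input limit.

```python
from typing import Callable, List


def chunk_tokens(tokens: List[str], max_len: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split a token sequence into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens, so text near a window
    boundary keeps some surrounding context in both windows.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap  # how far each new window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):  # this window already reaches the end
            break
    return chunks


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],  # hypothetical stand-in for a real model
    max_len: int = 512,
    overlap: int = 64,
) -> str:
    # Assumption: whitespace tokenization; real models use subword tokenizers.
    tokens = text.split()
    partial = [summarize(" ".join(chunk)) for chunk in chunk_tokens(tokens, max_len, overlap)]
    # Simplest combination strategy: concatenate the partial summaries.
    return " ".join(partial)


# Usage with a trivial "summarizer" that keeps only the first word of each chunk.
print(summarize_long_text("a b c d e f g h i j", lambda s: s.split()[0], max_len=4, overlap=1))
```

Concatenating partial summaries is the most basic way to combine them; a variant is to run a second summarization pass over the joined result if it still exceeds the desired length. The overlap trades extra computation for less information lost at chunk boundaries.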