Rüschlikon ZH – Researchers at IBM Research in Rüschlikon have developed a technology for PDF files based on artificial intelligence. It can quickly extract content from large volumes of PDFs. Insurance companies are currently testing it.

According to Adobe, there are roughly 2.5 trillion Portable Document Format (PDF) files currently in circulation. They contain a wealth of knowledge, from scientific and technical content to other important information. However, as IBM highlighted in a press release, all this content is "dark or unused". Researchers at IBM Research in Rüschlikon have now developed a technology that can swiftly identify and extract content from these documents.

The Corpus Conversion Service system can ingest 100,000 PDF pages per day. It applies artificial intelligence to make content such as text, tables and graphics usable. As a result, the required information can be made available quickly, even across a large number of PDF files.

The Corpus Conversion Service is currently being tested by various external IBM partners, including an insurance company that is using the service to convert unstructured claims. The technology is also being trialled by companies in other industries such as chemicals, oil and gas, and consumer electronics. The solution is set to be launched by the end of this year. It was recently presented at the ACM Conference on Knowledge Discovery and Data Mining (KDD 2018), held in London from 19 to 23 August.

IBM Research in Rüschlikon is one of IBM's 12 research labs worldwide and was founded in 1956.