Master thesis project: research on confidence scoring for LLM answers



We propose a Master thesis project on obtaining confidence scores for answers generated by Large Language Models (LLMs), in particular for answers generated with Retrieval Augmented Generation pipelines.

Project:

Generative AI models such as Large Language Models (LLMs) generate answers with varying levels of accuracy, and in some cases give factually inaccurate answers. As a result, the validation and evaluation of LLM results is an emerging field of interest to many users, developers and researchers. At ING Wholesale Banking Advanced Analytics (WBAA) we are interested in researching and developing techniques that make it possible to calculate confidence scores for the answers LLMs give to question prompts.

At ING we deal with a large volume of documents and run multiple Generative AI projects, based on Retrieval Augmented Generation (RAG), to help process them efficiently. A RAG pipeline couples a search engine to an LLM. This allows one to ask questions about documents and retrieve answers, known as generative question-answering (QA): the LLM answers each question based on relevant text passages retrieved from a document. Generated answers are grounded in the retrieved text, which substantially reduces the risk of hallucinations. The RAG projects aim to automate the extraction of information from unstructured documents. Typically, a fixed set of questions needs to be answered for a large batch of similar documents. We use the answers for automated form-filling, resulting in a structured summary dataset.
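As an illustration only, the retrieve-then-generate flow can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of ING's pipeline: the word-overlap `retrieve` function stands in for a real search engine, the `generate` argument is a placeholder for an LLM call, and the example passages are invented.

```python
import re
from typing import Callable

# Invented example passages; a real pipeline would index document text.
EXAMPLE_PASSAGES = [
    "The borrower is Acme Holdings B.V.",
    "The facility amount is 50 million EUR.",
    "This agreement is governed by Dutch law.",
]


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, passages: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank passages by word overlap with the question."""
    q_tokens = tokenize(question)
    ranked = sorted(
        passages, key=lambda p: len(q_tokens & tokenize(p)), reverse=True
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the context only."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the passages below.\n"
        f"Passages:\n{joined}\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    passages: list[str],
    generate: Callable[[str], str],
    k: int = 1,
) -> str:
    """Retrieve context, build the prompt, and delegate to the LLM stub."""
    return generate(build_prompt(question, retrieve(question, passages, k)))
```

Passing any callable as `generate` (for example a thin wrapper around a model API) keeps the retrieval and prompting logic testable without a model.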

The problem at hand: are the generated answers reliable? LLMs are not designed to return confidence scores for generated answers, and these answers are not necessarily correct. The proposal is to research and develop a reliable confidence score that can be applied in one or more of ING's data extraction projects. To do so we shall use ground-truth datasets that have been manually labelled by expert analysts.
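One common way to check a confidence score against labelled ground truth is the expected calibration error (ECE): answers are grouped into confidence bins, and within each bin the average confidence is compared with the fraction of answers that were actually correct. The sketch below is one possible evaluation, not the project's prescribed metric; the function name and inputs are assumptions made for illustration.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Weighted average gap between confidence and accuracy per bin.

    confidences: scores in [0, 1], one per generated answer.
    correct: whether each answer matched the expert-labelled ground truth.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one answer is required")

    # Assign each answer to an equal-width confidence bin.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A value of 0 means that, within every bin, the stated confidence matches the observed accuracy; larger values indicate over- or under-confidence.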

Research considerations:

  • The availability or non-availability of network weights and next-token probabilities in popular commercial models, such as OpenAI's ChatGPT.
  • How to account for the random component in generated answers.
  • Multi-class answers. Multiple answers to the same question can be correct, for example when extracted from different document pages.
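To make these considerations concrete, here is a minimal sketch of three simple baselines, one per consideration. These are illustrative assumptions, not the method the project will adopt: `sequence_confidence` assumes the model exposes per-token log-probabilities, `agreement_confidence` assumes the same question can be sampled several times, and `matches_any` assumes the ground truth lists every accepted answer. All function names are hypothetical.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(answer.lower().split())


def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric mean of token probabilities: exp(mean log-probability).

    Only usable when the model returns next-token log-probabilities.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def agreement_confidence(samples: list[str]) -> tuple[str, float]:
    """Most frequent answer across repeated samples and its share of the votes.

    Treats the randomness of generation as a signal: answers that recur
    across samples receive a higher score.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    counts = Counter(normalize(s) for s in samples)
    best, votes = counts.most_common(1)[0]
    return best, votes / len(samples)


def matches_any(answer: str, accepted: list[str]) -> bool:
    """Count an answer as correct if it matches any accepted ground-truth answer."""
    return normalize(answer) in {normalize(a) for a in accepted}
```

The agreement score needs no access to model internals, which matters when a commercial model does not expose log-probabilities; the cost is several generations per question.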

Other details:

The student will collaborate with data scientists and subject matter experts working on data extraction.

Reference blog from our team:

https://medium.com/p/c668844d52c8

The aim is to apply the research to real-world use-cases that we have in our department. We have a RAG setup where we need to extract answers to a set of questions from a large set of documents.
The intern will be asked to study the latest research developments on confidence scores for LLMs, adapt them where needed for application to our practical use-cases, and define and build a hands-on prototype.

The team

The Wholesale Banking Advanced Analytics (WBAA) team is a large team of data scientists, data engineers, software developers and many more, who are focused on bringing data, machine learning and statistical modeling into the products that we build for our clients or internal users. The data scientists in WBAA furthermore have a strong desire to keep up with and be part of the latest developments in the fields of AI, tooling and statistics. They do so by working closely with master’s students on a variety of topics to solve academic yet practical problems.

Our team has extensive experience with student supervision.

How to succeed

We hire smart people like you for your potential. Our biggest expectation is that you’ll stay curious. Keep learning. Take on responsibility. In return, we’ll back you to develop into an even more awesome version of yourself.

Are you a Master’s student looking for a thesis project, and are you interested in this one? Do you furthermore:

  • Have solid experience with Python
  • Have machine learning experience
  • Have solid skills in statistics and linear algebra (matrix rank, singular values, matrix decomposition, …)
  • Have at least six months to do your thesis project
  • Aim to go for a publication
  • Bring good vibes to your fellow data scientists

Rewards and benefits

This is a great opportunity to train with highly skilled people who are experts in their field. You’ll do a lot and learn a lot – not only about your specialist area and the bank, but also about yourself and whether this type of environment is right for you.

You’ll also benefit from:

• Internship allowance of 700 EUR based on a 36-hour work week

• Your own work laptop

• Hybrid working to blend home working for focus and office working for collaboration and co-creation

• Personal growth and challenging work with endless possibilities

• An informal working environment with innovative colleagues

For the duration of your internship at ING, you must be enrolled at a Dutch university (or an EU university for EU passport holders).

Questions?

Contact the recruiter attached to the advertisement. Want to apply directly? Please upload your CV and motivation letter by clicking the ‘Apply’ button.

About our internships

Every year, more than 350 students join our internship program. While there are no guarantees about your future, many of our former interns move into a permanent role or onto our International Talent Programme (traineeship).

Whatever happens, an internship at ING is the ideal opportunity to meet a wide variety of people, to build up your own network, and to learn about many different aspects of banking – put simply, it’s a great start to your career.


Questions? Ask Nóra Sütő

At ING we want to bring out the best in people. That is why we have an inclusive culture in which everyone gets the chance to grow and make a difference for our customers and society. Diversity, equity and inclusion always come first for us. We treat everyone fairly, regardless of age, sex, gender identity, cultural background, experience, religion, race, ethnicity, disability, family situation, sexual orientation, social background or anything else. Do you need help, or is there something we can do for you during your application or interview? Then contact the recruiter listed with the vacancy. We are happy to work with you to make the process fair and accessible.
