SingleStone - Can GPT Serve as a Simple Alternative to More Complex Natural Language Search Platforms?

At SingleStone, we’ve been researching how generative AI models can be used to accelerate various tasks within our own business processes and our consulting practice. In this article, we’ll explore how a development team might use OpenAI’s GPT as a simple alternative to more complex natural language search platforms such as Solr, Elasticsearch, or Algolia.

We’ve found that using the tool is technically straightforward, yet getting GPT to reliably extract factual information can be more challenging than expected. While we have not exhausted all possibilities, we believe these results point to a brittleness in GPT and similar models when integrating information contained in prompts.

Scenario

SingleStone has developed Team Insights, a tool that matches potential team members with projects by evaluating each person's skillset and availability against the requested skills and start date. As consultants, we use this application to build teams for client projects. For example, suppose a client has requested an engineer experienced in Java for a project starting on May 4th with an expected duration of 3 months. We need to find engineers who can staff that project.

Team Insights is already great for surfacing this type of information, but it would be even better if the person responsible for staffing this project could request it in natural language:

Find someone with Go and AWS experience who is available from May 4th to August 1st
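Expressed as a structured filter, the request above might look like the sketch below. This is not the Team Insights implementation: the `Person` class, the `find_candidates` function, the sample records, and the year 2023 are all assumptions made for illustration. Because the data described in this article holds only each person's next availability date, the sketch checks the start date and does not check the end date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    name: str
    skills: set
    available_on: date  # first day the person is free for a new project


def find_candidates(people, required_skills, start_date):
    """Return names of people who have every required skill and are free by start_date."""
    required = set(required_skills)
    return [
        p.name
        for p in people
        if required <= p.skills and p.available_on <= start_date
    ]


# Hypothetical records, not SingleStone data. The year is an assumption.
people = [
    Person("Dev A", {"Go", "AWS", "Python"}, date(2023, 4, 17)),
    Person("Dev B", {"Go", "Java"}, date(2023, 4, 1)),   # lacks AWS
    Person("Dev C", {"Go", "AWS"}, date(2023, 6, 12)),   # free too late
]

matches = find_candidates(people, ["Go", "AWS"], date(2023, 5, 4))
print(matches)
```

A filter like this is the deterministic baseline that any natural language front end would need to match.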

Approach

One of ChatGPT's most striking capabilities is parsing natural language prompts like this one. We asked whether ChatGPT would be able to take this prompt and execute the search required to answer our request.

Below is an example of the data we would like to search: a list of our developers along with their skills.

This data is stored in a relational database. As of this writing, ChatGPT does not directly connect with relational databases as part of the free tier offering. As a workaround, we decided to extract the necessary information into a JSON structure containing the skills of each person and their next availability date:
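A minimal sketch of that extraction step follows. It assumes hypothetical field names (`name`, `skills`, `next_available`) and made-up rows, not the actual Team Insights schema or data. It groups one-row-per-skill query results into one JSON object per person.

```python
import json
from collections import defaultdict

# Hypothetical rows as a relational query might return them:
# (person name, skill, next availability date). Not the article's data.
rows = [
    ("Dev A", "Java", "2023-05-01"),
    ("Dev A", "AWS", "2023-05-01"),
    ("Dev B", "JavaScript", "2023-06-15"),
]


def rows_to_json(rows):
    """Group one-row-per-skill records into one JSON object per person."""
    skills = defaultdict(list)
    available = {}
    for name, skill, next_available in rows:
        skills[name].append(skill)
        available[name] = next_available
    people = [
        {"name": name, "skills": skills[name], "next_available": available[name]}
        for name in skills
    ]
    return json.dumps(people, indent=2)


data_json = rows_to_json(rows)
print(data_json)
```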

We aren’t searching natural language data here, but GPT has proven to be quite capable of parsing and explaining code, so we reasoned that it should be able to extract information from JSON. In order to parse a natural language request and search existing structured data for a result, we chose the text-davinci-003 model.

Adding data to the prompt

We added the JSON-formatted version of the data to the prompt:

Figure 1 - OpenAI Playground with Populated Schema
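The experiment in this article used the Playground, but the same prompt layout (data first, question after) could be assembled in code for repeatable runs. The sketch below only builds the prompt string. The `build_prompt` helper and the one-record payload are hypothetical, and sending the prompt to a model is deliberately left out.

```python
def build_prompt(data_json: str, question: str) -> str:
    """Place the JSON data first, then the question, separated by a blank line."""
    return f"{data_json.strip()}\n\n{question.strip()}\n"


# Hypothetical one-record payload standing in for the full JSON data.
example_data = '[{"name": "Dev A", "skills": ["Go", "AWS"], "next_available": "2023-05-01"}]'

prompt = build_prompt(
    example_data,
    "Find someone with Go and AWS experience who is available from May 4th to August 1st",
)
print(prompt)
```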

Next, we asked GPT a simple question about skills:

Given the data above, list all people who know Java

GPT responded as follows:

The model was able to understand that it was supposed to look in the JSON data and find people associated with skills. Success, right?

Well, sort of. It found all of the people who know Java, but it also returned one who doesn’t. Somewhat surprisingly, the false positive was not caused by JavaScript, which the model correctly ignored in the case of Robert R. Instead, it incorrectly returned Bryan W., who knows neither Java nor JavaScript.
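One way to catch errors like this is to compute the expected answer with an exact-match filter and compare it against the model's reply. In the sketch below, only the Java and JavaScript facts come from this article. Bryan W.'s skill is a placeholder, the records are stand-ins for the real data, and the `people_with_skill` and `compare` helpers are hypothetical.

```python
# Stand-in records: only the Java/JavaScript facts come from the article;
# every other detail is a placeholder.
people = [
    {"name": "Owen E.", "skills": ["Java"]},
    {"name": "Robert R.", "skills": ["JavaScript"]},
    {"name": "Bryan W.", "skills": ["Python"]},  # placeholder skill (assumption)
]


def people_with_skill(people, skill):
    """Exact, case-insensitive match on skill names, so 'Java' never matches 'JavaScript'."""
    wanted = skill.casefold()
    return {
        p["name"]
        for p in people
        if wanted in {s.casefold() for s in p["skills"]}
    }


def compare(model_answer, expected):
    """Split a model's answer into false positives and false negatives."""
    answer = set(model_answer)
    return {
        "false_positives": answer - expected,
        "false_negatives": expected - answer,
    }


expected = people_with_skill(people, "Java")
# A model answer shaped like the one described above: the correct name plus Bryan W.
report = compare(["Owen E.", "Bryan W."], expected)
print(report)
```

The same comparison flags a missing name as a false negative, which makes it useful for scoring each prompt variation against a known answer.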

Don’t be so clever

We asked the model why Bryan was included, and here is what it told us.

Figure 2 - The OpenAI Model Explanation for Including an Incorrect Candidate

This is an interesting hypothesis. While it is unclear to what extent the model's explanations of its own behavior reflect what is actually happening, we reasoned that the model may have been treating this as a network analysis problem: The person returned as a false positive had many skills in common with those who were correctly returned.

To correct this behavior, we set the model's temperature to 0. The temperature controls the extent to which the model gives “creative” answers rather than what it scores as the most likely answer; as the value decreases, the response becomes more deterministic. Because we were looking for a straightforward, factual answer, zero seemed like the right setting.
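To make the effect of temperature concrete, the sketch below uses the standard formulation, in which the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The logit values are hypothetical, and this shows the general technique, not OpenAI's internal implementation. A temperature of exactly zero cannot be used in the division, so it is commonly treated as always picking the top-scoring token.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is handled as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops, nearly all of the probability mass moves to the highest-scoring token, which is why low settings give more repeatable output.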

Using the same prompt, we got the opposite of what we were expecting! Rather than selectively returning the people who have Java listed in their skills, the model returned every person in the data.

Again, we asked the model for an explanation, and here is what it told us:

Figure 3 - The AI Model Differentiates Between Knowledge and Skill

Here, the model is suggesting that it is interpreting the question literally: If you want to know who has Java listed as a skill, then you need to ask who has Java listed as a skill.

OK, Smartypants

We changed the prompt to request all persons who have Java listed as a skill.

J.A.V.A.

Can you do that?

Figure 4 - The AI Model Fails to Recognize Skills

No. The model gave us a few people who do have Java listed as a skill, but it incorrectly left out Owen E., who does know Java. Not helpful.

Employing AI models for business solutions is not easy

While the models provided by OpenAI offer impressive natural language processing capabilities and a striking capacity for what we might call “creativity,” they’re definitely not omni-capable. It’s not realistic to assume that large language models (LLMs), such as the GPT family of models, can be used to solve any arbitrary business problem without significant effort. As with any sufficiently complex technology, it’s crucial to understand their strengths and weaknesses before they can be effectively integrated into any ecosystem or product.

Through the experiment illustrated in this article, we hope it’s clear that despite GPT’s many positives, it’s not infallible. We’ve illustrated a circumstance where some of GPT’s weaknesses began to show themselves, such as its tendency to respond in seemingly haphazard ways when asked to perform a task that requires a high degree of “attention to detail.”

AI models like the ones provided by OpenAI are objectively impressive, but they’re not magic; they can’t necessarily be used to solve any arbitrary business or technical problem. Identifying the appropriate problem space, utilizing the AI model appropriately, and optimizing it to work in a given scenario are all crucial steps to ensuring success.