In today’s data-driven world, the ability to harness advanced language models is crucial for organizations aiming to leverage their data for actionable insights. While many organizations excel at managing and analyzing structured data, they often overlook a treasure trove of information contained in unstructured data. This unstructured data—ranging from emails and social media posts to customer reviews and documents—holds significant insights that can drive strategic decisions and enhance customer experiences.
Enter Snowflake Cortex, an innovative addition to the Snowflake ecosystem that brings the power of large language models (LLMs) right to your data warehouse. By integrating advanced AI capabilities, Cortex enables organizations to unlock insights from both structured and unstructured data, seamlessly bridging the gap between these two worlds. In this blog post, we’ll explore the diverse functions and features of Snowflake Cortex LLM, highlighting how it can transform the way you interact with data.
Use case
You work as a data engineer in the IT team for Frosty Adventures, an internationally active B2B company selling winter gear and equipment. Your team lead was approached by the manager of the customer support team. For once, the manager didn’t complain about the data quality (lucky you!). Instead, they mentioned that customers frequently use their team’s new chat bot, which generates a vast amount of text data. These chat bot transcripts are an important source of information for gauging customer satisfaction levels.
However, the customer support team currently analyzes them manually, which is a tedious process. The manager heard about the recent AI boom and wondered whether the IT team had enough knowledge on this topic to streamline the transcript analysis. You suddenly remember that Snowflake recently introduced its own machine learning and artificial intelligence module, called Cortex. Using it makes sense because the company’s data is already stored on Snowflake: no third-party tool is needed, which avoids complex new integrations and reduces security risks, as there are no new potential entry points to the company’s data. You walk up to your team lead and present your idea to use Cortex. They think about it for a while and are not convinced of the efficacy of the product. However, if you can come up with some good cases, they will consider presenting the results to the manager of the customer support team. With what is essentially a green light, you start working on a small subset of the chat bot transcript data.
Sample data
In this use case, the data contains more than just the chat bot transcripts. It also contains columns such as PRODUCT, CATEGORY and DAMAGE_TYPE, which already hold some of the information that we want to extract from the transcripts. These columns were not used as input for the LLM functions; they only served to validate the LLM outputs.
Translate
As stated before, our company is internationally active and receives customer requests in a wide range of languages. Here, we only consider customer support chat bot transcripts in three different languages: English, French and German. The customer support team has an issue with processing these requests. Not everybody in the team is proficient in French or German, so these transcripts can only be analyzed by a few people in the team, creating a bottleneck. However, since our company’s headquarters is located in an English-speaking country, everybody is proficient in English. Translating these transcripts to English would remove this bottleneck and make the analysis more efficient. Furthermore, it would also benefit us, the data engineers, as all other Cortex features only work with English text. Fortunately, Cortex has a handy built-in function that automatically translates the French and German transcripts to English.
Example
Use Cortex function TRANSLATE to translate any text from and to: English (en), French (fr), German (de), Italian (it), Japanese (ja), Korean (ko), Polish (pl), Portuguese (pt), Russian (ru), Spanish (es), Swedish (sv). For example, the query select snowflake.cortex.translate('Wie geht es dir heute?', 'de', 'en') returns "How are you doing today?"
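To apply the translation to a whole table rather than a single string, the query can be generated per source language. The following is a minimal Python sketch that only builds the SQL text. The table name CHATBOT_TRANSCRIPTS and the columns TRANSCRIPT and LANGUAGE_CODE are hypothetical, and the list of language codes is simply the one given above.

```python
# Language codes taken from the list in this post.
SUPPORTED_LANGUAGES = {"en", "fr", "de", "it", "ja", "ko", "pl", "pt", "ru", "es", "sv"}


def build_translate_sql(table, text_col, lang_col, source_lang, target_lang="en"):
    """Build a query that translates all rows written in source_lang.

    table, text_col and lang_col are hypothetical identifiers; lang_col is
    assumed to hold two-letter codes such as 'fr' or 'de'.
    """
    for lang in (source_lang, target_lang):
        if lang not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language code: {lang!r}")
    return (
        f"SELECT {text_col}, "
        f"SNOWFLAKE.CORTEX.TRANSLATE({text_col}, '{source_lang}', '{target_lang}') "
        f"AS {text_col}_{target_lang.upper()} "
        f"FROM {table} "
        f"WHERE {lang_col} = '{source_lang}'"
    )


# One statement per foreign language in the sample data.
for code in ("fr", "de"):
    print(build_translate_sql("CHATBOT_TRANSCRIPTS", "TRANSCRIPT", "LANGUAGE_CODE", code))
```

Generating one statement per language keeps the language codes as plain string literals, exactly as in the single-string example.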
Summarize
Translating the transcripts already worked wonders for the efficiency of the analysis, but we can always do better. Sometimes a customer is very vocal when expressing their problem, turning a regular conversation into an epic. Reading through these kinds of transcripts takes a long time and significantly slows down the analysis. Summarizing each transcript in a few sentences without losing any critical information would be a nice-to-have for the customer support team, speeding up their analysis.
Example
Use Cortex function SUMMARIZE to give a summary of any given English text. The following piece of text is a summary of an example transcript where a customer received helmets with broken loops and asked for an exchange of products.
“Customer Pierre Bouchard of SnowSport Centre received defective XtremeX helmets with broken loops from the agent. The agent apologized and promised to send a new shipment of functional helmets within 4 working days. Customer was satisfied with the solution.”
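Because SUMMARIZE expects English text, foreign-language transcripts have to pass through TRANSLATE first. The sketch below composes the two function calls as SQL text in Python. The table and column names are hypothetical, and the code only builds the query string without running it.

```python
def translate_expr(text_expr, source_lang, target_lang="en"):
    """Wrap a SQL text expression in a Cortex TRANSLATE call."""
    return f"SNOWFLAKE.CORTEX.TRANSLATE({text_expr}, '{source_lang}', '{target_lang}')"


def summarize_expr(text_expr):
    """Wrap a SQL text expression in a Cortex SUMMARIZE call."""
    return f"SNOWFLAKE.CORTEX.SUMMARIZE({text_expr})"


def build_summary_sql(table, text_col, source_lang):
    """Build a query that summarizes transcripts written in source_lang.

    English transcripts are summarized directly; all others are translated
    to English first. table and text_col are hypothetical identifiers.
    """
    english = text_col if source_lang == "en" else translate_expr(text_col, source_lang)
    return f"SELECT {text_col}, {summarize_expr(english)} AS SUMMARY FROM {table}"


print(build_summary_sql("CHATBOT_TRANSCRIPTS", "TRANSCRIPT", "fr"))
```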
Sentiment
After successfully improving the analysis and decision-making process of the customer support team, we caught the attention of the sales manager. They are interested in the general customer satisfaction level of our company’s products, which includes the way customers feel when going through a customer support process. The sales manager would like to know the customer’s emotional response during that process. If a customer expresses negative feelings, the sales manager would like to respond quickly, for example by offering an additional rebate on the customer’s next purchase, in order to minimize customer churn.
Example
Use Cortex function SENTIMENT to gauge the sentiment of an English text. The function returns a number between -1 and 1, where a negative number represents a rather negative sentiment, a positive number represents a rather positive sentiment and 0 represents a purely neutral sentiment. For example, the query select snowflake.cortex.sentiment('It is really annoying that we received broken helmets, we need an urgent replacement!') returns -0.5812153.
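A raw score is hard to act on, so it can help to map it to a coarse label before handing it to the sales manager. The following Python sketch does this with a symmetric cut-off. The threshold of 0.25 is an arbitrary assumption for illustration, not a value recommended by Snowflake.

```python
def sentiment_label(score, threshold=0.25):
    """Map a SENTIMENT score in [-1, 1] to 'negative', 'neutral' or 'positive'.

    The threshold is an illustrative assumption and should be tuned on
    transcripts whose sentiment is already known.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score <= -threshold:
        return "negative"
    if score >= threshold:
        return "positive"
    return "neutral"


# The score from the example above falls into the negative bucket.
print(sentiment_label(-0.5812153))
```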
Classification
We successfully identified unhappy customers in our chat bot transcripts and the sales manager is starting to believe in the impact of AI on business processes. However, they are not fully convinced yet. For our next challenge, the sales manager asked us whether we could find out how often customers asked for a refund or an exchange of products, using only the chat bot transcripts. Fortunately, this feature is also available in Snowflake Cortex and we managed to classify the transcripts.
Example
Use Cortex function CLASSIFY_TEXT to assign an English text to one of the given categories. For example, the query select snowflake.cortex.classify_text('I would like to be compensated for the broken helmets.', ['Refund', 'Exchange']) returns {"label": "Refund"}.
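To answer the sales manager’s question, the returned labels still have to be counted. The following Python sketch tallies them, assuming each result arrives as a JSON string shaped like the one in the example above. The sample inputs are made up for illustration.

```python
import json
from collections import Counter


def count_labels(raw_results):
    """Count the labels in a sequence of CLASSIFY_TEXT results.

    Each item is assumed to be a JSON string such as '{"label": "Refund"}'.
    """
    counts = Counter()
    for raw in raw_results:
        counts[json.loads(raw)["label"]] += 1
    return counts


# Made-up results for three transcripts.
sample = ['{"label": "Refund"}', '{"label": "Exchange"}', '{"label": "Refund"}']
print(count_labels(sample))
```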
Extract answers
Analyzing and streamlining the customer support process is nice, but the sales manager doesn’t want to react to complaints about defective products all the time. The dream would be to have no complaints at all. However, the sales manager is also happy with trying to minimize the defects. They would like to have an idea of which product breaks down the most. Luckily, we can extract the answer to this question from the transcripts relatively easily with Cortex.
Next to the extracted answer, we get a score of how certain Cortex is that the extracted answer is a good answer to the asked question. As you can see, it’s not that certain for the 2nd and 3rd answers, which are relatively long. After analyzing the answers and classifying them with a query, we were able to answer the sales manager’s question.
Example
Use Cortex function EXTRACT_ANSWER to extract an answer from text for a specific question. For example, the query select snowflake.cortex.extract_answer(snowflake_review, 'What Snowflake technology does this review mention?'), where snowflake_review is a text column, returns the answer "cortex llm" together with its score.
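Since every extracted answer comes with a score, low-confidence answers can be filtered out before further analysis. The Python sketch below assumes the result is a JSON array of objects with the keys answer and score. Both that shape and the cut-off of 0.5 are assumptions to verify against your own query output.

```python
import json
from typing import Optional


def best_answer(raw: str, min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring answer, or None if it is not confident enough.

    raw is assumed to look like '[{"answer": "...", "score": 0.9}]'.
    min_score is an illustrative cut-off, not an official recommendation.
    """
    candidates = json.loads(raw)
    if not candidates:
        return None
    top = max(candidates, key=lambda item: item["score"])
    return top["answer"] if top["score"] >= min_score else None


# Made-up results: one confident answer and one uncertain answer.
print(best_answer('[{"answer": "helmet", "score": 0.91}]'))
print(best_answer('[{"answer": "helmet with broken loops", "score": 0.2}]'))
```

Answers that fall below the cut-off can then be routed to manual review instead of being classified automatically.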
Prompting
It’s also possible to use the capabilities of an LLM to ask general questions about the data and get a customized answer, much like with the better-known ChatGPT.
The sales manager would like to have better insights and asked the following questions to us:
- Which customers with defective products contacted us?
- Which products were defective?
- What was their defect?
- Did the customer ask for compensation in the form of an exchange or refund of products?
- What was the general impression of the customer? Was the customer patient and understanding or were they rather unhappy with the incident?
Analyzing each chat bot transcript and manually extracting these answers is tedious. With Cortex, we can prompt one of its many LLMs to extract the answers for us directly in a SQL statement, which makes it easy to load the results into a table.
After parsing the JSON, the result looks like this.
Example
Use Cortex function COMPLETE to send a prompt to an LLM of your choice. The function takes the name of the model and the prompt as arguments and returns the model’s response as text. For example, a query of the form select snowflake.cortex.complete('<model name>', '<prompt>') returns the generated answer.
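The prompting approach from this section can be sketched in Python as two small helpers: one builds a prompt that asks for all five answers as a single JSON object, and one parses the model’s reply. The key names and the prompt wording are hypothetical. The prompt text would be passed to the COMPLETE function, which this sketch does not call.

```python
import json

# Hypothetical JSON keys mapped to the sales manager's questions.
QUESTIONS = {
    "customer": "Which customer contacted us?",
    "product": "Which product was defective?",
    "defect": "What was the defect?",
    "compensation": "Did the customer ask for an exchange or a refund?",
    "impression": "What was the general impression of the customer?",
}


def build_prompt(transcript):
    """Build a prompt that requests one JSON object with a fixed set of keys."""
    lines = [f'- "{key}": {question}' for key, question in QUESTIONS.items()]
    return (
        "Answer the following questions about the chat bot transcript below. "
        "Reply with a single JSON object using exactly these keys:\n"
        + "\n".join(lines)
        + "\n\nTranscript:\n"
        + transcript
    )


def parse_reply(reply):
    """Extract the JSON object from a reply that may contain extra text."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    # Keys the model left out are returned as None.
    return {key: data.get(key) for key in QUESTIONS}


# Made-up reply with some text before the JSON object.
print(parse_reply('Here you go: {"product": "helmet", "compensation": "Exchange"}'))
```

Asking for a fixed set of keys keeps the parsing step simple and makes missing answers visible as empty values in the resulting table.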
Conclusion
Snowflake Cortex LLM functions enhance data processing and analytics capabilities by integrating advanced machine learning directly within the Snowflake environment. As is shown in this blog post, these functions streamline workflows, enabling users to leverage natural language processing for data queries, generate insights, and automate tasks. The Snowflake Cortex LLM functions do wonders when it comes to automating relatively simple tasks, such as translating, summarizing and classifying text.
Cortex struggled a little when gauging the overall customer sentiment, but this may also be due to the transcripts themselves. Most transcripts start with an angry customer but end with a calmer one once the problem is resolved, which leads to a mixed result where the customer appears neither positive nor negative. Cortex struggled the most with extracting an answer to a specific question, sometimes returning too much text as the answer (e.g. not only extracting “helmet” but rather “helmet with broken loops and scratches on the side”).
However, this may also be due to the structure of the underlying text data. In this case, it was better to use the COMPLETE function, which leverages the power of better-known LLMs and returned more accurate results.
Overall, using Snowflake Cortex LLM functions is a game changer when it comes to processing natural language in a timely and cost-efficient manner. Furthermore, the functions are seamlessly integrated within the Snowflake ecosystem, so no third-party tools need to be integrated.