
Develop your own data analyst assistant using LangChain agents

Jan. 1, 2024, 4:34 p.m.

When it comes to large language models, what exactly is an agent?

An agent is an application that enables a large language model to employ external resources, such as tools, code, and APIs, to accomplish a task.

Until now, language models have been used for a wide range of tasks, including sentiment analysis, translation, summarization, text generation, and much more. One of their most promising capabilities for the technical industry is generating code in several programming languages.
In other words, their capacity to comprehend and produce code allows them to interact not only with databases, operating systems, libraries, and APIs, but also with humans through natural language interpretation and generation. They can invoke popular APIs and generate code in Python, JavaScript, and SQL.

LangChain Framework Components:

LangChain's overall structure consists of several essential parts:

  • LangChain Libraries: The foundation of the framework, made up of Python and JavaScript libraries. These libraries include a wide variety of component interfaces and integrations, a basic runtime for chaining and composing these components into agents, and off-the-shelf, ready-to-use chain and agent implementations.

  • LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks. By giving developers a starting point, these templates streamline application development.

  • LangServe: A library for deploying LangChain chains as a REST API, allowing LangChain applications to be seamlessly integrated into existing services or systems.

  • LangSmith: A developer platform that makes it easier to debug, test, evaluate, and monitor chains. It integrates seamlessly with any Large Language Model (LLM) framework, giving developers a stable environment in which to refine their applications.
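The "chaining" idea behind these libraries can be illustrated without LangChain itself. The sketch below is a conceptual stand-in, not LangChain's real API: each component is a plain function, and a chain simply composes them left to right.

```python
from functools import reduce

def make_chain(*steps):
    """Compose steps left to right: each step's output feeds the next step."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

# Toy stand-ins for a prompt template and a model call (not real LangChain components).
format_prompt = lambda question: f"Answer briefly: {question}"
fake_llm = lambda prompt: f"[model output for: {prompt}]"

chain = make_chain(format_prompt, fake_llm)
result = chain("What was December revenue?")
```

The real library adds tooling around this core idea: standard interfaces for models and prompts, plus the integrations and agent runtimes described above.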

Can LLMs replace Data Analysts?

Over the course of the last year, I believe we have all wondered whether ChatGPT will be able to take our jobs. The latest advances in generative AI are widely acknowledged to have a significant impact on our personal and professional lives. How our jobs will evolve over time is not yet evident, though. Although speculating about possible futures and their likelihoods may be fascinating, I propose a completely different strategy: building your own prototype. First of all, it's enjoyable and challenging. It will also help us take a more methodical look at how we see our work. Additionally, it lets us try out one of the most cutting-edge approaches: LLM agents.

What is data analytics?

Before moving on to the LLMs, let’s try defining what analytics is and what tasks we do as analysts.

Four different approaches to data and analytics

  • Descriptive analytics: Answers questions such as "What happened?" For instance, what was the revenue in December? This approach covers BI tools and reporting tasks.

  • Diagnostic analytics: Going a step further, diagnostic analytics asks questions such as "Why did it happen?" For instance, why did revenue decline by 10% from the year before? This method requires additional slicing, dicing, and drilling down into the data.

  • Predictive analytics: Answers questions like "What will happen?" Its two main components are forecasting (projecting the future under business-as-usual scenarios) and simulation (modeling different possible outcomes).

  • Prescriptive analytics: Informs final decisions. "What should we focus on?" and "How could we increase volume by 10%?" are frequent questions.
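The first two approaches map directly onto everyday pandas work. The sketch below computes a descriptive answer ("What was the revenue in December?") and the diagnostic starting point (quantifying the year-over-year change); the table and figures are invented for the example.

```python
import pandas as pd

# Invented monthly revenue figures for two years.
revenue = pd.DataFrame({
    "year":    [2022, 2022, 2023, 2023],
    "month":   ["Nov", "Dec", "Nov", "Dec"],
    "revenue": [100_000, 120_000, 95_000, 108_000],
})

# Descriptive: "What was the revenue in December?"
dec_2023 = revenue.loc[(revenue.year == 2023) & (revenue.month == "Dec"), "revenue"].iat[0]

# Diagnostic: "Why did revenue decline vs. the year before?" - first quantify the drop.
dec_2022 = revenue.loc[(revenue.year == 2022) & (revenue.month == "Dec"), "revenue"].iat[0]
yoy_change = (dec_2023 - dec_2022) / dec_2022  # a negative value means a decline
```

Predictive and prescriptive work build on exactly this kind of slicing, which is why a pandas-aware agent is a natural fit for analyst tasks.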

Using LangChain to build an LLM agent:

We are going to use LangChain to develop an agent that can analyze data from a CSV file via the OpenAI API. It will be able to identify correlations between variables, clean the data, select a model, and run it to generate forecasts for the future.
In short, it will act as a Data Scientist Assistant, supporting us in our daily work.

Installing and importing libraries:

!pip install langchain
!pip install langchain_experimental

!pip install --upgrade openai==0.28.1
!pip install tabulate
!pip install xformers

We use the os module to store environment variables, such as OPENAI_API_KEY.

Get your OpenAI API Key: https://platform.openai.com/

import os
os.environ["OPENAI_API_KEY"] = "your-api-key"

This is the simplest agent we can create with LangChain; we only need to import create_pandas_dataframe_agent.

from langchain.llms import OpenAI

from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent


from langchain.agents import (
    AgentType,  # ZERO_SHOT_REACT_DESCRIPTION is the default value
)

Loading the Data:

import pandas as pd

You can download the CSV, but feel free to use any dataset you find more interesting, or your own data.

csv_file = '/content/drive/MyDrive/Data/climate_change_data.csv'

# Create the DataFrame with pandas.
document = pd.read_csv(csv_file)
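Before handing the DataFrame to an agent, a quick manual inspection is worthwhile. Since the CSV path above is specific to my Drive, this sketch uses a small synthetic DataFrame with climate-like columns (the real file's schema may differ).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the climate CSV; the real file's columns may differ.
sample_df = pd.DataFrame({
    "Location":      ["A", "A", "B", "B", "C", "C"],
    "Temperature":   [14.2, 14.8, np.nan, 15.1, 15.4, 15.0],
    "CO2 Emissions": [400.1, 401.3, 402.0, np.nan, 403.2, 404.0],
})

print(sample_df.head())          # first rows, to sanity-check the schema
print(sample_df.isnull().sum())  # null counts per column - the agent will need to clean these
print(sample_df.describe())      # basic descriptive statistics
```

Knowing where the nulls are helps us judge whether the agent's cleaning steps later on actually make sense.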

Time to create our little assistant: we need only a single call.

We let OpenAI decide which model to use, but we set the temperature parameter to 0 so that the model is deterministic rather than imaginative. This works much better when we want the model to issue precise commands to the different libraries it can use.

litte_ds = create_pandas_dataframe_agent(
    OpenAI(temperature=0), document, verbose=True
)
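With verbose=True we will see the agent print a loop of Thought / Action / Observation steps (the ReAct pattern behind the default ZERO_SHOT_REACT_DESCRIPTION agent type). The following is a heavily simplified, pure-Python sketch of that control flow with a scripted stand-in for the LLM; it is not LangChain's actual implementation.

```python
import pandas as pd

def scripted_llm(prompt):
    # Stand-in for the real model: it asks for one tool call, then answers.
    if "Observation" not in prompt:
        return "Thought: I should inspect the data.\nAction: python_repl\nAction Input: df.shape"
    return "Final Answer: the dataframe has 3 rows and 2 columns"

def react_loop(question, tools, llm, max_steps=5):
    """Minimal ReAct loop: call the LLM, run the requested tool, feed back the result."""
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(prompt)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[1].strip()
        # Parse the requested action and append its result as an Observation.
        action = reply.split("Action:")[1].split("\n")[0].strip()
        action_input = reply.split("Action Input:")[1].strip()
        observation = tools[action](action_input)
        prompt += f"\n{reply}\nObservation: {observation}"
    return "stopped: step limit reached"

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
tools = {"python_repl": lambda code: repr(eval(code))}  # toy tool, unsafe outside a demo
answer = react_loop("How big is the dataset?", tools, scripted_llm)
```

The real agent works the same way in spirit: the LLM repeatedly decides which tool to run on the DataFrame and reads the observation back until it can give a final answer.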

In the first question, we ask for some trends and a conclusion.

litte_ds.run("Analyze this data, and tell me if you see any trends. \
Give me a conclusion with the principal trend.")

In the second question, we ask for correlations.

litte_ds.run("Do you see any correlations in the data? If yes, tell me the principal one.")

The third question is the most difficult: the agent must select an algorithm and forecast the data with it. As a final task, we request a chart.

litte_ds.run("First clean the data, removing null values, and prepare it for use in a machine learning model. \
Then decide which model is better to forecast the temperature. \
Tell me the decision and use this kind of model to forecast the temperature for the next 15 years. \
Finally, create a bar graph with the 15 forecasted temperatures.")
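For reference, the code the agent generates for this last request usually amounts to something like the sketch below: drop the null values, fit a simple trend, and extrapolate 15 years ahead. A least-squares line via numpy is just one plausible model choice, and the data here is a synthetic stand-in for the climate CSV.

```python
import numpy as np
import pandas as pd

# Synthetic yearly temperatures with a warming trend and a null, standing in for the real data.
data = pd.DataFrame({
    "year": range(2000, 2010),
    "temperature": [14.1, 14.2, None, 14.4, 14.5, 14.6, 14.6, 14.8, 14.9, 15.0],
})

# Step 1: clean - drop rows with null values.
clean = data.dropna()

# Step 2: fit a linear trend (degree-1 least squares), one simple forecasting choice.
slope, intercept = np.polyfit(clean["year"], clean["temperature"], deg=1)

# Step 3: forecast the next 15 years.
future_years = np.arange(2010, 2025)
forecast = slope * future_years + intercept

# Step 4: a bar chart of the forecast (commented out to keep the sketch headless).
# import matplotlib.pyplot as plt
# plt.bar(future_years, forecast)
# plt.show()
```

Seeing the shape of this pipeline makes it much easier to judge whether the agent's verbose trace is doing something sensible at each step.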


Conclusion:

This is one of the most effective agents and also the simplest to use. We have shown how, with just a few lines of code, we can build an agent that follows our instructions to clean, analyze, and chart our data. Furthermore, it can draw inferences and even decide which algorithm is most effective for forecasting the data.
The field of agents is still in its early stages, and numerous companies such as Hugging Face, Microsoft, and Google are joining it. With new language models and new tools, their possibilities keep expanding.
We cannot afford to miss this revolution, which will bring about many changes.