Tuesday, August 1, 2023

Case Study: PDF GPT, Exploring the Structure of a Large Language Model (LLM) System

PDF GPT: An Illustration of a Modern AI-Enabled System

Language models like GPT-3 and GPT-4 are changing how software applications are built. This article analyzes PDF GPT, a GitHub repository for an application that uses GPT-3 to answer questions about the contents of a PDF. It demonstrates how a large language model can be integrated into a broader software system.

We walk through the main parts of the code and how each contributes to the overall functionality. We also discuss system design patterns that could be applied here.

System Overview (UML Diagram)

The diagram below illustrates the relationship between various components of the system:

    UserInterface -- SemanticSearch : Provides query
    UserInterface -- PDFProcessing : Provides PDF
    UserInterface -- GPTInteraction : Gets response
    PDFProcessing -- SemanticSearch : Provides processed text
    SemanticSearch -- GPTInteraction : Provides relevant chunks
    class UserInterface {
        + Gradio UI
    }
    class PDFProcessing {
        + Download PDF
        + Extract Text
        + Chunk Text
    }
    class SemanticSearch {
        + Compute Embeddings
        + Perform Search
    }
    class GPTInteraction {
        + Generate Prompt
        + Get Completion
    }

Technical Parts with Related Code

User Interface

The user interface is created using Gradio, a Python library for creating simple and customizable UIs for Python functions. It consists of text boxes for the OpenAI API key, PDF URL or file, and the query.

import gradio as gr

# question_answer is the application's main handler, defined elsewhere in the script
with gr.Blocks() as demo:

    with gr.Row():
        # Left column: inputs
        with gr.Group():
            openAI_key = gr.Textbox(label="Enter your OpenAI API key here")
            url = gr.Textbox(label="Enter PDF URL here")
            file = gr.File(label='Upload your PDF/ Research Paper / Book here', file_types=['.pdf'])
            question = gr.Textbox(label='Enter your question here')
            btn = gr.Button(value='Submit')
        # Right column: output
        with gr.Group():
            answer = gr.Textbox(label='The answer to your question is :')
        # Clicking Submit passes the four inputs to question_answer and shows the result
        btn.click(question_answer, inputs=[url, file, question, openAI_key], outputs=[answer])

PDF Processing

PDF processing involves a few functions. The download_pdf function downloads the PDF from the provided URL. The preprocess function removes newlines and extra whitespace from the extracted text. The pdf_to_text function reads the text from each page of the PDF, preprocesses it, and stores it in a list. The text_to_chunks function divides the text into chunks of a specified word length.

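import re
import urllib.request

import fitz  # PyMuPDF

def download_pdf(url, output_path):
    urllib.request.urlretrieve(url, output_path)

def preprocess(text):
    text = text.replace('\n', ' ')
    # Raw string avoids an invalid escape sequence warning for \s
    text = re.sub(r'\s+', ' ', text)
    return text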

def pdf_to_text(path, start_page=1, end_page=None):
    doc = fitz.open(path)
    total_pages = doc.page_count
    #...rest of the function

def text_to_chunks(texts, word_length=150, start_page=1):
    text_toks = [t.split(' ') for t in texts]
    #...rest of the function
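
The body of text_to_chunks is elided above. As a rough illustration of the chunking idea only, the sketch below splits each page's text into chunks of at most word_length words and tags each chunk with its page number. The name chunk_pages and the "[Page N]" tag format are assumptions made for this example, not the repository's code.

```python
def chunk_pages(texts, word_length=150, start_page=1):
    """Split a list of page texts into chunks of at most `word_length` words.

    Hypothetical helper for illustration; not the repository's text_to_chunks.
    """
    chunks = []
    for offset, page_text in enumerate(texts):
        words = page_text.split()
        page_number = start_page + offset
        for i in range(0, len(words), word_length):
            piece = " ".join(words[i:i + word_length])
            # Tag each chunk with its page so an answer can point back to its source.
            chunks.append(f"[Page {page_number}] {piece}")
    return chunks
```

Smaller chunks make retrieval more precise but give the language model less context per chunk, so word_length is a tuning knob rather than a fixed constant.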

Semantic Search

The SemanticSearch class creates embeddings for the text chunks using the Universal Sentence Encoder model from TensorFlow Hub. The fit method computes these embeddings and creates a NearestNeighbors model from scikit-learn, fitted with the embeddings. The __call__ method computes the embedding of the input text and retrieves the nearest neighbors from the model.

import tensorflow_hub as hub

class SemanticSearch:
    def __init__(self):
        # Load the Universal Sentence Encoder from TensorFlow Hub
        self.use = hub.load('https://tfhub.dev/google/universal-sentence-encoder/4')
        self.fitted = False
    #...rest of the class
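
To make the fit and __call__ flow concrete without downloading a model, here is a minimal, dependency-free sketch. It deliberately swaps in a toy bag-of-words embedding for the Universal Sentence Encoder and a brute-force cosine ranking for scikit-learn's NearestNeighbors. The class name ToySemanticSearch and the helpers embed and cosine are hypothetical; only the overall shape mirrors the description above.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a stand-in for the Universal Sentence Encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticSearch:
    """Hypothetical class that mirrors the fit / __call__ shape described above."""

    def __init__(self):
        self.fitted = False

    def fit(self, chunks):
        # Embed every chunk once, up front.
        self.chunks = list(chunks)
        self.embeddings = [embed(c) for c in self.chunks]
        self.fitted = True

    def __call__(self, query, n_neighbors=2):
        if not self.fitted:
            raise RuntimeError("call fit() before searching")
        query_emb = embed(query)
        # Rank chunk indices by similarity to the query and keep the top n.
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: cosine(query_emb, self.embeddings[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:n_neighbors]]
```

A word-count embedding only matches exact words, which is the gap a sentence encoder is meant to close: it can place paraphrases near each other even when they share no vocabulary.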

GPT-3 Interaction

The generate_text function uses the OpenAI API to generate a text completion based on the provided prompt. The generate_answer function forms a prompt from the question and the top-n chunks from the semantic search, calls generate_text with this prompt, and returns the generated text.

import openai

def generate_text(openAI_key, prompt, engine="text-davinci-003"):
    openai.api_key = openAI_key
    #...rest of the function

def generate_answer(question, openAI_key):
    # recommender is the global SemanticSearch instance created by load_recommender
    topn_chunks = recommender(question)
    #...rest of the function
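
The prompt-building step inside generate_answer is elided above. The sketch below shows one plausible way to combine retrieved chunks and a question into a single prompt. The function name build_prompt and the wording of the instructions are assumptions for illustration, and no OpenAI call is made here.

```python
def build_prompt(question, topn_chunks):
    """Assemble a completion prompt from retrieved chunks and the user's question.

    Hypothetical helper; the instruction wording is an assumption.
    """
    lines = ["Search results:", ""]
    for chunk in topn_chunks:
        lines.append(chunk)
        lines.append("")
    lines.append(
        "Instructions: Answer the query using only the search results above. "
        "If they do not contain the answer, say so."
    )
    lines.append("")
    lines.append(f"Query: {question}")
    lines.append("Answer:")
    return "\n".join(lines)
```

Placing the retrieved text before the question, and ending on "Answer:", nudges a completion-style model to continue with the answer rather than restate the query.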

System Design Pattern

The current implementation leans towards a procedural programming paradigm. However, this could be structured as an MVC (Model-View-Controller) pattern, where the SemanticSearch class and PDFProcessing functions act as the Model, the Gradio UI is the View, and the GPTInteraction module serves as the Controller.
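
As a rough sketch of what that MVC split might look like, the example below wires three small pieces together. All names here (Model, Controller, view) are hypothetical, and the search and completion steps are passed in as plain callables so the sketch runs without TensorFlow, Gradio, or an API key.

```python
class Model:
    """Stand-in for PDF processing plus SemanticSearch: returns relevant chunks."""

    def __init__(self, search):
        self.search = search  # any callable: question -> list of chunk strings

    def relevant_chunks(self, question):
        return self.search(question)


class Controller:
    """Stand-in for the GPT interaction layer: coordinates model and completion."""

    def __init__(self, model, complete):
        self.model = model
        self.complete = complete  # any callable: prompt -> generated text

    def answer(self, question):
        if not question.strip():
            return "[ERROR]: Question field is empty"
        chunks = self.model.relevant_chunks(question)
        prompt = "\n".join(chunks) + f"\n\nQuery: {question}\nAnswer:"
        return self.complete(prompt)


def view(controller, question):
    """Stand-in for the Gradio callback: forwards input and returns the output."""
    return controller.answer(question)
```

Because the controller depends only on callables, the real embedding model or API client can be replaced by a fake in tests, which is one practical payoff of the separation.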

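Alternatively, a Microservices Architecture could be considered. Here, each module would run as an independent service and communicate with the others through APIs. This would allow, for example, the embedding service to be scaled separately from the UI, at the cost of more deployment and operational complexity.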

Lastly, an Event-Driven Architecture could be used. In this case, user actions such as uploading a PDF or submitting a question would emit events that downstream handlers process in turn, which can keep the interface responsive while slower steps run.

Regardless of the architecture, traditional software engineering principles are essential when working with advanced AI technologies like GPT-3. The choice of design pattern would depend on factors like the specific requirements of the project, the team's expertise, and the anticipated future expansions of the system.

Possibility of Refactoring

The provided Python script can be broken down into several distinct sections. Let's discuss each part.

Utility functions and classes

The utility functions and classes include download_pdf, preprocess, pdf_to_text, and text_to_chunks.

SemanticSearch class

The SemanticSearch class uses Google's Universal Sentence Encoder and a nearest neighbors algorithm to implement a semantic search model.

Load recommender function

The load_recommender function creates a global recommender object by converting a PDF to text, chunking the text, and fitting the SemanticSearch model to these chunks.

Generate text and answer functions

The generate_text function uses the OpenAI API to generate text, while the generate_answer function builds a prompt from the question and the top chunks returned by the semantic search and passes it to generate_text.

Question answer function

The question_answer function handles the application logic: it processes the PDF file into chunks of text, uses the SemanticSearch model to find the chunks most relevant to the user's question, and generates a response using the OpenAI API.

Web application setup

The web application is built using the Gradio library and includes a title, a description, input fields for the OpenAI API key, the PDF URL or file upload, and the question, a 'Submit' button, and a text field for the answer.


The code can be refactored into several files or classes for better code organization. For instance, utility functions could be contained in a utils.py file, the SemanticSearch class in a semantic_search.py file, functions handling the main logic of the application in an app_logic.py file, and the main script that sets up and runs the web application in an app.py file.


Multi-threading or asynchronous programming may also be worth considering for better performance, since downloading the PDF, computing embeddings, and calling the OpenAI API are all steps that make the caller wait.
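
As a small illustration of the asynchronous option, the sketch below runs two blocking functions in worker threads so that their waiting time overlaps. The function slow_blocking_call is a hypothetical stand-in for steps like a download or an API request, and asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import time


def slow_blocking_call(name, seconds):
    """Stand-in for a blocking step such as a PDF download or an API request."""
    time.sleep(seconds)
    return f"{name} done"


async def run_concurrently():
    # asyncio.to_thread runs each blocking function in a worker thread,
    # so the two waits overlap instead of adding up.
    return await asyncio.gather(
        asyncio.to_thread(slow_blocking_call, "download", 0.2),
        asyncio.to_thread(slow_blocking_call, "api call", 0.2),
    )
```

Calling asyncio.run(run_concurrently()) returns the two results in the order the tasks were passed to gather. Threads help here because the work is mostly waiting on I/O; CPU-bound steps would not benefit in the same way.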

Moreover, as previously discussed, an MVC (Model-View-Controller) design pattern could be adopted. The Model could comprise the SemanticSearch class and the PDF processing functions, the View could be the Gradio UI, and the Controller could be the GPTInteraction module. This would provide a clear separation between the logic, user interface, and control flow of the application, making it easier to maintain and extend.
