Crafting Web Browser Specs for Discourse AI - AI Conversation

sam 2024-03-28

Help me finish:

Fill in web_browser_spec with useful specs.

(remember to use webmock here…)

gpt4_bot GitHub Helper 2024-03-28


To finish web_browser_spec.rb for pull request #548 in discourse/discourse-ai, the spec should test the core behavior of the WebBrowser tool. Given the implementation details you’ve provided, that means verifying that the tool fetches a web page and processes its content correctly.

Here’s a suggested completion for the spec. It stubs the HTTP request with WebMock and asserts on the result of #invoke.

# spec/lib/modules/ai_bot/tools/web_browser_spec.rb

# frozen_string_literal: true

RSpec.describe DiscourseAi::AiBot::Tools::WebBrowser do
  let(:bot_user) { User.find(DiscourseAi::AiBot::EntryPoint::GPT3_5_TURBO_ID) }
  let(:llm) { DiscourseAi::Completions::Llm.proxy("open_ai:gpt-4-turbo") }

  before do
    SiteSetting.ai_openai_api_key = "asd"
    SiteSetting.ai_bot_enabled = true
  end

  describe "#invoke" do
    it "can retrieve the content of a webpage and returns the processed text" do
      url = "https://arxiv.org/html/2403.17011v1"
      processed_text = "This is a simplified version of the webpage content."

      # Stub the web request so it returns a fixed HTML document
      stub_request(:get, url).to_return(
        status: 200,
        body:
          "<html><head><title>Test</title></head><body><p>This is a simplified version of the webpage content.</p></body></html>",
      )

      tool = described_class.new({ url: url })
      result = tool.invoke(bot_user, llm)

      expect(result).to have_key(:text)
      expect(result[:text]).to eql(processed_text)
      expect(result[:url]).to eql(url)
    end

    it "returns an error if the webpage cannot be retrieved" do
      url = "https://arxiv.org/html/2403.17011v1"

      # Simulating a failed request
      stub_request(:get, url).to_return(status: [500, "Internal Server Error"])

      tool = described_class.new({ url: url })
      result = tool.invoke(bot_user, llm)

      expect(result).to have_key(:error)
      expect(result[:error]).to include("Failed to retrieve the web page")
    end
  end

  # Additional tests can be added here to ensure the extraction logic works as expected,
  # for example, testing different HTML structures, handling redirects, etc.
end

This completion uses WebMock to simulate HTTP requests and responses, so the spec exercises the tool’s fetching and processing without making real network calls. The first example verifies that the tool retrieves and processes page content; the second verifies error handling when the page cannot be retrieved. Further examples can cover more nuanced scenarios, such as different HTML structures and redirects.
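As a rough illustration of the contract those two examples assert, here is a minimal plain-Ruby sketch. Everything in it is an assumption made for illustration: `fetch_result` and the `fetcher` callable are hypothetical names, the tag stripping is deliberately crude, and none of it is the plugin's actual code. It only shows the two result shapes the spec checks for, a hash with :url and :text on success and a hash with :error on failure.

```ruby
# Hypothetical stand-in for the tool's fetch step; NOT the plugin's code.
# `fetcher` is any callable that takes a URL and returns [status, body].
def fetch_result(url, fetcher)
  status, body = fetcher.call(url)

  if status == 200
    # Crude tag stripping for illustration only; a real extractor
    # would use an HTML parser and skip non-content elements.
    text = body.gsub(/<[^>]+>/, " ").split.join(" ")
    { url: url, text: text }
  else
    { error: "Failed to retrieve the web page: #{url}" }
  end
end

# Canned responses play the role that WebMock stubs play in the spec.
ok = ->(_url) { [200, "<html><body><p>Hello world.</p></body></html>"] }
broken = ->(_url) { [500, ""] }

success = fetch_result("https://example.com", ok)
failure = fetch_result("https://example.com", broken)
```

In the spec itself the canned responses come from stub_request, so nothing has to be injected into the tool.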

sam 2024-03-28

go for it…

gpt4_bot GitHub Helper 2024-03-28

Certainly! To extend coverage for web_browser_spec.rb, we can add tests that check the extraction logic across different HTML structures and verify that redirects are followed. The examples below belong inside the existing RSpec.describe block, since they rely on its bot_user and llm definitions:

describe "#invoke with various HTML structures" do
  let(:url) { "http://example.com" }

  it "extracts main content from a simple HTML structure" do
    simple_html = "<html><body><p>Simple content.</p></body></html>"
    stub_request(:get, url).to_return(status: 200, body: simple_html)

    tool = described_class.new({ url: url })
    result = tool.invoke(bot_user, llm)

    expect(result[:text]).to eq("Simple content.")
  end

  it "correctly ignores script and style tags" do
    complex_html = "<html><head><script>console.log('Ignore me')</script></head><body><style>body { background-color: #000; }</style><p>Only relevant content here.</p></body></html>"
    stub_request(:get, url).to_return(status: 200, body: complex_html)

    tool = described_class.new({ url: url })
    result = tool.invoke(bot_user, llm)

    expect(result[:text]).to eq("Only relevant content here.")
  end

  it "extracts content from nested structures" do
    nested_html = "<html><body><div><section><p>Nested paragraph 1.</p></section><section><p>Nested paragraph 2.</p></section></div></body></html>"
    stub_request(:get, url).to_return(status: 200, body: nested_html)

    tool = described_class.new({ url: url })
    result = tool.invoke(bot_user, llm)

    expect(result[:text]).to eq("Nested paragraph 1. Nested paragraph 2.")
  end
end

describe "#invoke with redirects" do
  let(:initial_url) { "http://initial-example.com" }
  let(:final_url) { "http://final-example.com" }
  let(:redirect_html) { "<html><body><p>Redirected content.</p></body></html>" }

  it "follows redirects and retrieves content from the final destination" do
    stub_request(:get, initial_url).to_return(status: 302, headers: { "Location" => final_url })
    stub_request(:get, final_url).to_return(status: 200, body: redirect_html)

    tool = described_class.new({ url: initial_url })
    result = tool.invoke(bot_user, llm)

    expect(result[:url]).to eq(final_url)
    expect(result[:text]).to eq("Redirected content.")
  end
end

These additional tests cover cases that affect how content is fetched and extracted:

Simple HTML Structure: Verifies basic content extraction where the HTML is straightforward.
Ignoring Script and Style Tags: Ensures that script and style sections do not influence the extracted content.
Nested HTML Structures: Tests that text is collected from deeply nested elements and joined in document order.
Handling Redirects: Confirms that HTTP redirects are correctly followed, and content is fetched from the final URL.

Together, these tests extend coverage and guard against common problems that arise when fetching and processing content from web pages.
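The scenarios above can also be sketched in plain Ruby, which may help when deciding what the expected strings should be. This is a sketch under stated assumptions, not the WebBrowser tool's implementation: `extract_text`, `fetch_following_redirects`, `MAX_REDIRECTS`, the `fetcher` callable, and the "Too many redirects" message are all hypothetical, and the regex-based stripping stands in for whatever HTML parsing the real tool performs. In particular, whether nested paragraphs end up joined by a single space depends on how the real extractor normalizes whitespace, so that expectation is worth confirming against the actual output.

```ruby
# Hypothetical helpers for illustration only; NOT the plugin's implementation.

MAX_REDIRECTS = 5

# Drop <script>, <style>, and <head> blocks, strip the remaining tags,
# and collapse all whitespace into single spaces.
def extract_text(html)
  html
    .gsub(%r{<(script|style|head)\b[^>]*>.*?</\1>}mi, " ")
    .gsub(/<[^>]+>/, " ")
    .split
    .join(" ")
end

# Follow 3xx responses via the Location header until a non-redirect
# response arrives. `fetcher` returns [status, headers, body] for a URL.
def fetch_following_redirects(url, fetcher, limit = MAX_REDIRECTS)
  limit.times do
    status, headers, body = fetcher.call(url)
    return { url: url, text: extract_text(body) } if status == 200

    location = headers["Location"]
    redirect = (300..399).cover?(status) && location
    return { error: "Failed to retrieve the web page: #{url}" } unless redirect

    url = location
  end

  { error: "Too many redirects" }
end

# Canned responses standing in for the WebMock stubs used in the spec.
pages = {
  "http://initial-example.com" => [302, { "Location" => "http://final-example.com" }, ""],
  "http://final-example.com" => [200, {}, "<html><body><p>Redirected content.</p></body></html>"],
}
fetcher = ->(u) { pages.fetch(u) }

redirected = fetch_following_redirects("http://initial-example.com", fetcher)
```

Calling `extract_text` on the script-and-style or nested HTML strings from the specs is a quick way to compare this naive behavior with what the real tool returns.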