AI-powered Brochure Generator
🌍 Task: Generate a company brochure from its name and website, aimed at clients, investors, and recruits.
🧠 Model: Toggle USE_OPENAI to switch between OpenAI and Ollama models.
🕵️‍♂️ Data Extraction: Scrape website content and filter key links (About, Products, Careers, Contact).
📌 Output Format: A Markdown-formatted brochure streamed in real time.
🚀 Tools: BeautifulSoup, OpenAI API, Ollama, and IPython display.
🧑‍💻 Skill Level: Intermediate.
🛠️ Requirements
⚙️ Hardware: ✅ CPU is sufficient; no GPU required
🔑 OpenAI API key
Ollama installed, with llama3.2:3b or another lightweight model pulled
🧩 System Design Overview
Class Structure
This code consists of three main classes:
Website: Scrapes and processes webpage content. Extracts text and links from a given URL.
LLMClient: Handles interactions with OpenAI or Ollama (llama3, deepseek, qwen). Uses get_relevant_links() to filter webpage links. Uses generate_brochure() to create and stream a Markdown-formatted brochure.
BrochureGenerator: Uses Website to scrape the main webpage and relevant links. Uses LLMClient to filter relevant links and generate a brochure. Exposes generate(), which runs the entire process.
Workflow
1. main() initializes BrochureGenerator; its constructor uses Website to scrape the main webpage, extracting text and links from the given URL.
2. main() calls generate(), which calls LLMClient.get_relevant_links() to select the relevant links using the LLM (OpenAI/Ollama).
3. The relevant links are scraped with Website to collect additional content.
4. All collected content is passed to LLMClient.generate_brochure().
5. LLMClient streams the generated brochure using OpenAI or Ollama.
6. The final brochure is displayed in Markdown format.
Intermediate reasoning
In this workflow, we have intermediate reasoning because the LLM is called twice:
First LLM call: Takes raw links → filters/selects relevant ones (reasoning step).
Second LLM call: Takes selected content → generates final brochure.
🧠 LLM output becomes LLM input — that’s intermediate reasoning.
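The two-call pattern above can be illustrated without any model at all. The sketch below is only an assumption-laden toy: fake_llm is a hypothetical stand-in that returns canned text, and run_chain is an illustrative name that does not appear in the notebook. It shows the one thing that matters here, namely that the parsed output of the first call is placed into the prompt of the second.

```python
import json

def fake_llm(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned text."""
    if "Select" in system_prompt:
        # Pretend the model kept only the brochure-worthy link.
        return json.dumps(
            {"links": [{"type": "about", "url": "https://example.com/about"}]}
        )
    return f"# Brochure\n\nBuilt from: {user_prompt}"

def run_chain(links: list[str]) -> str:
    # Call 1: raw links in, filtered links out (as JSON).
    selection = json.loads(fake_llm("Select relevant links.", ", ".join(links)))
    chosen = [item["url"] for item in selection["links"]]
    # Call 2: the output of call 1 becomes part of the next prompt.
    return fake_llm("Write a brochure.", ", ".join(chosen))

brochure = run_chain(["https://example.com/about", "https://example.com/login"])
print(brochure)
```

In the real workflow the second prompt contains the scraped text of the selected pages rather than the URLs themselves, but the dependency between the two calls is the same.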
📦 Import Libraries
import os
import requests
import json
import re
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import display, Markdown, update_display
from urllib.parse import urljoin
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI
🧠 Define the Model
Switch between OpenAI and Ollama by changing a single variable, USE_OPENAI. MODEL is derived from that flag.
# Load environment variables from .env
load_dotenv()

USE_OPENAI = True
MODEL = 'gpt-4o-mini' if USE_OPENAI else 'llama3.2:3b'

# Validate the API key only for the OpenAI backend, so Ollama runs without one
openai_client = None
if USE_OPENAI:
    api_key = os.getenv('OPENAI_API_KEY')
    if not api_key or not api_key.startswith('sk-'):
        raise ValueError("Invalid OpenAI API key. Check your .env file.")
    openai_client = OpenAI()
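One way to keep the key check from blocking the Ollama path is to gather the backend settings in a small helper. The sketch below is a hypothetical alternative, not the notebook's own code: resolve_backend and its returned dictionary keys are illustrative names. It relies on the fact that Ollama serves an OpenAI-compatible API at http://localhost:11434/v1, which assumes a default local install.

```python
import os

def resolve_backend(use_openai: bool, env=os.environ) -> dict:
    """Return client settings for the chosen backend (hypothetical helper)."""
    if use_openai:
        key = env.get("OPENAI_API_KEY", "")
        if not key.startswith("sk-"):
            raise ValueError("Invalid OpenAI API key. Check your .env file.")
        # base_url=None means "use the OpenAI default endpoint"
        return {"model": "gpt-4o-mini", "base_url": None, "api_key": key}
    # Ollama's OpenAI-compatible endpoint ignores the key, but the client
    # library expects a non-empty value, so a placeholder is used.
    return {
        "model": "llama3.2:3b",
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama",
    }

settings = resolve_backend(use_openai=False, env={})
print(settings["model"], settings["base_url"])
```

With settings like these, a single OpenAI(base_url=..., api_key=...) client could in principle serve both backends, at the cost of not using the native ollama package.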
🏗️ Define Classes
# Browser-like User-Agent; some sites block the default requests header
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """Fetches a page and exposes its title, visible text, and absolute links."""
    def __init__(self, url):
        self.url = url
        self.title = "No title found"
        self.text = ""
        self.links = []
        try:
            response = requests.get(url, headers=HEADERS, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            # soup.title.string is None when <title> is empty or has nested tags
            if soup.title and soup.title.string:
                self.title = soup.title.string.strip()
            self.text = self.extract_text(soup)
            self.links = self.extract_links(soup)
        except requests.RequestException as e:
            print(f"Failed to fetch {url}: {e}")

    def extract_text(self, soup):
        if soup.body:
            # Drop elements that carry no readable text
            for tag in soup.body(["script", "style", "img", "input"]):
                tag.decompose()
            return soup.body.get_text(separator="\n", strip=True)
        return ""

    def extract_links(self, soup):
        raw_links = [a.get('href') for a in soup.find_all('a') if a.get('href')]
        # Resolve relative hrefs against the page URL, then keep http(s) only
        full_links = [urljoin(self.url, link) for link in raw_links]
        return [link for link in full_links if link.startswith('http')]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
class LLMClient:
    def __init__(self, model=MODEL):
        self.model = model

    def call_llm(self, system_prompt, user_prompt, stream=False):
        messages = [{"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}]
        if USE_OPENAI:
            return openai_client.chat.completions.create(
                model=self.model,
                messages=messages,
                stream=stream
            )
        else:
            # Imported lazily so the ollama package is only needed for this backend
            import ollama
            return ollama.chat(
                model=self.model,
                messages=messages,
                stream=stream
            )
    def get_relevant_links(self, website):
        system_prompt = """
        You are given a list of links from a company website.
        Select only relevant links for a brochure (About, Company, Careers, Products, Contact).
        Exclude login, terms, privacy, and emails.
        Return only valid JSON, no explanations.
        Example: {"links":[{"type":"about","url":"https://company.com/about"}]}
        """
        user_prompt = f"Website URL: {website.url}\nLinks: {', '.join(website.links)}"
        try:
            response = self.call_llm(system_prompt, user_prompt)
            if USE_OPENAI:
                return json.loads(response.choices[0].message.content.strip())
            else:
                result = response.get("message", {}).get("content", "").strip()
                return json.loads(result)
        except (json.JSONDecodeError, KeyError):
            # Fall back to "no extra links" so the brochure can still be built
            print(f"Warning: Failed to parse JSON for {website.url}")
            return {"links": []}
    def generate_brochure(self, company_name, content, language='English'):
        system_prompt = """
        You are a professional translator and writer who creates fun and engaging brochures.
        Generate a short, humorous, entertaining brochure in Markdown.
        Include company culture, customers, careers/jobs if available.
        """
        # Truncate on line boundaries so the content stays under ~5000 characters
        truncated_content = ""
        for line in content.split("\n"):
            if len(truncated_content) + len(line) > 5000:
                break
            truncated_content += line + "\n"
        user_prompt = (
            f"Create a brochure for '{company_name}' using the following content:\n"
            f"{truncated_content}\nRespond in {language}."
        )
        response_stream = self.call_llm(system_prompt, user_prompt, stream=True)
        display_handle = display(Markdown(""), display_id=True)
        full_text = ""
        for chunk in response_stream:
            # OpenAI and Ollama expose the streamed text under different fields
            if USE_OPENAI:
                delta = chunk.choices[0].delta.content or ""
            else:
                delta = (chunk["message"]["content"] or "") if "message" in chunk else ""
            full_text += delta
            # Strip fence markers from the accumulated text: a fence spans
            # several chunks, so a per-chunk regex would never match it
            cleaned = full_text.replace("```markdown", "").replace("```", "")
            update_display(Markdown(cleaned), display_id=display_handle.display_id)
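Models sometimes wrap the JSON reply to get_relevant_links() in a Markdown code fence or return entries without a url field, and either case would end up in the "Failed to parse JSON" branch or raise later on. The helper below is a hedged sketch of a more tolerant parser; parse_links_json is a hypothetical name and is not part of the notebook.

```python
import json
import re

def parse_links_json(raw: str) -> dict:
    """Parse a model reply into {"links": [...]}, tolerating Markdown fences."""
    text = raw.strip()
    # Remove a leading fence (with optional language tag) and a trailing fence.
    text = re.sub(r"^```[a-zA-Z]*\s*", "", text)
    text = re.sub(r"\s*```$", "", text)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, list):
        return {"links": []}
    # Keep only well-formed entries so later code can read "url" safely.
    valid = [
        item for item in links
        if isinstance(item, dict) and isinstance(item.get("url"), str)
    ]
    return {"links": valid}

reply = '```json\n{"links":[{"type":"about","url":"https://example.com/about"}]}\n```'
print(parse_links_json(reply))
```

A helper like this could replace the two json.loads calls in get_relevant_links(); whether the extra tolerance is worth it depends on how reliably the chosen model follows the "valid JSON only" instruction.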
class BrochureGenerator:
    """
    Generates a structured, visually appealing brochure from a company's website.
    """
    def __init__(self, company_name, url, language='English'):
        self.company_name = company_name
        self.url = url
        self.language = language
        self.website = Website(url)  # scrapes the homepage immediately
        self.llm_client = LLMClient()
        self.website_cache = {}

    def fetch_link_content(self, url):
        # Cache page contents so a URL listed twice is only fetched once
        if url in self.website_cache:
            return self.website_cache[url]
        w = Website(url)
        self.website_cache[url] = w.get_contents()
        return self.website_cache[url]

    def generate(self):
        links = self.llm_client.get_relevant_links(self.website)
        # Keep only entries that carry a URL; the model's JSON is not guaranteed
        link_objs = [link for link in links.get('links', [])
                     if isinstance(link, dict) and link.get('url')]

        content_blocks = [{
            "title": "Homepage",
            "content": self.website.get_contents()
        }]

        # Fetch the selected pages in parallel; map() preserves input order
        link_urls = [link['url'] for link in link_objs]
        with ThreadPoolExecutor(max_workers=5) as executor:
            results = list(executor.map(self.fetch_link_content, link_urls))
        for link_obj, text in zip(link_objs, results):
            content_blocks.append({
                "title": link_obj.get('type', 'Section').title(),
                "content": text
            })

        # Build one Markdown document: short lines as bullets, long ones as quotes
        final_content = ""
        for block in content_blocks:
            final_content += f"## {block['title']}\n\n"
            paragraphs = [p.strip() for p in block['content'].split("\n") if p.strip()]
            for p in paragraphs:
                if len(p) < 150:
                    final_content += f"- {p}\n"
                else:
                    final_content += f"> {p}\n"
            final_content += "\n"

        self.llm_client.generate_brochure(self.company_name, final_content, self.language)
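Website.extract_links returns every absolute http(s) link on the page, including duplicates, in-page anchors, and links to other sites, and all of them are sent to the model in get_relevant_links(). A possible pre-filter is sketched below. It is an assumption rather than the notebook's behaviour: clean_links is a hypothetical helper, and its same-host rule would also drop legitimate pages hosted on a different subdomain (for example a separate careers site).

```python
from urllib.parse import urldefrag, urljoin, urlparse

def clean_links(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve, de-duplicate, and keep only http(s) links on the base host."""
    base_host = urlparse(base_url).netloc
    seen, result = set(), []
    for href in hrefs:
        # Make the link absolute and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https") or parts.netloc != base_host:
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

cleaned = clean_links(
    "https://example.com/fr",
    ["/about", "/about#team", "mailto:hi@example.com", "https://other.org/"],
)
print(cleaned)
```

Fewer and cleaner links mean a shorter prompt for the first LLM call, which matters most for small local models with limited context.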
📝 Generate Brochure
def main():
    company_name = "Tour Eiffel"
    url = "https://www.toureiffel.paris/fr"
    language = "English"
    generator = BrochureGenerator(company_name, url, language)
    generator.generate()

if __name__ == "__main__":
    main()
Welcome to the Fun and Fabulous Tour Eiffel! 🌟
Webpage Title:
La tour Eiffel, Site OFFICIEL: where dreams become panoramic views!
Explore the Majestic Tower of Paris! 🗼
Wanna Climb to New Heights?
Get ready for an ascending adventure! From the ground to the sky, our tower offers an emotional journey that’s sure to lift your spirits—and possibly your lunch if you’ve eaten too much!
What’s on the Agenda?
Tarifs & horaires: Know before you go!
Restaurants & boutiques: Because who doesn’t want to enjoy a croissant while contemplating life? 🥐
The View: 1st floor? More like 1st fabulous! Fluffy views ahead!
The Summit: Where the squirrels have better views than you! 🐿️
Tagline: Turn your “I can’t” into “I can… see the world!” 🚀
For the Young Explorers! 👨‍👩‍👦‍👦
Don’t Forget: Grab our ludo-éducatifs (fun educational booklets) before you zoom up in the elevator! Perfect for little minds eager to learn about their vertical journey—what’s the speed of an elevator? Who knows, but we’re climbing!
Feel the Eiffel Effect! 🎢
There’s a New Movie in Town!
“Chaque visite est unique!” 🎥 Get those tissues ready—this emotional rollercoaster will have you laughing and crying as you spiral up the tower. Just don’t drop your popcorn!
Dining with a View! 🍽️
Dine at Madame Brasserie!
Eat gourmet food while gazing at the Heavens of Paris—no reservations needed, just a hopeful heart and a tiny bit of luck!
Didn’t I See You on Social Media?
Join us where the Eiffel vibes are alive—Facebook, Instagram, Twitter… we’re all over the web like breadcrumbs on a French baguette! 🥖📸
Job Opportunities! 🧑‍💼
Are You Eiffel-tastic?
We’re always looking for folks who can turn visitors into fans! Positions range from tour guides who can’t believe they’re giving tours every day, to workshop coordinators who can juggle flaming torches while reciting Eiffel trivia!
Contact Us
Need help? Our FAQ will answer all the burning questions you have—like what snacks to bring and whether climbing the stairs counts as cardio.
Want to work with us? Visit our Careers page!
Are you ready to rise above it all?
Visit us or buy tickets online to skip the lines! Because remember, waiting isn’t as fun as sliding up an elevator!
Join the countless travelers who have made Eiffel history… one Instagram post at a time! 📷✨
You’ve Dreamed, Now Live It at La Tour Eiffel!
The only place you can reach your peak goals while snacking on a French pastry!
Website: toureiffel.paris
Follow us across the internet! 🌐
This brochure is best viewed while standing under the actual Tour Eiffel.