Section 8
Safety and WebFetch
This session gives Claude a practical internet toolkit: fetch pages, scrape data, save visual proof, download authorized media, and use browser automation carefully. The goal is power with guardrails.
Workshop Recording
Follow along with the live session. Hit play and the video will stick to the top as you scroll.
Claude Dangerously Skip Permissions
Run Claude in the open terminal
In your open terminal, start Claude in the workshop project folder before you install WebFetch.
claude --dangerously-skip-permissions
Install and Register WebFetch
Install and register skills
Paste this prompt into the Claude session you already opened. Claude will download the public GitHub repo, install the dependencies, verify the tool, and add the skill notes to your skills.md or SKILLS.md file.
Use npm for the persistent install. Use npx only for one-time setup commands like installing Playwright's Chromium browser.
Take the Internet Apart
WebFetch is an agent built by Joe Che that wraps some of the most useful web-fetch tools into a single workflow, so you can easily pull things from the internet into Claude in a clean, usable form.
Think of it as a workshop-grade internet toolkit. It helps Claude choose the right method for the job: quick page reading when the site is simple, a real browser when the site is dynamic, media downloading when you are allowed to save a video, and explicit safety checks when cookies, private pages, or browser control are involved.
This includes:
Scraping Data
Pull page text, tables, links, prices, headlines, and repeated elements into structured output.
Grabbing Videos
Download public or authorized video and audio for transcription, clipping, analysis, and archiving.
Moving Your Mouse and Clicking
Have your computer move your mouse, click things for you, scroll pages, and reveal dynamic content when a real browser is required.
One WebFetch Package
A compilation of multiple skills in one WebFetch workflow, so Claude can route the task instead of making you choose the tool by hand.
Using WebFetch Examples
Fetch a competitor's pricing page
WebFetch is instant market research. Give Claude a URL and it will read the page and analyze it for you:
Research process before a sales call
Before a call, have Claude pull together everything publicly available about the person or company:
Download an Instagram video
Use a public Instagram post as the class demo. Claude should still confirm authorization before using browser cookies or touching private media.
Tools Inside WebFetch
These are the practical commands Claude can use after the tool is installed and registered in your skills file. Each one is meant for a different kind of web task.
Fetch: Page Reader
Reads a public webpage and gives Claude the useful content back as Markdown, text, JSON, or raw HTML. Use it for articles, pricing pages, landing pages, help docs, product pages, and competitor research.
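At its core, a simple fetch is one HTTP GET that returns the page body as text. A stdlib sketch of that step, not WebFetch's actual code (the URL is a placeholder):

```python
# Minimal page fetch: one HTTP GET, body decoded to text.
import urllib.request

def fetch_text(url: str) -> str:
    """Return the raw body of a public page as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

if __name__ == "__main__":
    try:
        # Placeholder URL -- swap in the page you are researching.
        print(fetch_text("https://example.com")[:300])
    except OSError as exc:  # no network on this machine
        print(f"fetch skipped: {exc}")
```

WebFetch layers Markdown conversion and content cleanup on top of this raw step; the sketch only shows the retrieval itself.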
Extract: Precision Puller
Pulls a precise piece of a page with a CSS selector. Use it when you want every link, every headline, every price, all buttons, a table, or one repeated element instead of a full-page summary.
Screenshot: Visual Proof
Captures a page visually. Use it for design QA, before-and-after checks, proof that a page rendered correctly, or when Claude needs to inspect layout rather than just text.
PDF: Permanent Record
Saves a webpage as a PDF. Use it for client research archives, receipts, policy pages, references, or anything you want to preserve exactly as it appeared at the time.
Media: Video and Audio Capture
Downloads public or authorized video/audio through yt-dlp for offline analysis, transcription, clipping, or reference. This is the command students will use for demos like public Reels, YouTube videos, podcast clips, or approved client media.
Batch: Many URLs at Once
Runs the same fetch workflow across a list of URLs. Use it for competitor lists, source lists, lead research, content monitoring, or pulling many product pages into one research pass.
Cache: Faster Iteration
Reuses recent fetches so Claude does not keep hitting the same page while you iterate. Clear the cache when a page changed or when you need a fresh read.
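The idea behind the cache fits in a few lines: keep recent results keyed by URL and only refetch after a time-to-live expires. A minimal stdlib illustration of the concept, not WebFetch's actual cache:

```python
import time

class FetchCache:
    """Reuse recent fetches; refetch only after ttl seconds have passed."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, url: str, fetch):
        """Return the cached body for url, calling fetch(url) only on a miss."""
        hit = self._store.get(url)
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]  # still fresh -- skip the network
        body = fetch(url)
        self._store[url] = (time.monotonic(), body)
        return body

    def clear(self):
        """Drop everything -- use when the page changed and you need a fresh read."""
        self._store.clear()
```

With this shape, repeated `get` calls for the same URL inside the TTL window never touch the network, and `clear()` is the "fresh read" escape hatch.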
Preflight + Status: Readiness Check
Checks whether the local install is healthy: Node, Playwright, optional Python tools, shot-scraper, and yt-dlp. Status shows what the agent is doing or what failed.
The engines underneath
WebFetch is a router. It looks at the job and chooses the best local engine instead of making you remember which tool fits which situation.
yt-dlp: The Media Retriever
yt-dlp is the media engine behind webfetch media. It supports thousands of extractor targets, including many common video, social, audio, education, news, and livestream sites. Examples commonly supported by yt-dlp include YouTube, Vimeo, TikTok, Instagram, X/Twitter, Reddit, Twitch, SoundCloud, Facebook, and many podcast or news video pages.
What it can download: public media, direct video/audio URLs, many embedded players, subtitles when available, audio-only versions, and authenticated media only when you explicitly provide browser cookies and are allowed to access that content.
What it cannot reliably download: DRM-protected streaming services, private posts you cannot access, paywalled content you are not authorized to use, expired or geo-blocked videos, and sites that changed their player after yt-dlp last updated. Even listed sites can break, so the honest test is to try the URL and keep yt-dlp updated.
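As a rough sketch, these are the kinds of yt-dlp invocations that sit behind the media command. The flags are real yt-dlp options; the URL is a placeholder, and you should only run them against media you are authorized to download:

```shell
# Placeholder URL -- replace with a public or authorized link.
URL="https://example.com/watch?v=abc123"

if command -v yt-dlp >/dev/null 2>&1; then
  # Audio only, converted to mp3 -- a convenient shape for Whisper transcription.
  yt-dlp -x --audio-format mp3 -o "%(title)s.%(ext)s" "$URL" || echo "audio download failed"
  # Subtitles only (no video), when the site provides them.
  yt-dlp --write-subs --sub-langs en --skip-download "$URL" || echo "subtitle fetch failed"
else
  echo "yt-dlp is not installed on this machine"
fi
```

Keeping yt-dlp updated matters here: sites change their players often, and an update is frequently the fix for a download that suddenly breaks.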
Playwright: The Real Browser Scraper
Playwright opens Chromium and lets the page run JavaScript before Claude reads it. Use this when a site loads content after the page opens, hides data behind tabs, needs scrolling, or has a modern app interface. It is heavier than a simple HTTP fetch, but it sees the page closer to how a real visitor sees it.
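The pattern can be sketched with Playwright's Python API (the URL is a placeholder, and Playwright plus its Chromium build must already be installed, e.g. via npx playwright install chromium):

```python
def fetch_rendered_text(url: str) -> str:
    """Open a real Chromium, let the page's JavaScript run, return the visible text."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for dynamic content to settle
        text = page.inner_text("body")
        browser.close()
    return text

if __name__ == "__main__":
    try:
        print(fetch_rendered_text("https://example.com")[:500])  # placeholder URL
    except Exception as exc:  # Playwright or Chromium missing on this machine
        print(f"skipped: {exc}")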
Beautiful Soup: The HTML Surgeon
Beautiful Soup is a lightweight HTML parser. It does not act like a browser and it does not run the page. Use it after a page has already been fetched when you want to walk the HTML cleanly: find all links, pull headings, remove navigation, extract table rows, or isolate repeated elements.
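A small example of that surgical style, using Beautiful Soup's real API on markup that has already been fetched (the HTML is inline so the sketch is self-contained):

```python
from bs4 import BeautifulSoup

# Stand-in for a page body that has already been fetched.
html = """
<html><body>
  <nav><a href="/home">Home</a></nav>
  <h1>Pricing</h1>
  <table>
    <tr><td>Basic</td><td>$10</td></tr>
    <tr><td>Pro</td><td>$25</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
links = [a["href"] for a in soup.find_all("a")]   # every link on the page
soup.nav.decompose()                              # strip the navigation block
heading = soup.h1.get_text()
rows = [[td.get_text() for td in tr.find_all("td")]
        for tr in soup.find_all("tr")]            # table rows as lists

print(heading, links, rows)
```

Nothing here opens a browser or runs JavaScript; the parser just walks HTML that some other tool already retrieved.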
Scrapling: The Fast Static Scraper
Scrapling is for pages that do not need a browser. It is usually faster than Playwright because it fetches and parses the page directly. Use it for blogs, docs, public articles, simple landing pages, and other pages where the useful content is already present in the HTML.
shot-scraper: The Screenshot Specialist
shot-scraper is a screenshot specialist. Use it when you need repeatable screenshots, multiple viewport sizes, selector screenshots, JavaScript setup before capture, or a batch of screenshots from a YAML file.
browser-use: The Autonomous Browser Agent
browser-use is for multi-step web tasks where Claude needs to browse, decide, click through pages, compare results, or keep going until it finds something. It is more agentic than normal fetch, so use it for research workflows rather than simple extraction.
OpenCLI Adapters: Known-Site Shortcuts
OpenCLI adapters are meant to be deterministic shortcuts for sites with known structures. When an adapter exists, the tool can use that adapter instead of spending tokens or browser time figuring out the site. This is optional and currently sits at the bottom of the stack because the core workshop value comes from fetch, extraction, screenshots, PDFs, and yt-dlp media workflows.
WebFetch Real-World Use Cases
01
Daily reporting
Have Claude log into your analytics dashboard every morning, take a screenshot, and summarize the numbers in a WhatsApp message.
02
Lead research
Paste a list of company names. Claude fetches each website and builds a one-line summary for each prospect.
03
Form automation
You describe what you want to create. Claude opens your project management tool and fills in the form.
04
QA testing
Ship a new page on your site. Claude clicks through it as a real visitor and flags anything broken or confusing.
05
Content monitoring
Fetch your industry news sources every morning and get a 3-bullet briefing on what is worth knowing.
06
Pricing intelligence
Fetch the pricing pages of 5 competitors and get a side-by-side breakdown with recommendations for your own pricing.
Install Local Whisper and Diarization
Install Whisper, model downloads, and diarization
Use one Claude prompt to set up local transcription for your operating system. On Apple Silicon Macs, Claude should use MLX Whisper. On Windows, Linux, or Intel Macs, Claude should use faster-whisper. The setup installs two diarization options so you can choose what fits your recording.
Diarization means speaker labels — Whisper transcribes the words, diarization identifies who said them. You have two options:
- simple-diarizer — fully local, no account needed, installs in one command. Best for short or medium recordings with a small number of speakers.
- pyannote.audio — more powerful for long recordings with many speakers. Requires a free Hugging Face account and accepting the model terms once. Sign up at huggingface.co — it is free.
Transcribe the downloaded video
Point your local Whisper install at the media file you downloaded with WebFetch. Use the small model for a live demo, then switch to the larger model when quality matters. The exact command depends on whether your installer chose MLX Whisper or faster-whisper.
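If the installer chose faster-whisper, the transcription step looks roughly like this (faster-whisper's real Python API; the file path is a placeholder for the media you downloaded with WebFetch):

```python
def transcribe(path: str, model_size: str = "small") -> list[str]:
    """Transcribe a local media file; switch to "large-v3" when quality matters."""
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(path)
    return [f"[{seg.start:6.1f}s] {seg.text.strip()}" for seg in segments]

if __name__ == "__main__":
    try:
        print("\n".join(transcribe("downloaded_video.mp4")))  # placeholder path
    except Exception as exc:  # faster-whisper or the media file missing here
        print(f"transcription skipped: {exc}")
```

On Apple Silicon the installer uses MLX Whisper instead, so the import and model names differ, but the shape of the step is the same: load a model, feed it the file, print timed segments.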
Installing GuardDog
GuardDog is a command-line safety tool that checks packages before you install or trust them. It is designed to spot risky patterns like credential harvesters, obfuscated scripts, suspicious install hooks, crypto miners, reverse shells, and other common supply-chain attacks.
This install is intentionally simple: one npm install command, then one setup command. It works as a one-click-style install on any platform with Node and npm, including macOS, Windows, and Linux. Students still get the commands in their own copy blocks so Claude Code users and Codex users can paste the safest version for their tool.
Install GuardDog
Use the Copy Claude Code button for Claude Code. Use the Copy Codex Only button for Codex. Both versions should install GuardDog, run setup, and confirm the command is ready before you use it to inspect packages.
Get Your VirusTotal API Key
VirusTotal is a free online scanner that checks files, URLs, domains, and IP addresses against over 70 antivirus engines and security tools at once. Where GuardDog checks package code for suspicious patterns before you install, VirusTotal lets you scan the actual file or URL against the world's largest collective threat database.
The free API tier gives you 500 requests per day with a rate limit of 4 requests per minute. That is more than enough for personal use, client work, and workshop exercises. You only need a free account — no credit card required.
Create a free VirusTotal account and get your API key
Paste this into Claude to walk through the account setup and get your key saved somewhere useful.
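Once the key is saved, a lookup is a single authenticated GET. A stdlib sketch against VirusTotal's v3 API (the endpoint and x-apikey header come from VT's public API docs; VT_API_KEY is an environment variable you set yourself):

```python
import os
import urllib.request

API_BASE = "https://www.virustotal.com/api/v3"

def vt_request(path: str) -> urllib.request.Request:
    """Build an authenticated GET for the VirusTotal v3 API."""
    key = os.environ.get("VT_API_KEY", "")
    return urllib.request.Request(f"{API_BASE}/{path}", headers={"x-apikey": key})

if __name__ == "__main__":
    req = vt_request("domains/example.com")  # domain report lookup
    print(req.full_url)
    # Uncomment once VT_API_KEY is set (free tier: 500 requests/day, 4/minute):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```

Stay under the 4-requests-per-minute limit when scanning lists; a one-line `time.sleep(16)` between calls keeps a batch safely inside the free tier.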
You have now unlocked the system.
We have our first line of defense to protect us.
We have a way to bring any information from the internet straight into your hands.
Using Codex instead of Claude Code? Codex version of this page