Create an Advanced Web Scraper Using Client-Side JavaScript

Let’s create a client-side web scraper with JavaScript that runs in the browser. We’ll use the fetch API to make HTTP requests and DOMParser to parse the HTML content. Here’s how you can do it:

html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Web Scraper</title>
</head>
<body>
    <h1>Scraped Articles</h1>
    <ul id="article-list"></ul>

    <script>
        async function fetchPage(url) {
            try {
                const response = await fetch(url);
                if (!response.ok) {
                    // Include the status code (e.g. 404) so failures are easier to diagnose
                    throw new Error(`Failed to fetch page: HTTP ${response.status}`);
                }
                const html = await response.text();
                return html;
            } catch (error) {
                console.error('Error fetching page:', error);
                return null;
            }
        }

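        function parsePage(html) {
            const parser = new DOMParser();
            const doc = parser.parseFromString(html, 'text/html');
            // Assumes each article title is an <h2>; adjust the selector for the target site.
            // trim() strips surrounding whitespace and filter(Boolean) drops empty headings.
            const titles = Array.from(doc.querySelectorAll('h2'))
                .map(title => title.textContent.trim())
                .filter(Boolean);
            return titles;
        }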

        async function main() {
            // Page to scrape; its server must allow cross-origin requests (CORS).
            const url = 'https://example.com/articles';
            const html = await fetchPage(url);

            if (html) {
                const articles = parsePage(html);
                const articleList = document.getElementById('article-list');
                articles.forEach(title => {
                    const listItem = document.createElement('li');
                    listItem.textContent = title;
                    articleList.appendChild(listItem);
                });
            }
        }

        main();
    </script>
</body>
</html>

In this example:

We define three functions: fetchPage, parsePage, and main.
fetchPage uses the fetch API to make an HTTP request to the specified URL and returns the page’s HTML as a string, or null if the request fails.
parsePage uses DOMParser to turn that HTML into a document and extracts the text of every h2 element, which this example assumes holds the article titles.
main ties the steps together: it fetches the page, parses it, and adds each title as a list item to the #article-list element on the page.
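If you extend parsePage to collect article links as well as titles, watch out for relative URLs. A document created by DOMParser takes its URL from the page running the script, so an anchor’s href property resolves against your scraper page by default, not against the site you fetched. The sketch below assumes you pass in the raw href attribute values and the URL you actually fetched, then resolves each one with the standard URL constructor. The helper name toAbsoluteUrls is hypothetical and not part of the original example.

```javascript
// Hypothetical helper: turn raw href attribute values into absolute,
// de-duplicated http(s) URLs, resolved against the page URL that was fetched.
function toAbsoluteUrls(hrefs, pageUrl) {
    const seen = new Set();
    const result = [];
    for (const href of hrefs) {
        let url;
        try {
            // new URL(relative, base) resolves "/a", "b?x=1", "../c", etc.
            url = new URL(href, pageUrl);
        } catch {
            continue; // skip values that cannot be parsed as URLs
        }
        // Ignore mailto:, javascript:, tel: and other non-web schemes.
        if (url.protocol !== 'http:' && url.protocol !== 'https:') {
            continue;
        }
        url.hash = ''; // "#section" links point back at the same page
        if (!seen.has(url.href)) {
            seen.add(url.href);
            result.push(url.href);
        }
    }
    return result;
}

// Usage in the browser, after parsing:
// const hrefs = Array.from(doc.querySelectorAll('a[href]'), a => a.getAttribute('href'));
// const links = toAbsoluteUrls(hrefs, url);
```

Reading getAttribute('href') instead of the href property keeps the value exactly as written in the scraped HTML, so the resolution happens against the correct base.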

Replace 'https://example.com/articles' with the URL of the page you want to scrape, and change the 'h2' selector in parsePage to match how that page marks up its article titles.
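To cover more than one page, one simple extension is to visit a list of URLs one at a time and pause between requests so the target server isn’t hit with a burst of traffic. The sketch below assumes you pass in a fetchText function that resolves to HTML or null (fetchPage above fits) and a parse function that returns an array (parsePage fits). The names scrapeAll and sleep, and the one-second default delay, are illustrative choices rather than part of the original example.

```javascript
// Hypothetical helper: resolve after the given number of milliseconds.
function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

// Scrape several pages sequentially, waiting delayMs between requests.
// fetchText(url) must resolve to an HTML string or null (like fetchPage);
// parse(html) must return an array of results (like parsePage).
async function scrapeAll(urls, fetchText, parse, delayMs = 1000) {
    const results = [];
    for (let i = 0; i < urls.length; i++) {
        if (i > 0) {
            await sleep(delayMs); // pause before every request except the first
        }
        const html = await fetchText(urls[i]);
        if (html === null) {
            continue; // skip pages that failed to load
        }
        results.push(...parse(html));
    }
    return results;
}

// Usage in the page above (the ?page= URLs are placeholders):
// const titles = await scrapeAll(
//     ['https://example.com/articles?page=1', 'https://example.com/articles?page=2'],
//     fetchPage, parsePage);
```

Running requests sequentially is slower than firing them all with Promise.all, but it keeps the load on the target site predictable.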

Remember that client-side web scraping is limited by browser security policies. Under the same-origin policy, your script can only read a response from another origin if that server allows it through CORS headers such as Access-Control-Allow-Origin, so requests to many websites will simply fail. Always respect the terms of service of the websites you’re scraping and make sure your scraping complies with legal and ethical guidelines.
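When a request is blocked by CORS, fetch rejects with a TypeError and the browser does not tell your script why; the details only appear in the developer console. The sketch below is a hypothetical fetchPageWithTimeout variant of fetchPage that adds a timeout through AbortController and reports which kind of failure occurred. The 10-second default and the { ok, html, reason, detail } result shape are assumptions for illustration, and the fetchImpl parameter exists only so the function can be exercised with a stand-in for fetch.

```javascript
// Hypothetical variant of fetchPage with a timeout and clearer error reporting.
// Resolves to { ok: true, html } or { ok: false, reason, detail }.
async function fetchPageWithTimeout(url, timeoutMs = 10000, fetchImpl = fetch) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
        const response = await fetchImpl(url, { signal: controller.signal });
        if (!response.ok) {
            // The server answered, but with an error status such as 404 or 500.
            return { ok: false, reason: 'http', detail: `HTTP ${response.status}` };
        }
        return { ok: true, html: await response.text() };
    } catch (error) {
        if (error.name === 'AbortError') {
            return { ok: false, reason: 'timeout', detail: `No response after ${timeoutMs} ms` };
        }
        // A TypeError here usually means a network failure or a CORS block;
        // the browser deliberately hides which one it was from scripts.
        return { ok: false, reason: 'network', detail: String(error) };
    } finally {
        clearTimeout(timer); // stop the timer whether the request succeeded or not
    }
}
```

In main, you could call fetchPageWithTimeout(url) and, when result.ok is false, show result.detail on the page instead of leaving the list empty.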

Tony's CodeForge Blog
