Web Scraping Program Using Python

  • Writer: Katie Wojciechowski
  • Sep 7
  • 2 min read

For a recent school project in my Business Intelligence class, I developed a Python scraper to extract detailed information about smartphones from a structured webpage (created by the professor for the class). Here’s a quick overview of my approach, the reasoning behind it, and the key web scraping concepts I applied.


Why Scrape?

Web scraping allows us to programmatically extract and organize data from websites that don’t offer APIs. This can save time and effort versus manual data collection, and enables further analysis or integration with other tools (like Excel or BI software).


Tools & Libraries Used

  • requests: To send HTTP requests and retrieve the page content.

  • BeautifulSoup: To parse HTML content and navigate the DOM easily.

  • csv: To export the scraped data into a usable format for later analysis.


Step 1: Requesting and Validating the Webpage

I began by sending a GET request to the target URL, then checked the response status code to confirm the request succeeded (HTTP 200) and that no unexpected redirect had occurred. This step ensures I’m scraping the intended content rather than a redirect or error page.
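
Here is a minimal sketch of this step. The URL below is a placeholder; the actual page was hosted by the professor for the class:

import requests

# Placeholder URL standing in for the course page.
URL = "https://example.com/phonelist.html"

response = requests.get(URL)

# requests follows redirects automatically and records them in
# response.history; an empty history plus HTTP 200 means we landed
# on the intended page rather than an error or redirect target.
if response.status_code == 200 and not response.history:
    print("Fetched:", response.url)
else:
    raise RuntimeError(f"Unexpected response {response.status_code}; "
                       f"redirects: {[r.url for r in response.history]}")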


Step 2: Parsing the HTML

Using BeautifulSoup, I parsed the HTML content into a structured object (a "soup"), which allows easy querying and extraction of page elements via tags, classes, and IDs.
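
Continuing from the request above, parsing takes one line (html.parser is Python's built-in parser; lxml would also work if installed):

from bs4 import BeautifulSoup

# Turn the raw HTML text into a navigable tree of elements.
soup = BeautifulSoup(response.text, "html.parser")

# The soup can now be queried by tag, attribute, class, or ID.
print(soup.title.string if soup.title else "no <title> element")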


Step 3: Targeting Relevant Data

The main data was organized within a <div> with the ID "phonelist". I located this section to focus on phone entries. Each phone was represented as a list item (li with class "root"), containing basic attributes like color, OS, and a link to more details.

For example, I extracted the iPhone 11 Pro details by doing the following (sketched in code after this list):

  • Navigating the DOM tree to find the specific phone.

  • Splitting attribute text to isolate values (e.g., splitting on ": " to get the OS).
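
Here is a rough sketch of that extraction. The exact markup of the course page isn't reproduced in this post, so the "label: value" text format is an assumption based on the description above:

# Narrow the search to the container holding all phone entries.
phone_list = soup.find("div", id="phonelist")
phones = phone_list.find_all("li", class_="root")

for phone in phones:
    # Assumed format: each attribute appears as text like "OS: iOS",
    # so splitting once on ": " separates the label from the value.
    for line in phone.get_text("\n").splitlines():
        if ": " in line:
            label, value = line.split(": ", 1)
            print(label.strip(), "->", value.strip())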


Step 4: Handling Child Pages for More Data

Some attributes, such as storage and camera features, were on child pages linked from the main phone list. I followed these links by sending new GET requests, parsed the child pages, and extracted the additional details.

This multi-level scraping demonstrates handling more complex data structures beyond a single page.
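
A sketch of the follow-the-link step, building on the snippets above (the child-page markup is again an assumption):

from urllib.parse import urljoin

for phone in phones:
    link = phone.find("a")                  # link to the phone's detail page
    if link is None:
        continue
    child_url = urljoin(URL, link["href"])  # resolve relative hrefs
    child_response = requests.get(child_url)
    if child_response.status_code != 200:
        continue                            # skip pages that fail to load
    child_soup = BeautifulSoup(child_response.text, "html.parser")
    # Pull extra specs (storage, camera, ...) from the child page; the
    # li selector here is a guess at the child-page structure.
    for spec in child_soup.find_all("li"):
        print(spec.get_text(strip=True))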


Step 5: Organizing and Exporting Data

Finally, I automated the process to scrape all phones and their details, writing them to a CSV file (a simplified sketch follows this list). This included:

  • Extracting each phone’s attributes.

  • Following child page links for storage, size, and other specs.

  • Writing all the collected data into a structured CSV file with headers.
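
A simplified version of the export step, reusing the phones list from Step 3 (the column names are illustrative, not the exact headers I used):

import csv

# Illustrative column names, not the exact headers from my project.
fieldnames = ["name", "color", "os", "storage"]

with open("phones.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for phone in phones:
        link = phone.find("a")
        record = {"name": link.get_text(strip=True) if link else ""}
        # Reuse the "label: value" split from Step 3 for each attribute.
        for line in phone.get_text("\n").splitlines():
            if ": " in line:
                label, value = line.split(": ", 1)
                record[label.strip().lower()] = value.strip()
        # Write only the declared columns, leaving blanks when missing.
        writer.writerow({k: record.get(k, "") for k in fieldnames})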



Key Concepts Highlighted

  • HTTP Requests & Response Handling: Understanding status codes and redirects ensures robust scraping.

  • HTML Parsing: Using BeautifulSoup to target elements by tags, attributes, and classes.

  • Text Processing: Manipulating strings to extract clean data.

  • Multi-level Scraping: Following links to child pages for comprehensive data collection.

  • Data Export: Structuring CSV output for usability in tools like Excel or BI software.


This project gave me practical experience in web scraping workflows, from fetching and parsing HTML to managing multi-page data and exporting results for further analysis. It’s a powerful example of how Python can automate and streamline data collection from the web.

 
 
 
