{"id":1275,"date":"2024-07-16T08:51:40","date_gmt":"2024-07-16T08:51:40","guid":{"rendered":"https:\/\/www.ipway.com\/blog\/?p=1275"},"modified":"2024-07-16T08:51:40","modified_gmt":"2024-07-16T08:51:40","slug":"web-scraping-using-ruby","status":"publish","type":"post","link":"https:\/\/www.ipway.com\/blog\/web-scraping-using-ruby\/","title":{"rendered":"Web Scraping Using Ruby"},"content":{"rendered":"\n<p>Web scraping has become crucial for companies and developers who need to collect data from the internet effectively. Ruby, recognized for its simplicity and readability, is widely used for web scraping projects. This detailed guide explores effective methods and strategies for web scraping using Ruby, covering everything from setup to extracting information from dynamic web pages.<\/p>\n\n\n\n<p>Web scraping is the practice of gathering data from websites automatically, enabling you to extract large amounts of information swiftly and effectively. With its clean syntax and robust libraries, Ruby is a strong choice for web scraping tasks. 
This piece will walk you through the process of web scraping with Ruby, covering everything from initial setup to advanced strategies.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"0-is-ruby-good-for-web-scraping-\"><strong>Is Ruby Good for Web Scraping?<\/strong><\/h2>\n\n\n\n<p><a href=\"https:\/\/www.ruby-lang.org\/en\/\" target=\"_blank\" rel=\"noopener\">Ruby<\/a> is a programming language that is interpreted, open 
source and dynamically typed. It supports object-oriented and procedural development. Ruby prioritizes simplicity, with a syntax that is easy to both write and read. This focus on simplicity has led to Ruby being used for a wide variety of applications, including web scraping.<\/p>\n\n\n\n<p>The abundance of third-party libraries in Ruby, known as &#8220;gems,&#8221; makes it especially suitable for web scraping. These gems cover a range of tasks, making it simple to download web pages, parse HTML content and extract data.<\/p>\n\n\n\n<p>To sum up, web scraping with Ruby is not only possible but also straightforward, thanks to the numerous libraries available.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1680\" height=\"1120\" src=\"https:\/\/www.ipway.com\/blog\/wp-content\/uploads\/2024\/07\/Imagine-articol-scraping-with-ruby.jpg\" alt=\"web scraping using ruby\" class=\"wp-image-1285\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"1-installing-ruby-\"><strong>Installing Ruby<\/strong><\/h2>\n\n\n\n<p>Before you begin scraping data, make sure Ruby is set up. Here are step-by-step instructions for installing Ruby on both Windows and macOS:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"2-installing-ruby-on-windows-\"><strong>Installing Ruby on Windows<\/strong><\/h3>\n\n\n\n<p><strong>Download RubyInstaller<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Visit the <a href=\"https:\/\/rubyinstaller.org\/\" target=\"_blank\" rel=\"noopener\">RubyInstaller website<\/a>.<\/li>\n\n\n\n<li>Download the recommended version of Ruby. 
The installer includes the Ruby language and RubyGems, and the Ruby+Devkit edition adds the development tools needed to build gems with native extensions.<\/li>\n<\/ul>\n\n\n\n<p><strong>Run the Installer<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Double-click the file you just downloaded to begin the installation process.<\/li>\n\n\n\n<li>Follow the instructions displayed on the screen. Remember to tick the box labeled &#8220;Add Ruby executables to your PATH&#8221; while installing. This step is essential for using Ruby from the command line.<\/li>\n<\/ul>\n\n\n\n<p><strong>Verify Installation<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>To open Command Prompt, press the Windows key and R together, type &#8220;cmd&#8221; and press Enter.<\/li>\n\n\n\n<li>Type &#8216;ruby -v&#8217; and hit Enter. If the installed version of Ruby is displayed, Ruby has been installed correctly on your system.<\/li>\n<\/ul>\n\n\n\n<p><strong>Update RubyGems<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>RubyGems is the standard package manager for Ruby. While it typically comes pre-installed, you can update it by running the following command in Command Prompt:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>gem update --system\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"3-installing-ruby-on-macos-\"><strong>Installing Ruby on macOS<\/strong><\/h3>\n\n\n\n<p><strong>Using Homebrew<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Homebrew is a package manager for macOS that makes it easy to install software. 
If you don&#8217;t already have Homebrew, open Terminal and enter the following command:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>\/bin\/bash -c \"$(curl -fsSL https:\/\/raw.githubusercontent.com\/Homebrew\/install\/HEAD\/install.sh)\"\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Once Homebrew is installed, install Ruby by running:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>brew install ruby\n<\/code><\/pre>\n\n\n\n<p><strong>Verify Installation<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Open Terminal and type:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>ruby -v\n<\/code><\/pre>\n\n\n\n<p>This command prints the installed Ruby version, confirming that Ruby is available on your macOS system. Note that macOS ships with its own system Ruby and Homebrew installs Ruby as keg-only, so you may need to add Homebrew&#8217;s Ruby to your PATH before this command reports the new version.<\/p>\n\n\n\n<p><strong>Update RubyGems<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>As on Windows, RubyGems is the package manager for Ruby and is already included in the installation. To update it, run:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>gem update --system\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"4-alternative-method-for-macos\">Alternative Method for macOS<\/h4>\n\n\n\n<p><strong>Using Ruby Version Manager (RVM)<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>RVM is a widely used tool for installing Ruby that lets you manage multiple Ruby versions. 
To install RVM, run the following command in your Terminal:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>\\curl -sSL https:\/\/get.rvm.io | bash -s stable\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>After installation, load RVM into your shell session:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>source ~\/.rvm\/scripts\/rvm\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Install the latest version of Ruby using RVM:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>rvm install ruby\n<\/code><\/pre>\n\n\n\n<p><strong>Verify Installation<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Check the installed Ruby version:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>ruby -v\n<\/code><\/pre>\n\n\n\n<p><strong>Update RubyGems<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>As with the other methods, update RubyGems if necessary:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>gem update --system\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"5-best-gems-in-web-scraping-using-ruby-\"><strong>Best Gems in Web Scraping Using Ruby<\/strong><\/h2>\n\n\n\n<p>Ruby&#8217;s ecosystem includes a variety of gems that make web scraping easier and more efficient. These gems handle HTTP requests, HTML parsing, cookie management and more. Below are some recommended Ruby gems for web scraping:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"6-nokogiri-\"><strong>Nokogiri<\/strong><\/h3>\n\n\n\n<p><a href=\"https:\/\/nokogiri.org\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">Nokogiri<\/a> is the go-to gem in the Ruby community for parsing HTML and XML. 
Because it can navigate documents using CSS selectors and XPath, it is a powerful tool for extracting information from web pages.<\/p>\n\n\n\n<p><strong>Installation<\/strong>:<\/p>\n\n\n\n<p>Install Nokogiri by running:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gem install nokogiri\n<\/code><\/pre>\n\n\n\n<p><strong>Usage<\/strong>:<\/p>\n\n\n\n<p>Here\u2019s a basic example of how to use Nokogiri:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'nokogiri'\nrequire 'httparty'\n\nurl = 'https:\/\/example.com'\n# Fetch the page and parse the response body into a Nokogiri document\nresponse = HTTParty.get(url)\nparsed_page = Nokogiri::HTML(response.body)\n\n# Collect the text of every h1 element\ntitles = parsed_page.css('h1').map(&amp;:text)\nputs titles\n<\/code><\/pre>\n\n\n\n<p>This script retrieves the HTML content from the given URL, parses it and prints the text contained within all &lt;h1&gt; tags.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"7-httparty-\"><strong>HTTParty<\/strong><\/h3>\n\n\n\n<p><a href=\"https:\/\/rubygems.org\/gems\/httparty\/versions\/0.13.7?locale=en\" target=\"_blank\" rel=\"noreferrer noopener\">HTTParty<\/a> is a gem that streamlines the process of sending HTTP requests. It&#8217;s user-friendly. 
It works seamlessly with other popular Ruby gems, such as Nokogiri.<\/p>\n\n\n\n<p><strong>Installation<\/strong>:<\/p>\n\n\n\n<p>Install HTTParty by running:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gem install httparty\n<\/code><\/pre>\n\n\n\n<p><strong>Usage<\/strong>:<\/p>\n\n\n\n<p>Here is a simple illustration of how to send an HTTP GET request:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'httparty'\n\nurl = 'https:\/\/example.com'\n# Send a GET request to the target URL\nresponse = HTTParty.get(url)\n\n# success? is true for 2xx status codes\nif response.success?\n  puts response.body\nelse\n  puts \"Failed to retrieve the webpage\"\nend\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"8-mechanize-\"><strong>Mechanize<\/strong><\/h3>\n\n\n\n<p><a href=\"https:\/\/rubygems.org\/gems\/mechanize\" target=\"_blank\" rel=\"noreferrer noopener\">Mechanize<\/a> is a gem designed to automate website interactions, managing cookies, sessions and form submissions. It&#8217;s handy for extracting data from pages that require a login or other kinds of interaction.<\/p>\n\n\n\n<p><strong>Installation<\/strong>:<\/p>\n\n\n\n<p>Install Mechanize by running:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gem install mechanize\n<\/code><\/pre>\n\n\n\n<p><strong>Usage<\/strong>:<\/p>\n\n\n\n<p>Here is a simple illustration of how to use Mechanize to extract data from a website:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'mechanize'\n\n# The agent keeps cookies and session state between requests\nagent = Mechanize.new\npage = agent.get('https:\/\/example.com')\nputs page.title\n\n# Print the text of every link on the page\npage.links.each do |link|\n  puts link.text\nend\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"9-watir-\"><strong>Watir<\/strong><\/h3>\n\n\n\n<p>Watir, short for Web Application Testing in Ruby, is a tool for automating web browsers. 
It comes in handy when extracting content that relies on the execution of JavaScript.<\/p>\n\n\n\n<p><strong>Installation<\/strong>:<\/p>\n\n\n\n<p>Install Watir and the Selenium WebDriver gem by running:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gem install watir\ngem install selenium-webdriver\n<\/code><\/pre>\n\n\n\n<p><strong>Usage<\/strong>:<\/p>\n\n\n\n<p>Here\u2019s a basic example of using Watir to scrape a dynamic website:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'watir'\n\n# Launch a Chrome session (Chrome must be installed)\nbrowser = Watir::Browser.new :chrome\nbrowser.goto 'https:\/\/example.com'\n\nputs browser.title\nbrowser.close\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10-kimurai-\"><strong>Kimurai<\/strong><\/h3>\n\n\n\n<p>Kimurai is a web scraping framework for Ruby built on Nokogiri and Capybara. With a simple interface for managing multiple spiders, it is a robust solution for complex scraping tasks.<\/p>\n\n\n\n<p><strong>Installation<\/strong>:<\/p>\n\n\n\n<p>To set up Kimurai, add it to your Gemfile and then run bundle install.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gem 'kimurai'\n<\/code><\/pre>\n\n\n\n<p><strong>Usage<\/strong>:<\/p>\n\n\n\n<p>Here\u2019s a basic example of a Kimurai spider:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'kimurai'\n\nclass ExampleSpider &lt; Kimurai::Base\n  @name = 'example_spider'\n  @start_urls = &#91;'https:\/\/example.com']\n  # Mechanize engine: plain HTTP requests, no JavaScript execution\n  @engine = :mechanize\n\n  # Called with the parsed response for each start URL\n  def parse(response, url:, data: {})\n    response.css('h1').each do |heading|\n      puts heading.text\n    end\n  end\nend\n\nExampleSpider.crawl!<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"11-scraping-static-pages-\"><strong>Scraping Static Pages<\/strong><\/h2>\n\n\n\n<p>Static web pages contain content that is written directly into the HTML source, making them simpler to scrape than dynamic pages. 
Now let&#8217;s dive into the process of scraping static web pages with Ruby, which includes sending HTTP requests, parsing HTML and handling the extracted data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"12-making-an-http-request-\"><strong>Making an HTTP Request<\/strong><\/h2>\n\n\n\n<p>When you want to scrape information from a webpage, the first step is to send an HTTP request to its URL. To do this in Ruby, we rely on the HTTParty gem, which is a handy tool for handling HTTP requests.<\/p>\n\n\n\n<p><strong>Install HTTParty<\/strong>: To set up HTTParty, open your terminal and run the command below.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gem install httparty\n<\/code><\/pre>\n\n\n\n<p><strong>Make a Request<\/strong>: Create a Ruby script file, such as &#8220;scrape_static.rb&#8221;, and add this code to send an HTTP GET request:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'httparty'\n\nurl = 'https:\/\/example.com'\nresponse = HTTParty.get(url)\n\nif response.success?\n  puts response.body\nelse\n  puts \"Failed to retrieve the webpage\"\nend\n<\/code><\/pre>\n\n\n\n<p>This program retrieves the HTML content from the given URL and prints it to the screen.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"13-parsing-html-with-nokogiri-\"><strong>Parsing HTML with Nokogiri<\/strong><\/h2>\n\n\n\n<p>After obtaining the HTML content, the next step is to parse it to retrieve the information you&#8217;re looking for. 
Nokogiri is a powerful tool for parsing HTML and XML documents in Ruby.<\/p>\n\n\n\n<p><strong>Install Nokogiri<\/strong>:<\/p>\n\n\n\n<p>Install Nokogiri by running:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gem install nokogiri\n<\/code><\/pre>\n\n\n\n<p><strong>Parse HTML<\/strong>:<\/p>\n\n\n\n<p>Add the following code to your script to parse the HTML using Nokogiri:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'nokogiri'\nrequire 'httparty'\n\nurl = 'https:\/\/example.com'\nresponse = HTTParty.get(url)\n\nif response.success?\n  # Parse the raw HTML into a searchable document\n  parsed_page = Nokogiri::HTML(response.body)\n  puts parsed_page.title\nelse\n  puts \"Failed to retrieve the webpage\"\nend\n<\/code><\/pre>\n\n\n\n<p>This code retrieves the HTML content from the webpage, uses Nokogiri to parse it and then prints the title of the page.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"14-extracting-data-\"><strong>Extracting Data<\/strong><\/h3>\n\n\n\n<p>You can use Nokogiri&#8217;s CSS selectors to find and extract elements from the HTML document.<\/p>\n\n\n\n<p><strong>Locate Elements<\/strong>:<\/p>\n\n\n\n<p>For instance, to gather all the top-level headings (&lt;h1&gt; elements) from the webpage:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'nokogiri'\nrequire 'httparty'\n\nurl = 'https:\/\/example.com'\nresponse = HTTParty.get(url)\n\nif response.success?\n  parsed_page = Nokogiri::HTML(response.body)\n  # Select every h1 element and print its text\n  headings = parsed_page.css('h1')\n  headings.each do |heading|\n    puts heading.text\n  end\nelse\n  puts \"Failed to retrieve the webpage\"\nend\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"15-writing-scraped-data-to-a-csv-file-\"><strong>Writing Scraped Data to a CSV File<\/strong><\/h2>\n\n\n\n<p>Once you&#8217;ve gathered the information, you may want to save it in a CSV file. 
Ruby\u2019s built-in CSV library makes this easy:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'csv'\n\n# Open data.csv for writing; the first row is the header\nCSV.open(\"data.csv\", \"w\") do |csv|\n  csv &lt;&lt; &#91;\"Title\", \"URL\"]\n  csv &lt;&lt; &#91;\"Example Title\", \"https:\/\/example.com\"]\nend\n<\/code><\/pre>\n\n\n\n<p>This program creates a CSV file and writes a header row and one data row into it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"16-scraping-dynamic-pages-\"><strong>Scraping Dynamic Pages<\/strong><\/h2>\n\n\n\n<p>Dynamic pages that render their content with JavaScript require more complex methods for data extraction. Here&#8217;s a guide to handling them with Watir.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"17-required-installation-\"><strong>Required Installation<\/strong><\/h2>\n\n\n\n<p>First, you need to install Watir and the Selenium WebDriver gem:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gem install watir\ngem install selenium-webdriver\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"18-loading-a-dynamic-website-\"><strong>Loading a Dynamic Website<\/strong><\/h2>\n\n\n\n<p>Here\u2019s how to load a dynamic website using Watir:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'watir'\n\nbrowser = Watir::Browser.new :chrome\nbrowser.goto 'https:\/\/example.com'\n\nputs browser.title\nbrowser.close\n<\/code><\/pre>\n\n\n\n<p>This code opens the website in the Chrome browser and prints the page title.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"19-locating-html-elements-via-css-selectors-\"><strong>Locating HTML Elements via CSS Selectors<\/strong><\/h2>\n\n\n\n<p>You can locate and interact with HTML elements using CSS selectors:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'watir'\n\nbrowser = Watir::Browser.new :chrome\nbrowser.goto 'https:\/\/example.com'\n\n# Locate the first element matching the CSS selector\nelement = browser.element(css: 'h1')\nputs element.text\n\nbrowser.close\n<\/code><\/pre>\n\n\n\n<p>This code snippet retrieves the text of the first &lt;h1&gt; tag on the webpage.<\/p>\n\n\n\n<h2 
class=\"wp-block-heading\" id=\"20-handling-pagination-\"><strong>Handling Pagination<\/strong><\/h2>\n\n\n\n<p>Many websites use pagination to present large sets of data. Here is one way to handle it:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'watir'\n\nbrowser = Watir::Browser.new :chrome\nbrowser.goto 'https:\/\/example.com'\n\nloop do\n  puts browser.text\n  next_button = browser.button(text: 'Next')\n  break unless next_button.exists?\n\n  next_button.click\n  sleep 2 # wait for the page to load\nend\n\nbrowser.close\n<\/code><\/pre>\n\n\n\n<p>The program goes through the pages by clicking the &#8220;Next&#8221; button until it&#8217;s no longer available.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"21-creating-a-csv-file-\"><strong>Creating a CSV File<\/strong><\/h2>\n\n\n\n<p>Finally, save the scraped data to a CSV file:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>require 'csv'\nrequire 'watir'\n\nbrowser = Watir::Browser.new :chrome\nbrowser.goto 'https:\/\/example.com'\n\nCSV.open(\"dynamic_data.csv\", \"w\") do |csv|\n  csv &lt;&lt; &#91;\"Content\"]\n\n  loop do\n    content = browser.text\n    csv &lt;&lt; &#91;content]\n\n    next_button = browser.button(text: 'Next')\n    break unless next_button.exists?\n\n    next_button.click\n    sleep 2 # wait for the page to load\n  end\nend\n\nbrowser.close\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"22-conclusion-\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>Using Ruby for web scraping is an effective way to collect information from websites. The simplicity of Ruby and the availability of gems such as Nokogiri, HTTParty, Mechanize and Watir make it a fantastic option for newcomers and seasoned developers alike. Whether you&#8217;re extracting data from static or dynamic pages, Ruby equips you with the necessary tools to complete the task effectively. 
Dive into the realm of web scraping using Ruby and harness the capabilities of automated data retrieval.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.ipway.com\/\">Take your data scraping to the next level with IPWAY\u2019s datacenter proxies<\/a>!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Using web scraping has become crucial for companies and developers to collect data from the internet effectively. Ruby, recognized for its simplicity and readability is widely used for web scraping projects. This detailed guide will explore the effective methods and strategies for web scraping using Ruby encompassing everything from setup to extracting information, from dynamic&hellip; <a class=\"more-link\" href=\"https:\/\/www.ipway.com\/blog\/web-scraping-using-ruby\/\">Continue reading <span class=\"screen-reader-text\">Web Scraping Using Ruby<\/span><\/a><\/p>\n","protected":false},"author":6,"featured_media":1284,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ub_ctt_via":"","footnotes":""},"categories":[25],"tags":[],"class_list":["post-1275","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-what-is","entry"],"featured_image_src":"https:\/\/www.ipway.com\/blog\/wp-content\/uploads\/2024\/07\/Coperta-Articol-Scraping-with-Ruby.jpg","author_info":{"display_name":"Roxana 
Anghel","author_link":"https:\/\/www.ipway.com\/blog\/author\/roxana-editor\/"},"_links":{"self":[{"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/posts\/1275","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/comments?post=1275"}],"version-history":[{"count":11,"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/posts\/1275\/revisions"}],"predecessor-version":[{"id":1288,"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/posts\/1275\/revisions\/1288"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/media\/1284"}],"wp:attachment":[{"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/media?parent=1275"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/categories?post=1275"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ipway.com\/blog\/wp-json\/wp\/v2\/tags?post=1275"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}