
In this case, we just input the ticker symbol, NFLX, and the associated expiration date into either get_calls or get_puts to obtain the calls and puts data, respectively. Note: here we don’t need to convert each date to a Unix timestamp, as these functions will figure that out automatically from the input dates.

from yahoo_fin.options import get_calls, get_puts
calls_data = …

That’s it for this post! To learn more about requests-html, check out my web scraping course on Udemy, or see the official documentation for requests_html.

From here, we can parse out the expiration dates from these tags using the find method.

dates = …

Similarly, if we wanted to search for other HTML tags, we could just input whatever those are into the find method, e.g. anchor (a), paragraph (p), header tags (h1, h2, h3, etc.) and so on.

Alternatively, we could also use BeautifulSoup on the rendered HTML. However, the awesome point here is that we can create the connection to this webpage, render its JavaScript, and parse out the resultant HTML all in one package!

Lastly, we could scrape this particular webpage directly with yahoo_fin, which provides functions that wrap around requests_html specifically for Yahoo Finance’s website.

from yahoo_fin.options import get_expiration_dates

Scraping options data for each expiration date

Once we have the expiration dates, we could proceed with scraping the data associated with each date. In this particular case, the pattern of the URL for each expiration date’s data requires the date be converted to Unix timestamp format. This can be done using the pandas package.

info = …
calls_data = dict(zip(dates, [… for df in info]))
puts_data = dict(zip(dates, [… for df in info]))

Similarly, we could scrape this data using yahoo_fin.
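As a rough illustration of the timestamp conversion and the date-keyed dictionaries described in this section, here is a minimal sketch. It uses only the standard library rather than pandas, so that it runs without extra packages. The date format ("January 1, 2020") and the fetch_chain callable are assumptions made for the example, not something taken from the original code.

```python
from datetime import datetime, timezone


def to_unix_timestamp(date_text, fmt="%B %d, %Y"):
    """Convert a date string to a Unix timestamp at midnight UTC.

    The default format (e.g. "January 1, 2020") is an assumption about how
    the expiration dates are written; pass `fmt` to match the real text.
    """
    parsed = datetime.strptime(date_text, fmt)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def collect_by_date(dates, fetch_chain):
    """Build calls and puts dictionaries keyed by expiration date.

    `fetch_chain` is a hypothetical callable: it takes a Unix timestamp and
    returns a (calls, puts) pair for that expiration date.
    """
    info = [fetch_chain(to_unix_timestamp(date)) for date in dates]
    calls_data = dict(zip(dates, [pair[0] for pair in info]))
    puts_data = dict(zip(dates, [pair[1] for pair in info]))
    return calls_data, puts_data


# Stand-in fetcher so the sketch runs without network access.
def fake_fetch(timestamp):
    return (f"calls@{timestamp}", f"puts@{timestamp}")


calls, puts = collect_by_date(["January 1, 2020"], fake_fetch)
print(calls)
print(puts)
```

In real use, fake_fetch would be replaced by whatever function downloads and parses the page for one expiration date. With yahoo_fin, the conversion step should not be needed, since get_calls and get_puts take the ticker and the date directly.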

Similar to the requests package, we can use a session object to get the webpage we need. This gets stored in a response variable, resp. If you print out resp, you should see the message <Response [200]>, which means the connection to the webpage was successful (otherwise you’ll get a different message). Running resp.html will give us an object that allows us to print out, search through, and perform several functions on the webpage’s HTML.

To simulate running the JavaScript code, we use the render method on the resp.html object. Note how we don’t need to set a variable equal to this rendered result, i.e. calling resp.html.render() stores the updated HTML as an attribute in resp.html. Specifically, we can access the rendered HTML through resp.html.html. So now resp.html.html contains the HTML we need containing the option tags.
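The session, render, and find steps can be sketched together roughly as follows. This is a sketch under stated assumptions rather than the post's original code: the function name fetch_option_texts and its url and timeout parameters are made up for illustration, and the page address is left for the caller to supply. Running it requires the requests_html package and network access, and the first call to render downloads Chromium.

```python
def fetch_option_texts(url, timeout=20):
    """Fetch `url`, run its JavaScript, and return the text of each option tag."""
    # Imported lazily so the function can be defined without the package installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        resp = session.get(url)
        resp.raise_for_status()
        # render() updates resp.html in place, so nothing is assigned here.
        resp.html.render(timeout=timeout)
        # find() takes a CSS selector; "option" matches every option tag.
        return [tag.text for tag in resp.html.find("option")]
    finally:
        session.close()


# Example call (hypothetical; supply the options page address yourself):
#   dates = fetch_option_texts(options_page_url)
```

Returning plain strings rather than the element objects keeps the result easy to pass on to later steps, such as date conversion.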
However, if we look at the source via a web browser, we can see that there are, indeed, option tags. Why the disconnect? The reason why we see option tags when looking at the source code in a browser is that the browser is executing JavaScript code that renders that HTML, i.e. it modifies the HTML of the page dynamically to allow a user to select one of the possible expiration dates. This means if we try just scraping the HTML, the JavaScript won’t be executed, and thus, we won’t see the tags containing the expiration dates.

Now, let’s use requests_html to run the JavaScript code in order to render the HTML we’re looking for.

We can try using requests with BeautifulSoup, but that won’t work quite the way we want. To demonstrate, let’s try doing that to see what happens. Fetching the page with requests and searching the parsed HTML for option tags shows us that option_tags is an empty list. This is because there are no option tags found in the HTML we scraped from the webpage.
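To make the gap between raw and rendered HTML concrete, here is a small offline sketch. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and both HTML snippets are hypothetical stand-ins rather than markup taken from the real page.

```python
from html.parser import HTMLParser


class OptionTagCollector(HTMLParser):
    """Collects the text of every <option> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self.options.append("")

    def handle_data(self, data):
        if self._in_option:
            self.options[-1] += data

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False


def find_option_texts(html):
    """Return the stripped text of each option tag in `html`."""
    parser = OptionTagCollector()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.options]


# Hypothetical markup: what the server sends before any JavaScript runs.
STATIC_HTML = """
<html><body>
  <select id="expirations"></select>
  <script>/* browser-side code would add the choices here */</script>
</body></html>
"""

# Hypothetical markup: the same page after a browser has run the script.
RENDERED_HTML = """
<html><body>
  <select id="expirations">
    <option value="1">First date</option>
    <option value="2">Second date</option>
  </select>
</body></html>
"""

print(find_option_texts(STATIC_HTML))    # raw HTML: no option tags
print(find_option_texts(RENDERED_HTML))  # rendered HTML: option tags present
```

The parser only ever sees the text it is given, so tags that a script would add later are simply absent from the raw response.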

Let’s say we want to scrape options data for a particular stock. As an example, let’s look at Netflix (since it’s well known). On Yahoo Finance’s options page for Netflix, we can see the option chain information for the earliest upcoming options expiration date. On this webpage there’s a drop-down box allowing us to view data by other expiration dates. What if we want to get all the possible choices, i.e. all the expiration dates?
