Iterating over a dictionary in Python and stripping whitespace

I am working with the web scraping framework Scrapy, and I am wondering how to iterate over all of the scraped items, which seem to be in a dictionary, and strip the whitespace from each one. Here...
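
A minimal sketch, assuming each scraped item behaves like a dict whose values are strings or lists of strings (Scrapy's `Item` class exposes the same `.items()` interface). The helper name `strip_item` is hypothetical:

```python
def strip_item(item):
    """Return a new dict with surrounding whitespace removed from string values.

    Lists of strings (as produced by selector .getall() calls) are stripped
    element by element; any other value is passed through unchanged.
    """
    cleaned = {}
    for key, value in item.items():
        if isinstance(value, str):
            cleaned[key] = value.strip()
        elif isinstance(value, list):
            cleaned[key] = [v.strip() if isinstance(v, str) else v for v in value]
        else:
            cleaned[key] = value
    return cleaned


item = {"title": "  Example title \n", "tags": [" a", "b "], "price": 10}
print(strip_item(item))  # {'title': 'Example title', 'tags': ['a', 'b'], 'price': 10}
```

In a Scrapy project this could run inside an item pipeline's `process_item` method; Scrapy's item loaders also accept `MapCompose(str.strip)` as an input processor for a similar effect.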

Excel VBA: Wait for JavaScript execution in Internet Explorer

I am trying to do some web scraping in Excel VBA. Here is the part of the code that I am having trouble with:

    IE.Navigate URL
    Do
        DoEvents
    Loop While IE.ReadyState <> 4 Or IE.Busy = True
    Set doc...

Converting HTML to text with Python

I am trying to convert an HTML block to text using Python. Input: <div class="body"><p><strong></strong></p> <p><strong></strong>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean...
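
One standard-library approach, sketched here under the assumption that line breaks should follow block-level tags, is to subclass `html.parser.HTMLParser`, keep the text nodes, and skip `script`/`style` contents. `TextExtractor` and `html_to_text` are hypothetical names, and the set of block tags is a choice, not a rule:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, inserting line breaks around block-level tags."""

    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Ignore text that sits inside <script> or <style>.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


print(html_to_text('<div class="body"><p><strong></strong></p> <p><strong></strong>Lorem ipsum dolor sit amet.</p></div>'))
```

BeautifulSoup's `get_text()` covers the same job; the standard-library version simply avoids the extra dependency.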

Web scraping - how to access content rendered in JavaScript via Angular.js?

I'm trying to scrape data from the public site asx.com.au. The page http://www.asx.com.au/asx/research/company.do#%21/ACB/details contains a div with class 'view-content', which has the information...

Can a Telegram bot read the messages of a channel?

Can a Telegram bot read or access a Telegram channel that neither I nor the bot is an administrator of? I know that until last November it was not possible, but I have heard that some people have done this,...

Web scraping of mobile apps?

Is there any program or library that can scrape the contents of a mobile app's screen? The goal is to get a clean data structure for the Instagram "Following" feed.

Getting "Too many requests" error when scraping a particular website using Scrapy

I have written a spider to fetch details from http://allevents.in. Every time I try to scrape, I get the response body: Too many requests, please try after some time or report this problem at...
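
A "Too many requests" body is typically the server rate limiting the crawler. A settings sketch that slows the spider down, assuming the site throttles by request rate; every number here is a guess to tune, not a value known to work for allevents.in:

```python
# settings.py (sketch): slow the crawl down so the server stops rate limiting.
# All numeric values below are assumptions to tune for the target site.

# Send one request at a time to the same domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Base delay in seconds between requests to the same site; Scrapy randomizes it
# (0.5x to 1.5x) while RANDOMIZE_DOWNLOAD_DELAY is True, which is the default.
DOWNLOAD_DELAY = 3

# Let AutoThrottle adapt the delay to the observed server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5
AUTOTHROTTLE_MAX_DELAY = 60
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Retry responses that signal throttling instead of parsing them as pages.
RETRY_ENABLED = True
RETRY_TIMES = 5
RETRY_HTTP_CODES = [429, 500, 502, 503, 504, 522, 524, 408]
```

If the server returns that message with a 200 status rather than 429, the retry setting will not catch it, and the spider would have to detect the text in the response body itself.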

Provided og:image url encountered an unknown error

Yesterday I attempted to share a web page on Facebook, specifically https://share.novamanus.com/ad/659; however, it did not preview correctly. When I attempted to scrape it with the...

NOAA API with Python: I downloaded the datasets; how do I open them?

I am trying to access datasets from NOAA for a project. I have been able to download the JSON file, but I do not know how to open the desired file I have printed out. url =...
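
Once the response is saved as a `.json` file, the standard `json` module turns it back into Python dicts and lists. A minimal sketch; the helper names are hypothetical, and the `results` key is an assumption (it matches how NOAA's Climate Data Online v2 API wraps records, but check the actual file):

```python
import json


def parse_noaa_records(text):
    """Parse a JSON response body and return its list of records.

    Assumes the top-level object holds the records under a "results" key;
    returns the parsed value itself when it is already a list.
    """
    data = json.loads(text)
    if isinstance(data, list):
        return data
    return data.get("results", [])


def load_noaa_records(path):
    """Read a saved JSON file from disk and return its records."""
    with open(path, encoding="utf-8") as f:
        return parse_noaa_records(f.read())
```

For example, `load_noaa_records("noaa_data.json")[0]` (a hypothetical file name) would give the first record as an ordinary dict.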

Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org

I'm practicing the code from 'Web Scraping with Python', and I keep running into this certificate problem:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re
    pages = set()
    def...
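
A common cause of this error is a Python installation that cannot find a CA certificate bundle (on macOS, the python.org installer ships an `Install Certificates.command` script for this). A sketch that keeps verification on while pointing `urllib` at an explicit bundle; it assumes the third-party `certifi` package is installed and falls back to the system store if it is not. `make_verified_context` and `fetch` are hypothetical helpers:

```python
import ssl
from urllib.request import urlopen

try:
    import certifi  # third-party CA bundle; optional

    CA_FILE = certifi.where()
except ImportError:
    CA_FILE = None  # fall back to the system's default certificate store


def make_verified_context():
    """Build an SSL context that still verifies certificates and hostnames."""
    return ssl.create_default_context(cafile=CA_FILE)


def fetch(url):
    # Passing an explicit context avoids relying on a Python install whose
    # default certificate store is missing or out of date.
    with urlopen(url, context=make_verified_context()) as response:
        return response.read()
```

Calling `fetch("https://en.wikipedia.org/")` would then return the raw page bytes for BeautifulSoup. Disabling verification would also silence the error, but it removes the protection HTTPS is meant to provide.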

Extract the number of results from a Google search

I am writing a web scraper to extract the number of results for a Google search, which appears at the top left of the search results page. I have written the code below, but I do not...
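
Once the results-count text has been pulled out of the page, a regular expression can turn it into an integer. A sketch, assuming the text has the English form "About 1,230,000 results" (the wording and markup vary by locale and change over time); `parse_result_count` is a hypothetical helper:

```python
import re

# Matches the number in text such as "About 1,230,000 results (0.45 seconds)".
RESULT_COUNT_RE = re.compile(r"([\d.,]+)\s+results?\b")


def parse_result_count(stats_text):
    """Return the result count as an int, or None if no count is found."""
    match = RESULT_COUNT_RE.search(stats_text)
    if match is None:
        return None
    # Drop thousands separators (commas or dots, depending on locale).
    digits = re.sub(r"\D", "", match.group(1))
    return int(digits) if digits else None


print(parse_result_count("About 1,230,000 results (0.45 seconds)"))  # 1230000
```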

How can I clear the Scrapy jobs list?

How can I clear the Scrapy jobs list? When I start any spider, I get a lot of jobs for that specific spider, and I would like to know how I can kill all of them. After reading the documentation, I wrote the following code, which I run...
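
If the jobs are running under Scrapyd (an assumption; the excerpt does not say), its HTTP API exposes `listjobs.json` and `cancel.json`, which can be combined to cancel every pending and running job for one spider. A standard-library sketch; the host address, project, and spider names are placeholders, and the helper names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SCRAPYD_URL = "http://localhost:6800"  # placeholder: default Scrapyd address


def job_ids_to_cancel(listjobs_payload, spider):
    """Collect ids of pending and running jobs for one spider.

    listjobs_payload is the decoded JSON from Scrapyd's listjobs.json,
    which groups jobs under "pending", "running" and "finished".
    """
    ids = []
    for state in ("pending", "running"):
        for job in listjobs_payload.get(state, []):
            if job.get("spider") == spider:
                ids.append(job["id"])
    return ids


def cancel_all(project, spider):
    query = urlencode({"project": project})
    with urlopen(f"{SCRAPYD_URL}/listjobs.json?{query}") as resp:
        payload = json.load(resp)
    for job_id in job_ids_to_cancel(payload, spider):
        # cancel.json expects a POST with the project name and job id.
        body = urlencode({"project": project, "job": job_id}).encode()
        with urlopen(f"{SCRAPYD_URL}/cancel.json", data=body) as resp:
            print(job_id, json.load(resp).get("status"))
```

Finished jobs are left alone because there is nothing left to cancel for them.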

Minimize the driver window with Selenium in Excel VBA

I have searched a lot for a way to minimize the driver window in Selenium for Excel VBA. I have found ways for Java and Python and tried to adapt them, but all my attempts failed. I just found a...

Scrape posts and comments from a public Facebook page

It looks like sometime last year, Facebook started severely restricting its Graph API. I've tried different ways to scrape a public Facebook page but always get the error: HTTP Error 400: Bad...

How to fix "mapping values are not allowed in this context" error in a YAML file?

I've browsed similar questions and believe I've applied everything I've been able to glean from the answers. I have a .yml file in which, as far as I can tell, each element is formatted identically. And yet...

How to proceed when redirected to a page after a successful sign-in with the POST method

I have signed in to a website using R 3.5.2, and this seems to have gone well with both rvest_0.3.4 and httr_1.4.0, but then I get stuck on a redirecting page which, in the browser (Chrome), is...

How to use this Datepicker with Puppeteer

I would like to crawl flight data from the following page: https://www.airprishtina.com/de/. I managed to select the airports, but this page has a Datepicker, and I don't understand how to use it...

Google Query Language: How to select the min value?

I am doing a web scraping task in which I get prices from a website. The issue is that I would like to get the minimum across all options. For example, it will look for one cellphone, which has 8GB...

pd.read_csv produces HTTPError: HTTP Error 403: Forbidden

When I look up my issue on Google or Stack Overflow, there seem to be half a dozen solved cases like this; however, I never really understand the solution. So I want to scrape a .csv from a...
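
One frequent cause of a 403 here is that the server rejects the default Python user agent. A standard-library sketch that sends a browser-like `User-Agent` header and parses the CSV; the header value and helper names are assumptions, and some servers block for other reasons (cookies, authentication, or IP-based limits):

```python
import csv
import io
from urllib.request import Request, urlopen

# Placeholder browser-like user agent; the exact string is an assumption.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url):
    """Attach a custom User-Agent header to the request."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_csv_rows(url):
    """Download a CSV file with a custom User-Agent and return its rows."""
    with urlopen(build_request(url)) as response:
        text = response.read().decode("utf-8")
    return list(csv.reader(io.StringIO(text)))
```

With pandas 1.2 or later, the same header can be passed directly as `pd.read_csv(url, storage_options={"User-Agent": USER_AGENT})`, since pandas forwards `storage_options` as request headers for HTTP(S) URLs.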

Shortcomings of Newspaper3k: How to Scrape ONLY Article HTML? Python

Hello, and thank you kindly for your help. I've been using Python and Newspaper3k to scrape websites, but I've noticed that some functions are... well... not functional. In particular, I've only...

Python Requests Library - Scraping separate JSON and HTML responses from POST request

I'm new to web scraping, programming, and Stack Overflow, so I'll try to phrase things as clearly as I can. I'm using the Python requests library to try to scrape some info from a local movie...

Need help scraping a web page

I started a mini project in which I want to retrieve the coin name, price, market cap, circulating supply, and volume for the first 100 coins on the first page. Until now (after asking several...

How to complete a GeeTest captcha when scraping with python-requests, when the request values are obtained by solving the captcha manually?

I'm trying to scrape a website that uses DataDome, and after some requests I have to complete a GeeTest (slider captcha puzzle). Here is a sample link to it: captcha link. I've decided not to use...

How to bypass human verification 'press and hold' using Selenium in Python?

I am trying to scrape some product reviews from this site using Selenium and Python, but it connects to another site and randomly shows a popup at any point, in which I need to press and hold the button...

Scraping only the portion that loads without scrolling

I have written a simple web scraping script using Selenium, but I want to scrape only the portion that is present 'before scroll'. Say, if it is this page I want to scrape -...

`cannot connect to chrome at 127.0.0.1:37541` when using undetected-chromedriver with Python

After using Selenium, I decided to try undetected-chromedriver, so I installed it using `pip install undetected-chromedriver`. However, running this simple script, `import undetected_chromedriver.v2`...

How to track WhatsApp online status of unsaved contacts programmatically

While searching, I found various solutions, such as: 1 - web scraping using Selenium, but that is a very inefficient way to track multiple contacts; 2 - using the Store object. That was one of the best...

How can I send dynamic website content to Scrapy, with the HTML content generated by the Selenium browser?

I am working on a stock-related project in which I have to scrape all the daily data for the last 5 years, i.e., from 2016 to date. I particularly thought of using Selenium because I...

How to query GraphQL by issuing POST requests with JSON parameters using Google Apps Script?

I've been trying to scrape the first column from the table on this webpage using Google Apps Script. When I observe the network activity in dev tools, I notice that I have to send POST HTTP...
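
GraphQL endpoints conventionally accept a POST whose JSON body carries a `query` string and an optional `variables` object. In Google Apps Script the same payload would go through `UrlFetchApp.fetch` with `method: 'post'`, `contentType: 'application/json'`, and a `JSON.stringify`-ed payload; the sketch below shows the request shape in Python. The endpoint, query, field names, and helper names are all hypothetical:

```python
import json
from urllib.request import Request, urlopen


def build_graphql_request(endpoint, query, variables=None):
    """Build a POST request carrying a GraphQL query as JSON."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(endpoint, query, variables=None):
    """Send the query and decode the JSON response."""
    with urlopen(build_graphql_request(endpoint, query, variables)) as response:
        return json.load(response)


# Hypothetical query; the real field names come from the requests seen in dev tools.
EXAMPLE_QUERY = """
query Table($first: Int) {
  rows(first: $first) { name }
}
"""
```

Copying the exact `query` and `variables` from the request captured in dev tools is usually the quickest way to get a working payload.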

Failed to log in to a website to scrape my profile name using Apps Script

I've been trying to log in to this website using my credentials in order to scrape my profile name using Google Apps Script. The status code is 200, and I can see that the script is able to get...