Different behaviour of GET #3608

holtgrewe · 2025-07-15T08:13:09Z

Dear all,

I'm observing odd behaviour from a (public) site when I access it with httpx.

I'm essentially emulating a browser accessing https://pubmed.ncbi.nlm.nih.gov/35642643/?format=pubmed.

curl works:

curl 'https://pubmed.ncbi.nlm.nih.gov/35642643/?format=pubmed'

As does requests:

import requests
result = requests.get('https://pubmed.ncbi.nlm.nih.gov/35642643/?format=pubmed')
print(f"Status: {result.status_code}")
print(result.text)

With httpx 0.28.1, I get a 403.

import httpx
with httpx.Client() as client:
    response = client.get('https://pubmed.ncbi.nlm.nih.gov/35642643/?format=pubmed')
    print(f"Status: {response.status_code}")
    print(response.text)

This yields

Status: 403
<!doctype html><meta charset="utf-8"><meta name=viewport content="width=device-width, initial-scale=1"><title>403</title>403 Forbidden

I have not been able to figure out the problem even after dumping raw headers etc. in httpx.
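For anyone wanting to repeat the header comparison, here is a minimal sketch of one way to do it. The helper names `header_diff` and `default_headers` are made up for this illustration. It relies on `httpx.Client.build_request` and `requests.Session.prepare_request`, which construct a request without sending it, so nothing touches the network.

```python
from typing import Dict, Mapping, Tuple


def header_diff(
    left: Mapping[str, str], right: Mapping[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return headers whose values differ, keyed by lower-cased name.

    A header missing on one side is reported as "<absent>" on that side.
    """
    a = {k.lower(): v for k, v in left.items()}
    b = {k.lower(): v for k, v in right.items()}
    return {
        name: (a.get(name, "<absent>"), b.get(name, "<absent>"))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


def default_headers(url: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build (but do not send) a GET request with each library."""
    # Imported lazily so header_diff itself has no third-party dependency.
    import httpx
    import requests

    with httpx.Client() as client:
        httpx_headers = dict(client.build_request("GET", url).headers)
    with requests.Session() as session:
        prepared = session.prepare_request(requests.Request("GET", url))
        requests_headers = dict(prepared.headers)
    return httpx_headers, requests_headers
```

Calling `header_diff(*default_headers(url))` lists the headers that differ between the two libraries. Expect some noise: requests only adds `Host` at send time, so it shows up as absent on that side. This compares HTTP headers only and cannot reveal differences below that layer, such as the TLS handshake.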

FWIW, the request also works with aiohttp.

I'd be happy about any input/help. I'm not affiliated with the pubmed/ncbi/nlm/nih site, I'm just trying to access it with httpx.

Answered by lovelydinosaur

lovelydinosaur (Maintainer) · 2025-07-15T09:46:44Z

Ah, that's a little bit frustrating. 😬

It's unclear why the server is differentiating httpx and returning a 403 response in this case. Either the server or the gateway has presumably been configured to disallow httpx clients here.

Perhaps they've been spammed in the past by crawlers using httpx, and have a block in place as a result? I did try setting the request headers, including User-Agent, though there's evidently some more complex client fingerprinting in place.

Incidentally, there might be some useful default behaviours that we could build into httpx to help ensure that it's generally used as a well-behaved client. E.g. respecting Retry-After for default rate limiting, perhaps?
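To make the Retry-After idea concrete, here is a rough caller-side sketch. It is not an httpx feature and not a proposed design; `retry_after_seconds` and `get_with_retry` are hypothetical names. The header may carry either a number of seconds or an HTTP-date, so both forms are handled. The choice to retry only on 429 and 503 is an assumption of this sketch.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    value: Optional[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Convert a Retry-After header value into a delay in seconds.

    Returns None when the header is missing or cannot be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def get_with_retry(client, url, max_attempts=3, default_delay=1.0):
    """GET `url`, sleeping as the server asks before retrying on 429/503.

    `client` is anything with a `.get(url)` method, e.g. an httpx.Client.
    """
    for attempt in range(1, max_attempts + 1):
        response = client.get(url)
        if response.status_code not in (429, 503) or attempt == max_attempts:
            return response
        delay = retry_after_seconds(response.headers.get("Retry-After"))
        time.sleep(default_delay if delay is None else delay)
```

Note that this would not help with the 403 in this thread, since a 403 is not a rate-limit response; it only illustrates the "well-behaved client" default being floated.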

holtgrewe (Author) · 2025-07-16T18:03:32Z

Here is the answer from the PubMed Helpdesk. It does not directly answer the question.

Thank you for writing to the help desk.

Please do not use the PubMed web interface for programmatic retrieval of citation data. Our E-utilities API is designed for this purpose. There is documentation available at:

Quick Start: https://www.ncbi.nlm.nih.gov/books/NBK25500/
E-utilities in Depth: https://www.ncbi.nlm.nih.gov/books/NBK25499/

For example, the URL below will retrieve XML for the record that you are trying to access from the web interface: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=35642643&retmode=xml&rettype=text

This is one simple example of an E-utilities URL. Please see the documentation to learn more.

Kind regards,

PubMed Team

National Center for Biotechnology Information

National Library of Medicine
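For reference, a small sketch of calling the E-utilities URL from the help desk reply with httpx. The function names `efetch_url` and `fetch_record` are invented for this example, and the query parameters are copied from that reply as-is. Whether this endpoint accepts httpx requests was not reported in this thread, so treat it as untested.

```python
from urllib.parse import urlencode

# Base endpoint taken from the help desk reply above.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid: str, retmode: str = "xml", rettype: str = "text") -> str:
    """Build the efetch URL for a single PubMed ID."""
    query = urlencode(
        {"db": "pubmed", "id": pmid, "retmode": retmode, "rettype": rettype}
    )
    return f"{EFETCH}?{query}"


def fetch_record(pmid: str) -> str:
    """Fetch one record; raises httpx.HTTPStatusError on a 4xx/5xx response."""
    import httpx  # imported lazily so efetch_url works without httpx installed

    response = httpx.get(efetch_url(pmid), timeout=30.0)
    response.raise_for_status()
    return response.text
```

`fetch_record("35642643")` would request the same record as the original PubMed page URL. See the E-utilities documentation linked above for usage guidelines before running this in a loop.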
