Choose the right Python library
When working with BigQuery from Python, you can choose among the following three libraries, depending on your needs:
- BigQuery DataFrames: provides pandas- and scikit-learn-compatible APIs with server-side processing, well suited to data processing and machine learning tasks (see the short sketch after this list).
- pandas-gbq: a thin client library for reading and writing BigQuery data from pandas, suited to simple data processing and analysis.
- google-cloud-bigquery: the Google-maintained client library that exposes the full BigQuery API, suited to complex data management and analysis.
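BigQuery DataFrames is not demonstrated in the examples below, so here is a minimal sketch of what it looks like. It assumes the separate bigframes package is installed (pip install bigframes) and that a default Google Cloud project and location are already configured; the table is the same public table used later in this article.

import bigframes.pandas as bpd

# Read a public table into a BigQuery DataFrame; the data stays in BigQuery
# and pandas-style operations are pushed down to the server.
df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_current")

# Familiar pandas-style filtering; execution happens server-side.
print(df[df["state"] == "TX"].head(10))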
Install the library
To use these libraries, you need to install the following packages:
pip install --upgrade pandas-gbq 'google-cloud-bigquery[bqstorage,pandas]'
Run the query
Using GoogleSQL syntax
The following examples show how to run a GoogleSQL query with pandas-gbq and with google-cloud-bigquery:
pandas-gbq
import pandas sql = """ SELECT name FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE state = 'TX' LIMIT 100 """ # Use standard SQL querydf = pandas.read_gbq(sql, dialect="standard") # Specify the project IDproject_id = "your-project-id" df = pandas.read_gbq(sql, project_id=project_id, dialect="standard")
google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT name
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE state = 'TX'
LIMIT 100
"""

# GoogleSQL (standard SQL) is the default dialect
df = client.query(sql).to_dataframe()

# Optionally specify the project ID
project_id = "your-project-id"
df = client.query(sql, project=project_id).to_dataframe()
Using legacy SQL syntax
If you need to use legacy SQL syntax, you can enable it as follows:
pandas-gbq
import pandas sql = """ SELECT name FROM [bigquery-public-data:usa_names.usa_1910_current] WHERE state = 'TX' LIMIT 100 """ df = pandas.read_gbq(sql, dialect="legacy")
google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT name
FROM [bigquery-public-data:usa_names.usa_1910_current]
WHERE state = 'TX'
LIMIT 100
"""

# Enable legacy SQL via the job configuration
query_config = bigquery.QueryJobConfig(use_legacy_sql=True)
df = client.query(sql, job_config=query_config).to_dataframe()
Accelerate data downloads using the BigQuery Storage API
The BigQuery Storage API can significantly speed up downloads of large result sets. The following examples show how to use it:
pandas-gbq
import pandas

sql = "SELECT * FROM `bigquery-public-data.irs_990.irs_990_2012`"

# Use the BigQuery Storage API to speed up downloads
df = pandas.read_gbq(sql, dialect="standard", use_bqstorage_api=True)
google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()

sql = "SELECT * FROM `bigquery-public-data.irs_990.irs_990_2012`"

# If the BigQuery Storage API is available, it is used automatically
df = client.query(sql).to_dataframe()
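Recent versions of google-cloud-bigquery use the BigQuery Storage API automatically when the google-cloud-bigquery-storage package is installed (which the bqstorage extra above provides). If you prefer to be explicit, a sketch like the following also works; creating and reusing a single BigQueryReadClient is my own suggestion, not something the library requires:

from google.cloud import bigquery, bigquery_storage

client = bigquery.Client()

# Create the Storage API client once and reuse it across downloads.
bqstorage_client = bigquery_storage.BigQueryReadClient()

sql = "SELECT * FROM `bigquery-public-data.irs_990.irs_990_2012`"
df = client.query(sql).to_dataframe(bqstorage_client=bqstorage_client)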
Configure queries
Parameterized queries
The following examples show how to run parameterized queries:
pandas-gbq
import pandas sql = """ SELECT name FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE state = @state LIMIT @limit """ query_config = { "query": { "parameterMode": "NAMED", "queryParameters": [ { "name": "state", "parameterType": {"type": "STRING"}, "parameterValue": {"value": "TX"}, }, { "name": "limit", "parameterType": {"type": "INTEGER"}, "parameterValue": {"value": 100}, }, ], } } df = pandas.read_gbq(sql, configuration=query_config)
google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT name
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE state = @state
LIMIT @limit
"""

# Pass named query parameters via the job configuration
query_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("state", "STRING", "TX"),
        bigquery.ScalarQueryParameter("limit", "INTEGER", 100),
    ]
)

df = client.query(sql, job_config=query_config).to_dataframe()
Load a pandas DataFrame into a BigQuery table
The following examples show how to load a pandas DataFrame into a BigQuery table:
pandas-gbq
import pandas

df = pandas.DataFrame(
    {
        "my_string": ["a", "b", "c"],
        "my_int64": [1, 2, 3],
        "my_float64": [4.0, 5.0, 6.0],
        "my_timestamp": [
            pandas.Timestamp("1998-09-04T16:03:14"),
            pandas.Timestamp("2010-09-13T12:03:45"),
            pandas.Timestamp("2015-10-02T16:00:00"),
        ],
    }
)

table_id = "my_dataset.new_table"
df.to_gbq(table_id)
google-cloud-bigquery
from google.cloud import bigquery
import pandas

df = pandas.DataFrame(
    {
        "my_string": ["a", "b", "c"],
        "my_int64": [1, 2, 3],
        "my_float64": [4.0, 5.0, 6.0],
        "my_timestamp": [
            pandas.Timestamp("1998-09-04T16:03:14"),
            pandas.Timestamp("2010-09-13T12:03:45"),
            pandas.Timestamp("2015-10-02T16:00:00"),
        ],
    }
)

client = bigquery.Client()
table_id = "my_dataset.new_table"

# Provide a partial schema to ensure the correct BigQuery data types
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("my_string", "STRING"),
    ]
)
job = client.load_table_from_dataframe(df, table_id, job_config=job_config)

# Wait for the load job to complete
job.result()
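As a quick sanity check after the load finishes, you can fetch the table metadata and confirm the row count. This verification step is my own addition, not part of the original sample:

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_dataset.new_table"

# Fetch the table metadata and print the number of loaded rows.
table = client.get_table(table_id)
print(f"Loaded {table.num_rows} rows into {table_id}")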
Limitations of pandas-gbq
- Dataset management: creating, updating, or deleting datasets is not supported.
- Data format support: only the CSV format is supported when writing data, so nested or array values are not supported.
- Table management: listing, copying, or deleting tables is not supported.
- Data export: exporting data directly to Cloud Storage is not supported. (For these tasks, fall back to google-cloud-bigquery, as sketched below.)
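Here is a minimal sketch of how google-cloud-bigquery covers those gaps. The dataset, table, and Cloud Storage bucket names below are placeholders of my own, not values used elsewhere in this article:

from google.cloud import bigquery

client = bigquery.Client()

# Dataset management: create a new dataset.
dataset = client.create_dataset("my_new_dataset", exists_ok=True)

# Table management: list and delete tables in a dataset.
for table in client.list_tables("my_dataset"):
    print(table.table_id)
client.delete_table("my_dataset.old_table", not_found_ok=True)

# Data export: extract a table to Cloud Storage.
extract_job = client.extract_table(
    "my_dataset.new_table",
    "gs://my-bucket/exports/new_table-*.csv",
)
extract_job.result()  # wait for the export job to finish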
Resolve connection pooling errors
If you encounter connection pool errors, you can increase the connection pool size as follows:
import requests
from google.cloud import bigquery

client = bigquery.Client()

# Enlarge the underlying HTTP connection pool used by the client
adapter = requests.adapters.HTTPAdapter(
    pool_connections=128, pool_maxsize=128, max_retries=3
)
client._http.mount("https://", adapter)
client._http._auth_request.session.mount("https://", adapter)
That concludes this walkthrough of using Python to interact with BigQuery. For more on working with BigQuery from Python, please check out my other related articles!