Supported Queries
Steven M. Mortimer
2020-07-11
Source:vignettes/supported-queries.Rmd
supported-queries.Rmd
Overview
The following vignette outlines the different types of queries that have been documented and tested. These are the “supported” query types that the {salesforcer} package currently handles. If you run into an issue, please submit the issue HERE in the GitHub repository so that we can fix or add your query type to this list. Thank you!
Note: Salesforce’s proprietary form of SQL called SOQL (Salesforce Object Query Language) is a powerful tool that allows you to return the fields of records in almost any object in Salesforce. This includes standard objects like Accounts, Contacts, and Tasks along with any custom objects and custom fields created in your Org. You are encouraged to use Bulk APIs when:
- You anticipate returning 10,000 records or more
- Your query does not involve a parent-to-child nested relationship query
- You would like to reduce the overall number of API calls to your Org
If you are not familiar with SOQL, then please consider reading the following resources:
-
Introduction
to SOQL and SOSL
-
SOQL
SELECT Syntax
-
Relationship
Queries
- Async SOQL
- Overview of queries in Bulk 1.0 API
- Overview of queries in Bulk 2.0 API
Authentication
First, load the {salesforcer} package and login. There are two ways to authenticate: 1) OAuth 2.0 (SSO) and 2) Basic Username-Password. It is recommended to use OAuth 2.0 so that passwords do not have to be embedded in scripts or environment variables. By default, OAuth 2.0 stores the user’s credentials in a locally cached file entitled “.httr-oauth-salesforcer” in the current working directory and will be refreshed automatically when the session expires.
library(dplyr, warn.conflicts = FALSE)
library(salesforcer)
sf_auth()
Default query behavior
The default API for the sf_query()
function is the REST
API because it is both fast and flexible. Every effort has been made so
that the format of the results from the REST and SOAP APIs is exactly
the same. The only difference will be speed. The REST API uses JSON,
which can generally be processed more quickly than XML used in the SOAP
API.
soql <- "SELECT Id,
FirstName,
LastName
FROM Contact
LIMIT 10"
queried_records <- sf_query(soql) # REST API is the default api_type
queried_records
#> # A tibble: 10 × 3
#> Id FirstName LastName
#> <chr> <chr> <chr>
#> 1 003Kg000002AaGiIAK Test Contact-Create-3
#> 2 003Kg000002AaGjIAK Test Contact-Create-4
#> 3 003Kg000002Ac5rIAC Test Contact-Create-1
#> 4 003Kg000002Ac5sIAC Test Contact-Create-2
#> 5 003Kg000002AaJ7IAK Test Contact-Create-1
#> # ℹ 5 more rows
queried_records <- sf_query(soql, api_type = "SOAP")
queried_records
#> # A tibble: 10 × 3
#> Id FirstName LastName
#> <chr> <chr> <chr>
#> 1 003Kg000002AaGiIAK Test Contact-Create-3
#> 2 003Kg000002AaGjIAK Test Contact-Create-4
#> 3 003Kg000002Ac5rIAC Test Contact-Create-1
#> 4 003Kg000002Ac5sIAC Test Contact-Create-2
#> 5 003Kg000002AaJ7IAK Test Contact-Create-1
#> # ℹ 5 more rows
REST vs. SOAP API query performance test
Below is a small example to roughly demonstrate the magnitude of the performance difference between the REST and SOAP APIs when querying 1,000 records.
Setup performance test
# create a new account
# (if replicating, you may or may not have an external id field in your Org)
prefix <- paste0("APerfTest-", as.integer(runif(1,1,99999)))
new_account <- sf_create(
tibble(
Name = "Test Account For Performance Test",
My_External_Id__c = prefix,
Description = paste0("This is a test account with 1,000 records for ",
"testing the performance differences between the ",
"SOAP and REST APIs.")
),
object_name = "Account"
)
# create and associate a thousand new contacts with that account
# (again, you may or may not have an external id field in your Org)
n <- 1000
prefix <- paste0("CPerfTest-", as.integer(runif(1,1,99999)), "-")
new_contacts <- tibble(FirstName = rep("Test", n),
LastName = paste0("Query-Vignette", 1:n),
test_number__c = 999.9,
AccountId = rep(new_account$id, n),
My_External_Id__c=paste0(prefix, 1:n))
new_contacts_res <- sf_create(new_contacts, "Contact", api_type = "Bulk 2.0")
Performance test
qry <- function(api_type){
sf_query(
sprintf("SELECT Id, Name, Owner.Id,
(SELECT Id, LastName, Owner.Id FROM Contacts)
FROM Account
WHERE Id = '%s'",
new_account$id),
api_type = api_type
)
}
res <- microbenchmark::microbenchmark(
qry("REST"),
qry("SOAP"),
times = 5,
unit = "s"
)
#> Warning in microbenchmark::microbenchmark(qry("REST"), qry("SOAP"), times = 5,
#> : less accurate nanosecond times to avoid potential integer overflows
res
#> Unit: seconds
#> expr min lq mean median uq max neval
#> qry("REST") 0.1294648 0.1342023 0.1411924 0.1459792 0.1479011 0.1484147 5
#> qry("SOAP") 0.4706413 0.4835107 0.4819378 0.4845979 0.4851070 0.4858323 5
suppressWarnings(suppressMessages(
ggplot2::autoplot(res) +
ggplot2::scale_y_continuous(name="Time [seconds]", n.breaks=6)
))
As seen in the limited test above, the REST API can be anywhere from
4-6x faster than the SOAP API for a query on 1,000
contact records associated with a single Account. Breaking up the number
of records returned into smaller batches by setting
QueryOptions = list(batchSize = 200)
typically does not
affect this result very much but it also depends on the number of fields
in the query. For the REST API the default is 2,000 records per batch
with a minimum of 200 and maximum of 2,000. For the SOAP API the default
is 500 records per batch. For both APIs it is important to note that
there is no guarantee that the requested batch size is the actual batch
size. Changes are made as necessary to maximize performance. For
example, the SOAP API states “batch size will be no more than 200 if the
SOQL statement selects two or more custom fields of type long text”. The
REST API mentions that the limit imposed by Salesforce’s app servers is
around 20,000 characters which can cause batches to be smaller. In
short, it’s generally okay to use the default batch sizes since
Salesforce may optimize over your specified batch size anyways.
When to use the Bulk APIs for queries
A general rule of thumb for using the Bulk APIs (Bulk 1.0 and Bulk 2.0) for queries is anytime you need to retrieve more than 10,000 records. The main reasons to not use the Bulk APIs are twofold. First, they do not support complex relationship queries or aggregate queries. If you need to write a nested relationship or aggregate query involving a large number of records you may be tempted to use the REST API. However, it is recommended to perform two or more separate bulk queries that retrieve the records you need and then join or aggregate the results in R.
# nested relationship query
# (supposed to return the id and first name of all contacts on each account)
try(
sf_query(
"SELECT Id, Name,
(SELECT Id, FirstName FROM Contacts)
FROM Account",
api_type = "Bulk 2.0"
)
)
#> Request failed [400]. Retrying in 1 seconds...
#> Request failed [400]. Retrying in 1 seconds...
#> Error : API_ERROR: Aggregate Relationships not supported in Bulk V2 Query with CSV content type
# aggregate query
# (supposed to return the count of contacts per account)
try(
sf_query(
"SELECT Account.Id, Count(Name) contacts_n
FROM Contact
GROUP BY Account.Id",
api_type = "Bulk 2.0"
)
)
#> Request failed [400]. Retrying in 1 seconds...
#> Request failed [400]. Retrying in 3.9 seconds...
#> Error : API_ERROR: Aggregate Relationships not supported in Bulk Query
The two queries above were trying to pull all the contacts for each account and then get a count of how many contacts there are per account. If you have a lot of records, using the REST API to return these results may not be feasible. Even though the Bulk APIs cannot handle the same query, they can pull down massive amounts of data quickly. In this case you can pull down all of the Contact records and all of the Account records and then perform the calculation using dplyr, like so:
contacts <- sf_query("SELECT Id, FirstName, Account.Id
FROM Contact",
api_type = "Bulk 2.0")
accounts <- sf_query("SELECT Id, Name
FROM Account",
api_type = "Bulk 2.0")
nested_query_recs <- accounts %>%
left_join(contacts %>%
rename(`Contact.Id` = Id,
`Contact.FirstName` = FirstName),
by = c("Id" = "Account.Id"))
nested_query_recs
#> # A tibble: 518 × 4
#> Id Name Contact.Id Contact.FirstName
#> <chr> <chr> <chr> <chr>
#> 1 0013s00000zFdugAAC KEEP Test Account With Child … 0033s0000… KEEP
#> 2 0013s00000zFdugAAC KEEP Test Account With Child … 0033s0000… KEEP
#> 3 0013s00000zFdugAAC KEEP Test Account With Child … 0033s0000… KEEP
#> 4 0013s00000zFdugAAC KEEP Test Account With Child … 0033s0000… KEEP
#> 5 0013s00000zFdugAAC KEEP Test Account With Child … 0033s0000… KEEP
#> # ℹ 513 more rows
aggregate_query_recs <- nested_query_recs %>%
group_by(Id) %>%
summarize(.groups = 'drop',
contacts_n = sum(!is.na(Contact.Id)))
aggregate_query_recs
#> # A tibble: 17 × 2
#> Id contacts_n
#> <chr> <int>
#> 1 0013s00000zFdugAAC 300
#> 2 0013s00000zFgA6AAK 0
#> 3 0013s000014jF2HAAU 0
#> 4 0013s000014jF2vAAE 0
#> 5 0013s000014jFj6AAE 0
#> # ℹ 12 more rows
The second reason to not use the Bulk APIs is that there is a performance overhead associated with every bulk (asynchronous) job that involves checking the status of the job until it succeeds or fails before retrieving the results.
The example below is provided so that you can take this code as an example to run your own performance test of queries that return 10K, 100K, 1M+ records to see where the Bulk APIs outperform the REST API.
qry_compare <- function(api_type){
soql <- sprintf("SELECT Id, LastName, Account.Id, Account.Name, Owner.Id
FROM Contact
WHERE Account.Id = '%s'",
new_account$id)
sf_query(soql, api_type = api_type)
}
res <- microbenchmark::microbenchmark(
qry_compare("REST"),
qry_compare("Bulk 1.0"),
qry_compare("Bulk 2.0"),
times = 5,
unit = "s"
)
Note that the Bulk 1.0 API requires users to specify the target object along with their submitted SOQL. This is because it is needed when creating the bulk job that will manage and execute the query.
queried_records <- sf_query(soql, api_type = "Bulk 1.0")
#> Guessed 'Contact' as the object_name from supplied SOQL.
#> Please set `object_name` explicitly if this is incorrect because it is required by the Bulk APIs.
As you can see above the {salesforcer} package will try to infer the object in the query if not explicitly provided. If it does not guess correctly, then please specify.
Cleanup after performance tests
By keeping track of the account ids used in our tests, it is fairly easy to find and delete these test records from our Org to save space.
# cleanup performance test Contact records ...
contacts_to_delete <- sf_query(
sprintf("SELECT Id
FROM Contact
WHERE Account.Id = '%s'",
new_account$id)
)
sf_delete(contacts_to_delete$Id, "Contact", api_type="Bulk 2.0")
#> # A tibble: 99 × 4
#> Id sf__Id sf__Created sf__Error
#> <chr> <chr> <lgl> <lgl>
#> 1 003Kg000002AcmjIAC 003Kg000002AcmjIAC FALSE NA
#> 2 003Kg000002AcmkIAC 003Kg000002AcmkIAC FALSE NA
#> 3 003Kg000002AcmlIAC 003Kg000002AcmlIAC FALSE NA
#> 4 003Kg000002AcmmIAC 003Kg000002AcmmIAC FALSE NA
#> 5 003Kg000002AcmnIAC 003Kg000002AcmnIAC FALSE NA
#> # ℹ 94 more rows
# ... and finally delete the account
sf_delete(new_account$id)
#> # A tibble: 1 × 3
#> id success errors
#> <chr> <lgl> <list>
#> 1 001Kg0000034qRLIAY TRUE <list [0]>
Relationship queries
Salesforce supports retrieving fields from related objects when querying another object. This is similar to performing a JOIN in SQL, but without having to specify the join keys because Salesforce already knows the relationship between the two objects. There are two types of relationship queries (1. child-to-parent lookups and 2. parent-to-child nested queries) detailed in the sections below.
child-to-parent “lookup” queries
The first type of relationship query and the most common is child to parent. For example, the Contact object (child) to their parent, the Account object. In order to pull down parent object fields with your child record query, you just need to prefix any fields from the related object by concatenating the name of the object with the field name separated by a period. In the example below we are retrieving all Contact object records that have a relationship to an Account.
# child-to-parent relationship (e.g. Account.Name from Contact record)
sf_query(
"SELECT Id, FirstName, Account.Name
FROM Contact
WHERE Account.Id != null"
)
#> # A tibble: 414 × 3
#> Id FirstName Account.Name
#> <chr> <chr> <chr>
#> 1 0033s000012NkzwAAC KEEP KEEP Test Account With Child Records
#> 2 0033s000012NkzxAAC KEEP KEEP Test Account With Child Records
#> 3 0033s000012NkzyAAC KEEP KEEP Test Account With Child Records
#> 4 0033s000012NkzzAAC KEEP KEEP Test Account With Child Records
#> 5 0033s000012Nl00AAC KEEP KEEP Test Account With Child Records
#> # ℹ 409 more rows
Sometimes you may notice that the requested relationship fields do
not appear in the query results. This is because the SOAP and REST APIs
do not return any related object information if it does not exist on the
record and there is no reliable way to extract and rebuild the empty
columns based on the query string. In the example below, if there were
Account information an additional column titled
"Account.Name"
would appear in the results.
# child-to-parent relationship (e.g. Account.Name from Contact record)
sf_query(
"SELECT Id, FirstName, Account.Name
FROM Contact
WHERE Account.Id = null"
)
#> # A tibble: 516 × 2
#> Id FirstName
#> <chr> <chr>
#> 1 003Kg000002AaGiIAK Test
#> 2 003Kg000002AaGjIAK Test
#> 3 003Kg000002Ac5rIAC Test
#> 4 003Kg000002Ac5sIAC Test
#> 5 003Kg000002AaJ7IAK Test
#> # ℹ 511 more rows
Note, that the Bulk 1.0 and Bulk 2.0 APIs will return
"Account.Name"
as a column of all NA
values
for this query because they return results differently.
Finally, one aspect to note is that the Bulk 2.0 API does not support child-to-parent-grandparent relationships as seen in the example below:
try(
sf_query("SELECT Id, FirstName, Account.Owner.Id
FROM Contact",
api_type = "Bulk 2.0")
)
#> # A tibble: 930 × 3
#> Id FirstName Account.Owner.Id
#> <chr> <chr> <chr>
#> 1 0033s000012Nd60AAC Jenny NA
#> 2 0033s000012Nd65AAC Jenny NA
#> 3 0033s000012Nd6FAAS Jenny NA
#> 4 0033s000012Nd6UAAS Jenny NA
#> 5 0033s000012NdARAA0 Jenny NA
#> # ℹ 925 more rows
parent-to-child “nested” queries
Instead of “looking up” a related field, users can write queries that retrieve the individual records related to a parent. For example, if you would like all of the Accounts and their Contacts you can write the query like so:
sf_query(
"SELECT Id, Name,
(SELECT Id, FirstName FROM Contacts)
FROM Account"
)
#> # A tibble: 419 × 4
#> Id Name Contact.FirstName Contact.Id
#> <chr> <chr> <chr> <chr>
#> 1 0013s00000zFgA6AAK KEEP Test Account With Child … NA NA
#> 2 0013s00000zFdugAAC KEEP Test Account With Child … KEEP 0033s0000…
#> 3 0013s00000zFdugAAC KEEP Test Account With Child … KEEP 0033s0000…
#> 4 0013s00000zFdugAAC KEEP Test Account With Child … KEEP 0033s0000…
#> 5 0013s00000zFdugAAC KEEP Test Account With Child … KEEP 0033s0000…
#> # ℹ 414 more rows
At first glance this query may appear the same as a lookup query on the Contact object that includes the account id and name. However, the small difference is that every Account is included, regardless of whether or not they have a Contact. This can be helpful when you want to ensure a query contains all of the parent records and their child records, if they exist. Also, note that the plural object name is used inside the nested query (“Contacts” instead of “Contact”).
Finally, a parent-to-child nested query can also contain a child-to-parent lookup relationship within it. Below is an example where the Owner Id on the Contact is included so you can know who is responsible for the Contacts under each Account.
sf_query(
"SELECT Name, Owner.Id,
(SELECT Id, FirstName, Owner.Id FROM Contacts)
FROM Account"
)
#> # A tibble: 419 × 5
#> Name Contact.FirstName Contact.Id Contact.Owner.Id Owner.Id
#> <chr> <chr> <chr> <chr> <chr>
#> 1 KEEP Test Account With… NA NA NA 0056A00…
#> 2 KEEP Test Account With… KEEP 0033s0000… 0056A000000MPRj… 0056A00…
#> 3 KEEP Test Account With… KEEP 0033s0000… 0056A000000MPRj… 0056A00…
#> 4 KEEP Test Account With… KEEP 0033s0000… 0056A000000MPRj… 0056A00…
#> 5 KEEP Test Account With… KEEP 0033s0000… 0056A000000MPRj… 0056A00…
#> # ℹ 414 more rows
Troubleshooting
If you are having an issue with a query please submit in the {salesforcer} GitHub repository at: https://github.com/StevenMMortimer/salesforcer/issues. As a maintainer, queries are tough to debug because every Salesforce Org is unique. Custom objects or relationships created in your Salesforce Org may be different or even impossible to test in another Org. When filing your issue please make an attempt to understand the query and debug a little bit on your own. Here are a few suggestions:
-
Slightly modify your function call to
sf_query()
to observe the results. Here are a few prompting questions that may assist you:What do you see when you set
verbose=TRUE
argument?What happens if you change the
control
argument, specifically the batch size?What happens if you try using a different API (e.g. “SOAP” vs “REST” or “Bulk 1.0” vs “Bulk 2.0”)?
What happens if you change your query slightly?
Do you need a parent-to-child nested relationship query or will a child-to-parent lookup suffice?
Check out Salesforce’s Workbench tool to see how it constructs specific queries that you are debugging. The tool is available at
https://workbench.developerforce.com
and requires a Salesforce login (the same credentials as you normally would use).Double check Salesforce’s SOQL reference guide to see whether your query is supported or limited in some way.
Review query unit tests at: https://github.com/StevenMMortimer/salesforcer/blob/main/tests/testthat/test-query.R. These unit tests were written to cover a variety of use cases and to track any changes made between newly released versions of the Salesforce API (typically 4 each year). These tests are an excellent source of examples that may be helpful in troubleshooting your own query.
-
Roll up your sleeves and dive into the source code for the {salesforcer} package. The main scripts to review are: