QATCH Generate
TestGenerator
The interface to connect the MultipleDatabase with the SQL generators. Use this class to generate queries and questions from the databases.
Attributes:
Name | Type | Description |
---|---|---|
databases |
MultipleDatabases
|
The MultipleDatabases object representing the database connections. |
Source code in qatch/test_generator.py
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 |
|
generate(generators=None, db_names=None, seed=2023)
Generate test queries and questions for specified generators and databases.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generators |
list[str] | str | None
|
Optional. A list of generator names to be used. Default is to use all available generators ['select', 'orderby', 'distinct', 'where', 'groupby', 'having', 'simpleAgg', 'nullCount']. |
None
|
db_names |
str | list[str] | None
|
Optional. The name or list of names of databases to generate tests for. Default is to use all available databases. |
None
|
seed |
int
|
Optional. Seed value for randomization. Default is 2023. |
2023
|
Returns:
Type | Description |
---|---|
DataFrame
|
pd.DataFrame: A DataFrame containing generated test queries, questions, and related information. |
Examples:
Given a MultipleDatabases object "database", with three databases 'sakila', 'world', and 'employees'
>>> generator = TestGenerator(databases)
>>> tests_df = generator.generate(generators=['select', 'orderby'], db_names=['sakila', 'world'])
generate tests only for select and orderby generators, and only for sakila and world databases
Source code in qatch/test_generator.py
DistinctGenerator
Bases: AbstractSqlGenerator
A class for generating DISTINCT SQL queries and corresponding questions based on categorical columns of a database table.
Attributes:
Name | Type | Description |
---|---|---|
database |
SingleDatabase
|
The SingleDatabase object representing the database to generate queries from. |
sql_generated |
dict
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Source code in qatch/sql_generator/distinct_generator.py
sql_generate(table_name)
Generates DISTINCT SQL queries and corresponding questions based on categorical columns of a table. Generates two distinct tags: DISTINCT-SINGLE and DISTINCT-MULT only for the categorical columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the table in the database. |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict[str, list]
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": List[str], "queries": List[str], "questions": List[str]} |
Examples:
Given a MultipleDatabases object "database" with a table "table_name" with columns "colors" and "names"
>>> generator = DistinctGenerator(database)
>>> generator._distinct_single_col("table_name", ["colors"])
>>> generator.sql_generated
{"sql_tags": ["DISTINCT-SINGLE"],
"queries": ["SELECT DISTINCT "colors" FROM "table_name""],
"questions": ["Show the different "colors" in the table table_name"]}
>>> generator_distinct_mult_col("table_name", ["colors", "names"])
>>> generator.sql_generated
{"sql_tags": ["DISTINCT-MULT"],
"queries": ["SELECT DISTINCT "colors", "names" FROM "table_name""],
"questions": ["Show the different "colors", "names" in the table table_name"]}
Source code in qatch/sql_generator/distinct_generator.py
GroupByGenerator
Bases: AbstractSqlGenerator
A class for generating SQL queries and corresponding questions based on group-by operations performed on categorical and numerical columns of a database table.
Attributes:
Name | Type | Description |
---|---|---|
database |
SingleDatabase
|
The SingleDatabase object representing the database to generate queries from. |
sql_generated |
dict
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Source code in qatch/sql_generator/groupby_generator.py
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 |
|
sql_generate(table_name)
Generates Group By queries and corresponding questions for both categorical and numerical columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the table in the database. |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict[str, list]
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Examples:
Given a MultipleDatabases object "database" with a table "table_name" with columns "colors" and "numbers"
>>> generator = GroupByGenerator(database)
>>> generator._build_group_by_no_agg("table_name", ["colors"])
>>> generator.sql_generated
{
"sql_tags": ["GROUPBY-NO-AGGR"],
"queries": ["SELECT `colors` FROM `table_name` GROUP BY `colors`"],
"questions": ["Show all "colors" in the table "table_name" for each "colors""]
}
>>> generator._build_group_by_with_count("table_name", ["colors"])
>>> generator.sql_generated
{
"sql_tags": ["GROUPBY-COUNT"],
"queries": ["SELECT `colors`, COUNT(*) FROM `table_name` GROUP BY `colors`"],
"questions": ["For each "colors", count the number of rows in table "table_name""]
}
>>> generator._build_group_by_with_agg("table_name")
>>> generator.sql_generated
{
"sql_tags": ["GROUPBY-AGG-MIN", "GROUPBY-AGG-MAX", "GROUPBY-AGG-AVG", "GROUPBY-AGG-SUM"],
"queries": [
"SELECT `colors`, MIN(`numbers`) FROM `table_name` GROUP BY `colors`",
"SELECT `colors`, MAX(`numbers`) FROM `table_name` GROUP BY `colors`",
"SELECT `colors`, AVG(`numbers`) FROM `table_name` GROUP BY `colors`",
"SELECT `colors`, SUM(`numbers`) FROM `table_name` GROUP BY `colors`"
],
"questions": [
"For each `colors`, find the min of `numbers` in table `table_name`",
"For each `colors`, find the max of `numbers` in table `table_name`",
"For each `colors`, find the avg of `numbers` in table `table_name`",
"For each `colors`, find the sum of `numbers` in table `table_name`"
]
}
Source code in qatch/sql_generator/groupby_generator.py
HavingGenerator
Bases: AbstractSqlGenerator
A class for generating HAVING SQL queries and corresponding questions based on a database table.
Attributes:
Name | Type | Description |
---|---|---|
database |
SingleDatabase
|
The SingleDatabase object representing the database to generate queries from. |
sql_generated |
dict
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Source code in qatch/sql_generator/having_generator.py
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 |
|
sql_generate(table_name)
Generate HAVING SQL queries and corresponding questions for categorical and numerical columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the table in the database. |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict[str, list]
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Examples:
Given a MultipleDatabases object "database" with a table "table_name" with columns "colors" and "numbers" and (i) the average number of rows for each category in "colors" is 2 (ii) the mean of the average numbers for each category is 5 (iii) the average sum of numbers for each category is 10:
>>> generator = HavingGenerator(database)
>>> generator._build_having_count(table_name, ["colors"], df)
>>> generator.sql_generated
{
"sql_tags": ["HAVING-COUNT-GR", "HAVING-COUNT-LS", "HAVING-COUNT-EQ"],
"queries": [
'SELECT `colors` FROM `table_name` GROUP BY `colors` HAVING count(*) >= 2',
'SELECT `colors` FROM `table_name` GROUP BY `colors` HAVING count(*) <= 2',
'SELECT `colors` FROM `table_name` GROUP BY `colors` HAVING count(*) = 2'
],
"questions": [
'Find all the `colors` that have at least 2 records in table `table_name`',
'Find all the `colors` that have at most 2 records in table `table_name`',
'Find all the `colors` that have exactly 2 records in table `table_name`'
]
}
>>> generator._build_having_agg(table_name, ["colors"], ["numbers"], df)
>>> generator.sql_generated
{
"sql_tags": ["HAVING-AGG-AVG-GR", "HAVING-AGG-AVG-LS", "HAVING-AGG-SUM-GR", "HAVING-AGG-SUM-LS"],
"queries": [
'SELECT `colors` FROM `table_name` GROUP BY `colors` HAVING AVG(`numbers`) >= 5.0',
'SELECT `colors` FROM `table_name` GROUP BY `colors` HAVING AVG(`numbers`) <= 5.0',
'SELECT `colors` FROM `table_name` GROUP BY `colors` HAVING SUM(`numbers`) >= 10.0',
'SELECT `colors` FROM `table_name` GROUP BY `colors` HAVING SUM(`numbers`) <= 10.0'
],
"questions": [
'List the `colors` which average `numbers` is at least 5.0 in table `table_name`',
'List the `colors` which average `numbers` is at most 5.0 in table `table_name`',
'List the `colors` which summation of `numbers` is at least 5.0 in table `table_name`',
'List the `colors` which summation of `numbers` is at most 5.0 in table `table_name`'
]
}
Source code in qatch/sql_generator/having_generator.py
JoinGenerator
Bases: AbstractSqlGenerator
A class for generating SQL queries, and questions based on input tables in a database.
Attributes:
Name | Type | Description |
---|---|---|
database |
SingleDatabase
|
The SingleDatabase object representing the database to generate queries from. |
sql_generated |
dict
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Source code in qatch/sql_generator/join_generator.py
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 |
|
sql_generate(table_name)
This method generates SQL queries by performing JOIN operations on the specified table with other tables that have common columns in the database. It first empties any previously generated SQL queries, then it determines the tables to join based on common columns, and finally it generates the actual SQL queries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the input table. |
required |
Returns:
Type | Description |
---|---|
dict[str, list]
|
dict[str, list]: A dictionary where the keys are SQL tags and the values are lists of generated SQL queries. |
Examples:
Assuming a MultipleDatabases object "database" with two tables, 'orders' and 'customers', 'orders' table has columns
['order_id', 'product', 'customer_id'] and 'customers' table has columns ['customer_id', 'name', 'address'].
>>> generator = JoinGenerator(database)
>>> generator._generate_join_project_all("orders")
>>> generator.sql_generated
>>> {'sql_tags': ['JOIN-PROJECT-ALL'],
>>> 'queries': ['SELECT * FROM "orders" AS T1 JOIN customers AS T2 ON T1.customer_id=T2.customer_id'],
>>> 'questions': ['Join all the records from table "orders" with table "customers" on "customer_id"']}
>>> generator._generate_join_cat_columns("orders")
>>> generator.sql_generated
>>> {'sql_tags': ['JOIN-PROJECT-CAT'],
>>> 'queries': ["SELECT T1.product, T2.name FROM orders AS T1 JOIN customers AS T2 ON T1.order_id=T2.order_id"],
>>> 'questions': ['List all the "product" and "name" from the table "orders" and the table "customers" where "order_id" is the same']}
Note
The method makes use of other internal methods to firstly get tables that can be joined with the given table and generate SQL queries on this basis.
Source code in qatch/sql_generator/join_generator.py
NullGenerator
Bases: AbstractSqlGenerator
A class for generating NULL SQL queries and corresponding questions based on a database table.
Attributes:
Name | Type | Description |
---|---|---|
database |
SingleDatabase
|
The SingleDatabase object representing the database to generate queries from. |
sql_generated |
dict
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Source code in qatch/sql_generator/null_generator.py
sql_generate(table_name)
Generates NULL queries and corresponding questions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the table in the database. |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict[str, list]
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Examples:
Given a MultipleDatabases object "database" with a table "table_name" with columns "colors" with 10 NULL values:
>>> generator = NullGenerator(database)
>>> generator._build_null_count('table_name', ['colors'])
>>> generator.sql_generated
{
"sql_tags": ['NULL-COUNT'],
"queries": ['SELECT COUNT(*) FROM "table_name" WHERE "colors" IS NULL'],
"questions": ['Count the rows where the values of "colors" are missing in table "table_name"']
}
>>> generator._build_not_null_count('table_name', ['colors'])
>>> generator.sql_generated
{
"sql_tags": ['NOT-NULL-COUNT'],
"queries": ['SELECT COUNT(*) FROM "table_name" WHERE "colors" IS NOT NULL'],
"questions": ['Count the rows where the values of "colors" are not missing in table "table_name"']
}
Source code in qatch/sql_generator/null_generator.py
OrderByGenerator
Bases: AbstractSqlGenerator
A class for generating ORDER BY SQL queries and corresponding questions based on a database table.
Attributes:
Name | Type | Description |
---|---|---|
database |
SingleDatabase
|
The SingleDatabase object representing the database to generate queries from. |
sql_generated |
dict
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Source code in qatch/sql_generator/orderby_generator.py
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 |
|
sql_generate(table_name)
Generate ORDER BY queries and corresponding questions based on the specified table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the table in the database. |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict[str, list]
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Examples:
Given a MultipleDatabases object "database" with a table "table_name" with columns "colors" and "numbers"
>>> generator = OrderByGenerator(database)
>>> generator._generate_order_asc("table_name", ["colors", "numbers"])
>>> generator.sql_generated
{
"sql_tags": ["ORDERBY-SINGLE", "ORDERBY-SINGLE"],
"queries": [
'SELECT * FROM "table_name" ORDER BY "colors" ASC',
'SELECT * FROM "table_name" ORDER BY "numbers" ASC'
],
"questions": [
'Show all data ordered by "colors" in ascending order for the table "table_name"',
'Show all data ordered by "numbers" in ascending order for the table "table_name"'
]
}
>>> generator._generate_order_desc("table_name", ["colors", "numbers"])
>>> generator.sql_generated
{
"sql_tags": ["ORDERBY-SINGLE", "ORDERBY-SINGLE"],
"queries": [
'SELECT * FROM "table_name" ORDER BY "colors" DESC',
'SELECT * FROM "table_name" ORDER BY "numbers" DESC'
],
"questions": [
'Show all data ordered by "colors" in descending order for the table "table_name"',
'Show all data ordered by "numbers" in descending order for the table "table_name"'
]
}
>>> generator._generate_order_asc_project("table_name", ["colors", "numbers"])
>>> generator.sql_generated
{
"sql_tags": ["ORDERBY-PROJECT", "ORDERBY-PROJECT"],
"queries": [
'SELECT "colors" FROM "table_name" ORDER BY "colors" ASC',
'SELECT "numbers" FROM "table_name" ORDER BY "numbers" ASC'
],
"questions": [
'Project the "colors" ordered in ascending order for the table "table_name"',
'Project the "numbers" ordered in ascending order for the table "table_name"'
]
}
>>> generator._generate_order_desc_project("table_name", ["colors", "numbers"])
>>> generator.sql_generated
{
"sql_tags": ["ORDERBY-PROJECT", "ORDERBY-PROJECT"],
"queries": [
'SELECT "colors" FROM "table_name" ORDER BY "colors" DESC',
'SELECT "numbers" FROM "table_name" ORDER BY "numbers" DESC'
],
"questions": [
'Project the "colors" ordered in descending order for the table "table_name"',
'Project the "numbers" ordered in descending order for the table "table_name"'
]
}
Source code in qatch/sql_generator/orderby_generator.py
SelectGenerator
Bases: AbstractSqlGenerator
A class for generating SELECT SQL queries and corresponding questions based on a database table.
Attributes:
Name | Type | Description |
---|---|---|
database |
SingleDatabase
|
The SingleDatabase object representing the database to generate queries from. |
sql_generated |
dict
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Source code in qatch/sql_generator/select_generator.py
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 |
|
sql_generate(table_name)
Generate SQL tags, queries, and questions based on the specified table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the table in the database. |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict[str, list]
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Examples:
Given a MultipleDatabases object "database" with a table "table_name" with columns "name", "colors" and "numbers":
>>> generator = SelectGenerator(database)
>>> generator._select_all_table("table_name")
>>> generator.sql_generated
{
"sql_tags": ["SELECT-ALL"],
"queries": ['SELECT * FROM "table_name"'],
"questions": ["Show all the rows in the table table_name"]
}
>>> generator._select_add_col("table_name")
>>> generator.sql_generated
{
"sql_tags": ["SELECT-ADD-COL", "SELECT-ADD-COL", "SELECT-ADD-COL"],
"queries": ['SELECT "colors" FROM "table_name"',
'SELECT "colors", "numbers" FROM "table_name"'
'SELECT "colors", "numbers", "name" FROM "table_name"'],
"questions": ["Show all colors in the table table_name",
"Show all colors, numbers in the table table_name"
"Show all colors, numbers, name in the table table_name"]
}
>>> generator._select_random_col("table_name")
>>> generator.sql_generated
{
"sql_tags": ["SELECT-RANDOM-COL", "SELECT-RANDOM-COL"],
"queries": ['SELECT "colors" FROM "table_name"',
'SELECT "name", "numbers" FROM "table_name"',
'SELECT "numbers", "colors", "name" FROM "table_name"'],
"questions": ["Show all colors in the table table_name",
"Show all name, numbers in the table table_name",
"Show all numbers, colors, name in the table table_name"]
}
Source code in qatch/sql_generator/select_generator.py
SimpleAggGenerator
Bases: AbstractSqlGenerator
A class for generating Simple Aggregation queries and corresponding questions based on a database table.
Attributes:
Name | Type | Description |
---|---|---|
database |
SingleDatabase
|
The SingleDatabase object representing the database to generate queries from. |
sql_generated |
dict
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": List[str], "queries": List[str], "questions": List[str]} |
Source code in qatch/sql_generator/simple_agg_generator.py
sql_generate(table_name)
Generates Simple Aggregation SQL queries and corresponding questions for the specified table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the table in the database. |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict[str, list]
|
A dictionary containing generated SQL tags, queries, and questions. Format: {"sql_tags": List[str], "queries": List[str], "questions": List[str]} |
Examples:
Given a MultipleDatabases object "database" with a table "table_name" with columns "colors" and "numbers"
>>> generator = SimpleAggGenerator(database)
>>> generator._build_count_cat("table_name", ["colors"])
>>> generator.sql_generated
{
"sql_tags": ["SIMPLE-AGG-COUNT", "SIMPLE-AGG-COUNT-DISTINCT"],
"queries": [
'SELECT COUNT(*) FROM "table_name"',
'SELECT COUNT(DISTINCT"colors") FROM "table_name"'
],
"questions": [
'Count the records in table "table_name"?',
'How many different "colors" are in table "table_name"?'
]
}
>>> generator._build_count_agg("table_name", ["numbers"])
>>> generator.sql_generated
{
"sql_tags": ["SIMPLE-AGG-MAX", "SIMPLE-AGG-MIN", "SIMPLE-AGG-AVG"],
"queries": [
'SELECT MAX("numbers") FROM "table_name"',
'SELECT MIN("numbers") FROM "table_name"',
'SELECT AVG("numbers") FROM "table_name"'
],
"questions": [
'Find the maximum "numbers" for the table "table_name"',
'Find the minimum "numbers" for the table "table_name"',
'Find the average "numbers" for the table "table_name"'
]
}
Source code in qatch/sql_generator/simple_agg_generator.py
WhereGenerator
Bases: AbstractSqlGenerator
A class for generating WHERE SQL queries and corresponding questions based on a database table.
Attributes:
Name | Type | Description |
---|---|---|
database |
SingleDatabase
|
The SingleDatabase object representing the database to generate queries from. |
sql_generated |
dict
|
A dictionary containing generated SQL tags, queries, questions, and results. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str], "results": list[pd.DataFrame]} |
Source code in qatch/sql_generator/where_generator.py
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 |
|
sql_generate(table_name)
Generates WHERE SQL queries and corresponding questions for both categorical and numerical columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the table in the database. |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict[str, list]
|
A dictionary containing generated SQL tags, queries, questions, and results. Format: {"sql_tags": list[str], "queries": list[str], "questions": list[str]} |
Examples:
Given a MultipleDatabases object "database" with a table "table_name" with columns "colors" and "numbers"
>>> sample_df = pd.DataFrame({"colors": ["green", "blue", "blue", "blue", "blue"], "numbers": [1, 2, 3, 4, 5]})
>>> generator = WhereGenerator(database)
>>> generator._generate_where_categorical("table_name", ["colors"], sample_df)
>>> generator.sql_generated
{
"sql_tags": ["WHERE-CAT-MOST-FREQUENT", "WHERE-CAT-LEAST-FREQUENT",
'WHERE-CAT-MOST-FREQUENT', 'WHERE-CAT-LEAST-FREQUENT',
"WHERE-NOT-MOST-FREQUENT", "WHERE-NOT-LEAST-FREQUENT"],
"queries": [
'SELECT * FROM "table_name" WHERE "colors" == "blue"',
'SELECT * FROM "table_name" WHERE "colors" == "green"',
'SELECT * FROM "table_name" WHERE "colors" != "blue"',
'SELECT * FROM "table_name" WHERE "colors" != "green"',
'SELECT * FROM "table_name" WHERE NOT "colors" == "blue"',
'SELECT * FROM "table_name" WHERE NOT "colors" == "green"',
],
"questions": [
'Show the data of the table "table_name" where "colors" is equal to blue',
'Show the data of the table "table_name" where "colors" is equal to green',
'Show the data of the table "table_name" where "colors" is different from blue',
'Show the data of the table "table_name" where "colors" is different from green',
'Show the data of the table "table_name" where "colors" is not equal to blue',
'Show the data of the table "table_name" where "colors" is not equal to green',
]
}
>>> generator._generate_where_numerical("table_name", ["numbers"], sample_df)
>>> generator.sql_generated
{
"sql_tags": ['WHERE-NUM-MAX-VALUES', 'WHERE-NUM-MIN-VALUES',
'WHERE-NUM-MEAN-VALUES', 'WHERE-NUM-MEAN-VALUES'],
"queries": ['SELECT * FROM "table_name" WHERE "numbers" < 5',
'SELECT * FROM "table_name" WHERE "numbers" > 1,
'SELECT * FROM "table_name" WHERE "numbers" > 3.0'
'SELECT * FROM "table_name" WHERE "numbers" < 3.0'],
"question": ['Show the data of the table "table_name" where "numbers" is less than 5',
'Show the data of the table "table_name" where "numbers" is greater than 1',
'Show the data of the table "table_name" where "numbers" is greater than 3.0',
'Show the data of the table "table_name" where "numbers" is less than 3.0'],
}