Database Reader & Creation
Overview
The database_reader package handles connections with the databases file.
It contains two class:
- SingleDatabase: Given a dictionary of pandas dataframe, it creates the database folder with the database sqlite file.
- MultipleDatabases: Given the path where the databases are stored, it automatically creates the connection with the stored file.
If the data is already stored in database sqlite files following this pattern "db_save_path/db_id/db_id.sqlite" you can create a connection between the tool and the files with MultipleDatabase class:
from qatch.database_reader import MultipleDatabases
# The path to multiple databases
db_save_path = 'test_db'
databases = MultipleDatabases(db_save_path)
Instead, if you want to specify different data you can use the SingleDatabase class to create the sqlite databases in "db_save_path/db_id/db_id.sqlite"
Assume the PKs have all different names. No two tables with the same PK name.
import pandas as pd
from qatch.database_reader import SingleDatabase
# Create dummy table
data = {
"year": [1896, 1900, 1904, 2004, 2008, 2012],
"city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
}
table = pd.DataFrame.from_dict(data)
# define the tables in the database (<table_name> : <table>)
db_tables = {'olympic_games': table}
# define where to store the sqlite database
db_save_path = 'test_db'
# define the name of the database
db_id = 'olympic'
# define the PK
# Assume the PKs have all different names. Two tables cannot have same PK name.
table2primary_key = {'olympic_games': 'id'}
# create database connection
db = SingleDatabase(db_path=db_save_path, db_name=db_id, tables=db_tables, table2primary_key=table2primary_key)
MultipleDatabases
Manages multiple SQLite databases, allowing dynamic creation and access to individual databases.
Attributes:
Name | Type | Description |
---|---|---|
db_path |
str
|
The base path where the database files are stored. |
db_ids2db |
dict
|
A dictionary where the key is the database name, and the value is the SingleDatabase object. |
_max_db_in_memory |
int
|
The maximum number of databases to keep in memory. Default is 15. |
Source code in qatch/database_reader/multiple_databases.py
8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 |
|
close()
get_all_table_schema_given(db_id)
Retrieves all the table schema from the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_id |
str
|
The name of the database. |
required |
Returns:
Type | Description |
---|---|
dict[str, DataFrame]
|
pd.DataFrame: A Pandas DataFrame representing the schema of the specified table. |
Source code in qatch/database_reader/multiple_databases.py
get_db_names()
Gets the name of the database file from the path.
Returns:
Type | Description |
---|---|
list[str]
|
list[str]: A list of database file names. |
Source code in qatch/database_reader/multiple_databases.py
get_schema(db_id, tbl_name)
Retrieves the schema of a specified table from the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_id |
str
|
The name of the database. |
required |
tbl_name |
str
|
The name of the table to retrieve the schema from. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
pd.DataFrame: A Pandas DataFrame representing the schema of the specified table. |
Source code in qatch/database_reader/multiple_databases.py
get_table(db_id, tbl_name)
Retrieves a specified table from the database as a Pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_id |
str
|
The name of the database. |
required |
tbl_name |
str
|
The name of the table to retrieve from the database. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
pd.DataFrame: A Pandas DataFrame representing the specified table from the database. |
Source code in qatch/database_reader/multiple_databases.py
open_db(db_id)
Opens a database with the given name and stores it in memory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_id |
str
|
The name of the database to open. |
required |
Source code in qatch/database_reader/multiple_databases.py
run_multiple_queries(db_id, queries)
Executes multiple queries on the specified database and returns the results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_id |
str
|
The name of the database to execute the query on. |
required |
queries |
list
|
The list of SQL queries to be executed on the database. |
required |
Returns:
Type | Description |
---|---|
list[list]
|
list[list]: A list containing the query results. |
Source code in qatch/database_reader/multiple_databases.py
run_query(db_id, query)
Executes an SQL query on the specified database and returns the results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_id |
str
|
The name of the database to execute the query on. |
required |
query |
str
|
The SQL query to be executed on the database. |
required |
Returns:
Type | Description |
---|---|
list | None
|
list | None: A list containing the query results. |
Source code in qatch/database_reader/multiple_databases.py
SingleDatabase
Provides a simplified interface for interacting with SQLite databases in Python.
Note: If the database already exist, do not replace the tables
Attributes:
Name | Type | Description |
---|---|---|
conn |
Connection
|
A connection object representing the SQLite database. |
cursor |
Cursor
|
A cursor object used to execute SQL commands and retrieve results. |
db_name |
str
|
The name of the database. |
table_schemas |
Dict[str, DataFrame]
|
Table name as key, table schema as value. Table schema is a Pandas DataFrame with columns: cid, name, type, notnull, dflt_value, pk. |
table_names |
str
|
The names of the tables in the database. |
db_path |
str
|
Path to the folder which contains the SQLite file. |
db_path_sqlite |
str
|
Path to the SQLite file. |
Source code in qatch/database_reader/single_database.py
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 |
|
get_schema_given(table_name)
Given the table name, returns the schema of the table. Table schema is a Pandas DataFrame with columns: cid, name, type, notnull, dflt_value, pk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the table to retrieve the schema from the database. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
pd.DataFrame: A Pandas DataFrame representing the schema of the specified table from the database. |
Source code in qatch/database_reader/single_database.py
get_table_given(table_name)
Retrieves a specified table from the database as a Pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name |
str
|
The name of the table to retrieve from the database. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
pd.DataFrame: A Pandas DataFrame representing the specified table from the database. |
Source code in qatch/database_reader/single_database.py
run_query(query)
Run a query on the database and return the result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query |
str
|
The SQL query to be executed on the database. |
required |
Returns: list[list]: A list of lists representing the result of the SQL query.
Source code in qatch/database_reader/single_database.py
set_tables_in_db(tables, table2primary_key)
Sets the tables in the SQLite database represented by the given connection object.
This method takes a dictionary of tables in which keys are table names and values are Pandas DataFrames
representing the tables, and sets these tables in the SQLite database represented by the conn
object.
The optional table2primary_key
argument can be used to set primary keys for some or all tables.
If not provided, all tables are created without primary keys.
If the table contains an attribute with the same name of a primary key, a foreign key relationship is created.
Note
- If a table is named as 'table', the method will replace its name with 'my_table'.
- Assume the PKs have all different names. two tables must have different PK names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tables |
Optional[Dict[str, DataFrame]]
|
A dictionary of tables to set in the SQLite database. Keys are table names and values are corresponding Pandas DataFrames. |
required |
table2primary_key |
Optional[Dict[str, str]]
|
A dictionary mapping table names to primary keys.
For example, if you want to set the primary key of table |
required |