Supported Models
ChatGPT_QA
Bases: `AbstractChatGPT`
A subclass of `AbstractChatGPT` that provides question-answering functionality for tabular data.
Attributes:

| Name | Type | Description |
|---|---|---|
| `api_key` | `str` | The API key for the OpenAI client. |
| `api_org` | `str` | The organization ID for the OpenAI account. Defaults to `None`. |
| `model_name` | `str` | The name of the model to use. Defaults to `'gpt-3.5-turbo-0613'`. |
Methods:

| Name | Description |
|---|---|
| `name` | Property returning the model name. |
| `prompt` | Property providing the model's instructions in a defined format. |
| `process_input` | Converts input data into a format the model can interpret. |
| `_normalize_output` | Normalizes the output for question answering. |
Note
- The default model is "gpt-3.5-turbo-0613", but you can specify any version you want.
- The prompt contains few-shot examples to improve the QA task results.
Examples:
>>> import pandas as pd
>>> from qatch.models import ChatGPT_QA
>>>
>>> data = pd.DataFrame([
... ["John Doe", "123-456-7890"],
... ["Jane Doe", "098-765-4321"]
... ], columns=["Name", "Phone Number"])
>>>
>>> chatgpt_qa_instance = ChatGPT_QA(api_key=credentials['api_key_chatgpt'],
...                                  api_org=credentials['api_org_chatgpt'],
...                                  model_name="gpt-3.5-turbo-0613")
>>> query = "What is John Doe's phone number?"
>>> answer = chatgpt_qa_instance.predict(table=data, query=query, tbl_name='Contact Info')
>>> print(answer)
[['123-456-7890']]
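The QA models return answers as a list of rows, each a list of cell values. For evaluation, a simple order-insensitive exact-match check (a hypothetical helper sketched here, not part of qatch) could look like:

```python
def cell_match(predicted, gold):
    """Return True if both answers contain the same set of cell values,
    ignoring row order and cell order within rows."""
    flat_pred = {str(cell) for row in predicted for cell in row}
    flat_gold = {str(cell) for row in gold for cell in row}
    return flat_pred == flat_gold

print(cell_match([['123-456-7890']], [['123-456-7890']]))  # True
print(cell_match([['123-456-7890']], [['098-765-4321']]))  # False
```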
Source code in qatch/models/chatgpt/chatgpt_QA.py
ChatGPT_SP
Bases: AbstractChatGPT
A subclass of `AbstractChatGPT` that provides semantic-parsing functionality for tabular data.
Attributes:

| Name | Type | Description |
|---|---|---|
| `api_key` | `str` | The API key for the OpenAI client. |
| `api_org` | `str` | The organization ID for the OpenAI account. Defaults to `None`. |
| `model_name` | `str` | The name of the model to use. Defaults to `'gpt-3.5-turbo-0613'`. |
Methods:

| Name | Description |
|---|---|
| `name` | Property returning the model name. |
| `prompt` | Property providing the model's instructions in a defined format: Table name: "body-builder", Schema: "[Name, Surname]", Questions: "Show all information about each body builder". |
| `process_input` | Converts input data into a format the model can interpret. |
| `_normalize_output` | Normalizes the output for question answering. |
Note
- The default model is "gpt-3.5-turbo-0613", but you can specify any version you want.
- The prompt contains few-shot examples to improve the SP task results.
Examples:
>>> import pandas as pd
>>> from qatch.models import ChatGPT_SP
>>>
>>> data = pd.DataFrame([
... ["John Doe", "123-456-7890"],
... ["Jane Doe", "098-765-4321"]
... ], columns=["Name", "Phone Number"])
>>>
>>> chatgpt_sp_instance = ChatGPT_SP(api_key=credentials['api_key_chatgpt'],
...                                  api_org=credentials['api_org_chatgpt'],
...                                  model_name="gpt-3.5-turbo-0613")
>>> query = "What is John Doe's phone number?"
>>> answer = chatgpt_sp_instance.predict(table=data, query=query, tbl_name='Contact Info')
>>> print(answer)
SELECT "Phone Number" FROM "Contact Info" WHERE "Name" = "John Doe"
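Since the SP models return SQL rather than cell values, one way to check a prediction is to execute it against the source table, for instance with the standard-library `sqlite3` module (a sketch, not part of qatch):

```python
import sqlite3

import pandas as pd

data = pd.DataFrame(
    [["John Doe", "123-456-7890"], ["Jane Doe", "098-765-4321"]],
    columns=["Name", "Phone Number"],
)

# Load the table into an in-memory SQLite database under the same name
# passed as tbl_name, then execute the predicted query against it.
conn = sqlite3.connect(":memory:")
data.to_sql("Contact Info", conn, index=False)

predicted_sql = """SELECT "Phone Number" FROM "Contact Info" WHERE "Name" = 'John Doe'"""
result = pd.read_sql_query(predicted_sql, conn)
print(result["Phone Number"].tolist())  # ['123-456-7890']
```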
Source code in qatch/models/chatgpt/chatgpt_SP.py
ChatGPT_SP_join
Bases: AbstractChatGPT
Implementation of the ChatGPT model specialized for semantic parsing (SP) with JOIN statements. Inherits from the `AbstractChatGPT` class.
This model processes the provided schemas and queries and, after transformation, predicts the appropriate SQL statements.
Attributes:

| Name | Type | Description |
|---|---|---|
| `api_key` | `str` | The API key for the OpenAI client. |
| `api_org` | `str` | The organization ID for the OpenAI account. Defaults to `None`. |
| `model_name` | `str` | The name of the model to use. Defaults to `'gpt-3.5-turbo-0613'`. |
Methods:

| Name | Description |
|---|---|
| `name` | Property returning the model name. |
| `prompt` | Property providing the model's instructions in a defined format: Database table names: ["customer", "product"], Schema table "customer": [CustomerID, name, surname], Schema table "product": [ProductID, CustomerID, name, surname, price], Question: "which products did Simone buy?" |
| `process_input` | Processes inputs into a form the model can consume; extracts and structures the data relevant to the SP task. |
| `_normalize_output` | Normalizes the model's predicted text, stripping unnecessary characters from the resulting SQL statement. |
For this model, the `table` parameter of the `predict` and `process_input` methods is not used and can be set to `None`.
Examples:
>>> chatgpt_sp_join = ChatGPT_SP_join(api_key=credentials['api_key_chatgpt'],
...                                   api_org=credentials['api_org_chatgpt'],
...                                   model_name="gpt-3.5-turbo-0613")
>>> # you need to specify all the database table schema
>>> # if you are using QATCH, you can use database.get_all_table_schema_given(db_id='name_of_the_database')
>>> db_table_schema = {
... "student": {"name": ["StudentID", "Grade", "PhoneNumbers"]},
... "customer": {"name": ["CustomerID", "name", "surname"]},
... "product": {"name": ["ProductID", "CustomerID", "name", "surname", "price"]}
... }
>>> query = "which products did Simone buy?"
>>> chatgpt_sp_join.predict(table=None,
...                         query=query,
...                         tbl_name=["customer", "product"],
...                         db_table_schema=db_table_schema)
SELECT T1.name, T2.name FROM "customer" as T1 JOIN "product" as T2 WHERE T1.name == "Simone"
Source code in qatch/models/chatgpt/chatgpt_SP_join.py
LLama2_QA
Bases: AbstractLLama2
A subclass of `AbstractLLama2` that provides question-answering functionality for tabular data.
Attributes:

| Name | Type | Description |
|---|---|---|
| `model_name` | `str` | Name of the Llama model. |
| `hugging_face_token` | `(str, None)` | Token for Hugging Face. |
| `force_cpu` | `bool` | Whether to force CPU usage. Defaults to `False`. |
Methods:

| Name | Description |
|---|---|
| `name` | Property returning the model name. |
| `prompt` | Property providing the model's instructions in a defined format. |
| `process_input` | Converts input data into a format the model can interpret. |
| `_normalize_output` | Normalizes the output for question answering. |
Note
- The model used in this class is "meta-llama/Llama-2-7b-chat-hf".
- The prompt contains few-shot examples to improve the QA task results.
Examples:
>>> import pandas as pd
>>> from qatch.models import LLama2_QA
>>>
>>> data = pd.DataFrame([
... ["John Doe", "123-456-7890"],
... ["Jane Doe", "098-765-4321"]
... ], columns=["Name", "Phone Number"])
>>>
>>> llama2_qa_instance = LLama2_QA("meta-llama/Llama-2-7b-chat-hf")
>>> query = "What is John Doe's phone number?"
>>> answer = llama2_qa_instance.predict(table=data, query=query, tbl_name='Contact Info')
>>> print(answer)
[['123-456-7890']]
Source code in qatch/models/llama2/llama2_QA.py
LLama2_SP
Bases: AbstractLLama2
A subclass of `AbstractLLama2` that provides semantic-parsing functionality for tabular data.
Attributes:

| Name | Type | Description |
|---|---|---|
| `model_name` | `str` | Name of the Llama model. |
| `hugging_face_token` | `(str, None)` | Token for Hugging Face. |
| `force_cpu` | `bool` | Whether to force CPU usage. Defaults to `False`. |
Methods:

| Name | Description |
|---|---|
| `name` | Property returning the model name. |
| `prompt` | Property providing the model's instructions in a defined format: Table name: "body-builder", Schema: "[Name, Surname]", Questions: "Show all information about each body builder". |
| `process_input` | Converts input data into a format the model can interpret. |
| `_normalize_output` | Normalizes the output for question answering. |
Note
- The model used in this class is "codellama/CodeLlama-7b-Instruct-hf".
- The prompt contains few-shot examples to improve the SP task results.
Examples:
>>> import pandas as pd
>>> from qatch.models import LLama2_SP
>>>
>>> data = pd.DataFrame([
... ["John Doe", "123-456-7890"],
... ["Jane Doe", "098-765-4321"]
... ], columns=["Name", "Phone Number"])
>>>
>>> llama2_sp_instance = LLama2_SP("codellama/CodeLlama-7b-Instruct-hf")
>>> query = "What is John Doe's phone number?"
>>> answer = llama2_sp_instance.predict(table=data, query=query, tbl_name='Contact Info')
>>> print(answer)
SELECT "Phone Number" FROM "Contact Info" WHERE "Name" = "John Doe"
Source code in qatch/models/llama2/llama2_SP.py
LLama2_SP_join
Bases: AbstractLLama2
Implementation of the Llama2 model specialized for semantic parsing (SP) with JOIN statements. Inherits from the Abstract Llama2 model class.
This model processes the provided schemas and queries, and after transformation, predicts the appropriate SQL statements.
Attributes:

| Name | Type | Description |
|---|---|---|
| `model_name` | `str` | Name of the Llama model. |
| `hugging_face_token` | `(str, None)` | Token for Hugging Face. |
| `force_cpu` | `bool` | Whether to force CPU usage. Defaults to `False`. |
Methods:

| Name | Description |
|---|---|
| `name` | Property returning the model name. |
| `prompt` | Property providing the model's instructions in a defined format: Database table names: ["customer", "product"], Schema table "customer": [CustomerID, name, surname], Schema table "product": [ProductID, CustomerID, name, surname, price], Question: "which products did Simone buy?" |
| `process_input` | Processes inputs into a form the model can consume; extracts and structures the data relevant to the SP task. |
| `_normalize_output` | Normalizes the model's predicted text, stripping unnecessary characters from the resulting SQL statement. |
For this model, the `table` parameter of the `predict` and `process_input` methods is not used and can be set to `None`.
Examples:
>>> llama_sp_join = LLama2_SP_join(model_name="codellama/CodeLlama-7b-Instruct-hf",
...                                hugging_face_token=credentials['hugging_face_token'])
>>> # you need to specify all the database table schema
>>> # if you are using QATCH, you can use database.get_all_table_schema_given(db_id='name_of_the_database')
>>> db_table_schema = {
... "student": {"name": ["StudentID", "Grade", "PhoneNumbers"]},
... "customer": {"name": ["CustomerID", "name", "surname"]},
... "product": {"name": ["ProductID", "CustomerID", "name", "surname", "price"]}
... }
>>> query = "which products did Simone buy?"
>>> llama_sp_join.predict(table=None,
...                       query=query,
...                       tbl_name=["customer", "product"],
...                       db_table_schema=db_table_schema)
SELECT T1.name, T2.name FROM "customer" as T1 JOIN "product" as T2 WHERE T1.name == "Simone"
Source code in qatch/models/llama2/llama2_SP_join.py
Omnitab
Bases: AbstractModel
The Omnitab class inherits from the AbstractModel and specializes it to parse tables using the Omnitab model.
Attributes:

| Name | Type | Description |
|---|---|---|
| `tokenizer` | `AutoTokenizer` | The tokenizer for input preprocessing. |
| `model` | `AutoModelForSeq2SeqLM` | The model used to answer queries from the table. |
Note
- The model used in this class is 'neulab/omnitab-large-finetuned-wtq'.
- The Omnitab model works specifically with tables that only contain strings and has a model input limit of 1024 tokens.
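Because the model only handles string cells, any numeric columns should be cast before calling `predict`. A minimal sketch with pandas:

```python
import pandas as pd

data = pd.DataFrame(
    [["John Doe", 25], ["Jane Doe", 31]],
    columns=["Name", "Age"],
)

# Cast every cell to string so the table satisfies the model's input contract.
data = data.astype(str)
print(data.loc[0, "Age"])  # prints 25 (now a string, not an int)
```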
Examples:
>>> import pandas as pd
>>> from qatch.models import Omnitab
>>>
>>> data = pd.DataFrame([
... ["John Doe", "123-456-7890"],
... ["Jane Doe", "098-765-4321"]
... ], columns=["Name", "Phone Number"])
>>>
>>> omnitab_model = Omnitab("neulab/omnitab-large-finetuned-wtq")
>>> query = "What is John Doe's phone number?"
>>> answer = omnitab_model.predict(table=data, query=query, tbl_name='Contact Info')
>>> print(answer)
[['123-456-7890']]
Source code in qatch/models/omnitab.py
Tapas
Bases: AbstractModel
The Tapas class inherits from the AbstractModel and specializes it to parse tables using the TAPAS model.
Attributes:

| Name | Type | Description |
|---|---|---|
| `tokenizer` | `TapasTokenizer` | The tokenizer for input preprocessing. |
| `model` | `TapasForQuestionAnswering` | The model used to answer queries from the table. |
Note
- The model used in this class is "google/tapas-large-finetuned-wtq".
- The TAPAS model works specifically with tables that only contain strings and has a model input limit of 512 tokens.
Examples:
>>> import pandas as pd
>>> from qatch.models import Tapas
>>>
>>> data = pd.DataFrame([
... ["John Doe", "123-456-7890"],
... ["Jane Doe", "098-765-4321"]
... ], columns=["Name", "Phone Number"])
>>>
>>> tapas_model = Tapas("google/tapas-large-finetuned-wtq")
>>> query = "What is John Doe's phone number?"
>>> answer = tapas_model.predict(table=data, query=query, tbl_name='Contact Info')
>>> print(answer)
[['123-456-7890']]
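Given the 512-token input limit, large tables may need to be truncated before prediction. A rough sketch of a row-budget heuristic (not part of qatch; the cell budget is a tunable stand-in for real token counts, which depend on the tokenizer):

```python
import pandas as pd

def truncate_rows(table: pd.DataFrame, max_cells: int = 200) -> pd.DataFrame:
    """Keep only as many leading rows as fit in a rough cell budget.

    A crude proxy for the 512-token limit: actual token counts depend on
    the tokenizer, so max_cells is a heuristic, not a guarantee.
    """
    max_rows = max(1, max_cells // len(table.columns))
    return table.head(max_rows)

big = pd.DataFrame({"a": range(1000), "b": range(1000)})
print(len(truncate_rows(big, max_cells=100)))  # 50
```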
Source code in qatch/models/tapas.py
Tapex
Bases: AbstractModel
The Tapex class inherits from the AbstractModel and specializes it to parse tables using the TAPEX model.
Attributes:

| Name | Type | Description |
|---|---|---|
| `tokenizer` | `TapexTokenizer` | The tokenizer for input preprocessing. |
| `model` | `BartForConditionalGeneration` | The model used to answer queries from the table. |
Note
- The model used in this class is 'microsoft/tapex-large-finetuned-wtq'.
- The TAPEX model works specifically with tables that only contain strings and has a model input limit of 1024 tokens.
Examples:
>>> import pandas as pd
>>> from qatch.models import Tapex
>>>
>>> data = pd.DataFrame([
... ["John Doe", "123-456-7890"],
... ["Jane Doe", "098-765-4321"]
... ], columns=["Name", "Phone Number"])
>>>
>>> tapex_model = Tapex("microsoft/tapex-large-finetuned-wtq")
>>> query = "What is John Doe's phone number?"
>>> answer = tapex_model.predict(table=data, query=query, tbl_name='Contact Info')
>>> print(answer)
[['123-456-7890']]
Source code in qatch/models/tapex.py