Data storage

Fluent is designed to query your data warehouse while keeping your data secure.

When you ask Fluent a question, it first reads the schema of your database, then sends portions of that schema (along with other metadata) to a variety of OpenAI models to generate a SQL query. This query is executed against your database and the results are streamed to a storage location, where they are stored as a CSV file.

This storage location can be either an S3 bucket (or an equivalent, e.g. a GCP bucket) that we provide, or your own storage bucket, in which case query results are streamed straight to your storage. If you bring your own bucket, we never store any of your data.
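
As a rough illustration of the storage step, the sketch below streams query results into a CSV destination in batches rather than loading them all into memory. It is not Fluent's implementation: the function names `resolve_bucket` and `stream_to_csv` and both bucket URLs are hypothetical, an in-memory SQLite database stands in for your data warehouse, and an `io.StringIO` buffer stands in for an upload stream to the bucket.

```python
import csv
import io
import sqlite3


def resolve_bucket(customer_bucket, provided_bucket):
    """Prefer the customer's own bucket; fall back to the provided one."""
    return customer_bucket if customer_bucket else provided_bucket


def stream_to_csv(cursor, sink, batch_size=1000):
    """Write query results to `sink` batch by batch and return the row count."""
    writer = csv.writer(sink)
    # The header row comes from the cursor's column metadata.
    writer.writerow([col[0] for col in cursor.description])
    total = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        writer.writerows(batch)
        total += len(batch)
    return total


# Stand-in for the data warehouse (assumption: any DB-API cursor would work).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
cursor = conn.execute("SELECT id, amount FROM orders ORDER BY id")

# Hypothetical bucket URLs, used only to show the choice of destination.
bucket = resolve_bucket("s3://customer-results", "s3://fluent-provided-results")

sink = io.StringIO()  # stands in for an object-storage upload stream
rows_written = stream_to_csv(cursor, sink, batch_size=1)
```

Writing in batches means the full result set never has to be held in memory, which is the property "streamed" implies here.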

This diagram explains how Fluent stores and accesses data if you connect your own bucket:

Fluent Architecture

What happens when you ask a question

  1. Your question is sent from the Fluent client on your machine to the Fluent backend.
  2. Fluent then requests and receives relevant metadata from your data source, such as table names, column names, and the type of data in each column.
  3. Fluent sends this metadata, along with examples of previous queries, to a variety of fine-tuned OpenAI models. At this point, some actual data may be sent to OpenAI in the two cases below. This data is never stored.
    1. Fluent can check the possible values in a column, and if there are fewer than 100, these are sent to OpenAI so that Fluent can make simple corrections. For instance, if you have a table of cars which each have a colour of RED, BLUE, or GREEN and you ask “how many green cars are there”, Fluent needs to be able to correct green to GREEN.
    2. When Fluent generates a potential SQL query, it sends the first row of the results to OpenAI so that it can determine the appropriate data types.
  4. Fluent receives the generated SQL along with related output, such as relevant questions.
  5. Fluent sends this SQL to your data source, which then executes it.
  6. The results of this query are streamed to a CSV-like file created in a storage bucket. In the diagram above, the storage bucket sits in your infrastructure, but if you don’t want to provide a bucket, we’ll use our own, which sits in our infrastructure.
  7. Your Fluent client requests the results of the query, which are then supplied to your machine.
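
The column-value check in step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not Fluent's code: `small_column_values` and `correct_literal` are hypothetical names, an in-memory SQLite table stands in for your data source, and a simple case-insensitive match stands in for the correction that the OpenAI models perform in the real flow.

```python
import sqlite3

MAX_DISTINCT_VALUES = 100  # threshold from step 3: fewer than 100 values


def small_column_values(conn, table, column, limit=MAX_DISTINCT_VALUES):
    """Return the column's distinct values, or None if there are `limit` or more."""
    # Identifiers cannot be bound as parameters, so they are quoted here.
    # In this sketch they are assumed to come from trusted schema metadata.
    query = f'SELECT DISTINCT "{column}" FROM "{table}" LIMIT ?'
    rows = conn.execute(query, (limit,)).fetchall()
    if len(rows) >= limit:
        return None  # too many values to share
    return [row[0] for row in rows]


def correct_literal(user_term, allowed_values):
    """Map a user's term onto a stored value, ignoring case.

    Deterministic stand-in for the model-based correction described above.
    """
    for value in allowed_values:
        if isinstance(value, str) and value.casefold() == user_term.casefold():
            return value
    return user_term  # no match: leave the term unchanged


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER, colour TEXT)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [(1, "RED"), (2, "BLUE"), (3, "GREEN"), (4, "GREEN")],
)

values = small_column_values(conn, "cars", "colour")
corrected = correct_literal("green", values)
count = conn.execute(
    "SELECT COUNT(*) FROM cars WHERE colour = ?", (corrected,)
).fetchone()[0]
print(corrected, count)  # prints: GREEN 2
```

Capping the query with `LIMIT` keeps the check cheap on high-cardinality columns, since it never reads more than the threshold number of distinct values.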