StelsCSV JDBC Driver v5.0 Documentation

 

 

Contents 

 

Installation

Driver Classes

URL Syntax

Driver Properties

Database Schema

Data Types

Supported SQL Syntax

Connection Example

Driver Modes

Fixed-length files

User-defined SQL functions

 

Installation

 

Add the file csvdriver.jar to your classpath or extract the jar file in the directory of the application.

 

Driver Classes

 

Description

Classes

Driver class (JDBC API v1.0)

jstels.jdbc.csv.CsvDriver2

Data Source class (JDBC API v2.0)

jstels.jdbc.csv.CsvDataSource2

Connection Pool Data Source class (JDBC API v2.0)

jstels.jdbc.csv.CsvConnectionPoolDataSource2

 

URL Syntax

 

The connection URL is jdbc:jstels:csv:csvdir, where csvdir may be the following:

 

Driver Properties

 

The driver supports a number of parameters that change default behavior of the driver.

These properties are the following:

 

charset is used to specify a different than default charset encoding of input file (Default is the JVM default charset

 

commentLine is used to specify a string denoting comment line (By default is not-defined)

 

defaultColumnType is used to specify a default data type for columns in a CSV file (Default is "Varchar"). 

 

decimalFormatInput,  decimalFormatOutput are used to specify input and output formats for floating point values in a CSV file. For instance, you can use these formats to specify currency values, e.g.: "###,###.##$". For more information about patterns used in "decimalFormat" please see the documentation for java.text.DecimalFormat class.

 

dateFormat is used to specify a format for date/time values. A value of this property is a sequence of date/time formats separated by the '|' character, e.g: "dd.MM.yy | dd.MM | dd". (Default is "yyyy-MM-dd HH:mm:ss.SSS | yyyy-MM-dd HH:mm:ss | yyyy-MM-dd |  HH:mm:ss.SSS |  HH:mm:ss"). For more details about date/time format used in the driver please see the chapter "time format syntax" in the java.util.SimpleDateFormat class documentation. 

 

fileExtension is used to specify a different file extension (Default is ".txt")

If extension is set to ".txt", then both "myTable.txt" and myTable are valid.

Note: You should stick to one of these table naming types in the SQL queries, i.e. either "myTable.txt" or myTable.

 

rowDelimiter is used to specify a custom row delimiter for a CSV file. (Default is a line separator specified by JVM environment variable "line.separator", e.g. for Windows - "\r\n", for Linux - "\r")

 

schema is used to specify a path to the schema file. It can be absolute or relative to a CSV directory path, e.g.: "c:/schemas/schema1.xml" or "schemas/schema2.xml". (Default is "schema.xml")

 

separator is used to specify a different column separator (Default is '\t' (tab))

 

suppressHeaders is used to specify if the first line contains column header information (Default is false; column header are on first line).

 

 

Advanced Properties

 

escapeEOLInQuotes is used to protect default line separators within quotes. This parameter should not be used, when the rowDelimiter parameter is set to a custom value. (Default is false).

 

logPath is used to set a file path to the log file.

 

emptyStringAsNull. If 'emptyStringAsNull' is set to 'true', empty strings are treated as NULL values. (By default is true).

 

nullString is a string value that is treated as Null value. Not case sensitive. (By default is "NULL").

 

paddingChar - padding char for fixed-length files. (By default is space).

 

trimBlanks. If 'trimBlanks' is set to 'true', the driver trims leading and trailing spaces for string values when reading a text file. (By default is true).

 

useWebParam is used to specify the name of the web parameter that will be used to transfer a CSV file name to the dynamic server page. For instance, if you specify the following value: 'tablename', the driver will access server page using the following HTTP URL:

http://www.site.com/out.jsp?tablename=sometable (where 'sometable' is the table specified in the SQL query or the schema file)
Note: If you specify driver properties directly in the driver URL and server page has its own parameters as well, you should separate them with '??':

jdbc:jstels:csv:http://www.site.com/out.jsp?param1=value1??useWebParam=tablename&suppressHeaders=true

 

quoteString is used to enable/disable writing of double quotes for string values while inserting and updating records in a CSV file (Default is true).
 

This following example code shows how these properties are used:

 
Properties props = new java.util.Properties();
 
props.put("separator","|");              // separator is a bar
props.put("suppressHeaders","false");    // column headers are on the first line
props.put("fileExtension",".txt");       // file extension is .txt
props.put("charset","ISO-8859-2");       // file encoding is "ISO-8859-2"
props.put("commentLine","--");           // string denoting comment line is "--"
// date/time format
props.put("dateFormat","yyyy-MM-dd HH:mm | dd/MM/yyyy");       
 
Connection conn = DriverManager.getConnection("jdbc:jstels:csv:" + args[0],props);
 
You can also use jstels.jdbc.csv.CsvDataSource2 class:
 
CsvDataSource2 csvDS = new CsvDataSource2();
 
csvDS.setPath("c:/csvfiles");        // path to the CSV directory
csvDS.setSeparator("|");             // separator is a bar
csvDS.setSuppressHeaders(false);     // column headers are on the first line
csvDS.setFileExtension(".txt");      // file extension is .txt
 
Connection conn = csvDS.getConnection();
 
Besides, the driver allows to append the properties to the URL like this:
 
Connection conn = 
DriverManager.getConnection(
"jdbc:jstels:csv:path?suppressHeaders=true&dateFormat=yyyy-MM-dd HH:mm | dd/MM/yyyy&caching=false");
 

 

Database Schema

 

The database schema file is called "schema.xml". It is intended to define data types for columns in your CSV/text files. By default it must be located in the same directory where CSV/text files are contained. To specify another path use the driver property 'schema'.

 

The schema has the following format:

<schema>

  <table name = "my_table1.txt">

    <column name = "my_field1" type = "Integer"/>

    <!-- this tag assigns the SQL data type to the column by its name in the text file-->

    <column pos = "2" type = "Integer"/>

    <!-- this tag assigns the SQL data type to the column by its position in the text file-->

    <column name = "my_field3" pos = "3" type = "Integer"/>

    <!-- if you do not use the column header in the text file, you should set column names by using both the 'pos' and the 'name' attributes-->

  </table>

 

  <!-- you can also use file templates, if your files have the same format-->

  <table name = "*.csv">

    <!-- all files with the template "*.csv" -->

    <!-- the wildcard '*' denotes any string of zero or more characters -->

    <column name = "col1" type = "Varchar"/>

    <column name = "col2" type = "Integer"/>

  </table>

 

  <table name = "file????.*">

    <!-- all files with the template "file????.*" -->

    <!-- the wildcard '?' denotes any single character -->

    <column name = "id" type = "Integer"/>

    <column name = "descr" type = "Varchar" size="50"/>

    <column name = "num" type = "Decimal" size="15" decimalCount="2"/>

    <!-- the 'size' attribute specifies the maximum number of characters for the VARCHAR type or total number of digits that can be stored for the DECIMAL type) -->

    <!-- the 'decimalCount' attribute specifies the maximum number of digits that can be stored   to the right of the decimal separator -->

  </table>

... ... ...   

</schema>

 

You can also set the local properties for each table like this:

<schema>

  <table name = "my_table1.txt" charset = "ISO-8859-2" suppressHeaders = "true" commentLine = "--" separator = "," dateFormat = "dd/MM/yyyy">

    <column name = "my_field1" pos = "1" type = "Integer"/>

    <column name = "my_field2" pos = "2" type = "Integer"/>

    <column name = "my_field3" pos = "3" type = "Date"/>

  </table>

... ... ...   

</schema>

 

Notes:

 

Data Types

 

The driver supports the following data types:

 

Data Type

JDBC returned type

(java.sql.Types.*)

Java class used

Int, Integer, Tinyint, Smallint, SHORT

java.sql.Types.INTEGER

java.lang.Integer

Long, Bigint

java.sql.Types.BIGINT

java.lang.Long

Float, Real

java.sql.Types.FLOAT

java.lang.Float

Double

java.sql.Types.DOUBLE

java.lang.Double

BIGDECIMAL, DECIMAL, NUMERIC (recommended for storing currency values)

java.sql.Types.DECIMAL

java.math.BigDecimal

String, Char, Varchar

java.sql.Types.VARCHAR

java.lang.String

Datetime, Date, Time, Timestamp

java.sql.Types.TIMESTAMP

java.util.Date

BOOLEAN, LOGICAL, BIT

java.sql.Types.BOOLEAN

java.lang.Boolean

 

Notes:

Supported SQL Syntax

 

Since StelsCSV version 5.x uses H2 database as an SQL engine, it supports the most part of ANSI/ISO SQL grammar like SELECT, INSERT, UPDATE, DELETE and CREATE statements.

 

An SQL query must meet the following conditions:

Query examples:

// ---- SELECT queries ---

SELECT SUM(a) AS col1, MAX(b) / MAX(c) AS col2 FROM "test.txt" GROUP BY a HAVING AVG(a) > 30;

SELECT name FROM "salesreps.txt" WHERE (rep_office IN ( 22, 11, 12 ))  OR (manager IS NULL AND hire_date >= PARSEDATETIME('01-05-2002','dd-MM-yyyy') OR (sales > quota AND NOT sales > 600000.0);

SELECT city, target, sales FROM "offices.txt" WHERE region = 'Eastern' AND sales > target ORDER BY city;

 

// ---- SELECT queries with join ----

SELECT * FROM "prices.txt" ps JOIN regions regs ON ps.regionid = regs.id JOIN "products.txt" prod ON prod.prodid = ps.prodid;

SELECT * FROM "prices.txt" ps, "products.txt" prod WHERE prod.prodid = ps.prodid;

 

// ---- INSERT, UPDATE and DELETE commands ----

INSERT INTO "salesreps.txt" (name, age, empl_num, sales, title) VALUES ('Henry Smith', 35, 111, NULL, 'Sales Mgr');

UPDATE "customers.txt" SET credit_limit = 50000.00 WHERE company = 'Acme Mfg.';

DELETE FROM "salesreps.txt" WHERE NAME LIKE 'Henry%';

 

// ---- CREATE TABLE command ----

CREATE TABLE "new_table.txt" (int_col INT, long_col LONG, float_col REAL, double_col DOUBLE, str_col VARCHAR(20), date_col DATETIME, bool_col BOOLEAN, num_col DECIMAL(15,2));
 

// ---- CREATE INDEX command ----

CREATE INDEX i_1 ON "new_table.txt" (int_col);

 

 

See also:

 
Connection Example
 

This example code shows how the driver is used. You can download it here.

import java.sql.*;
 
public class DriverTest
{
  public static void main(String[] args)
  {
    try
    {
      // load the driver into memory
      Class.forName("jstels.jdbc.csv.CsvDriver2");
 
      // create a connection. The first command line parameter is assumed to
      // be the directory in which the .csv files are held
      Connection conn = DriverManager.getConnection("jdbc:jstels:csv:" + args[0] );
 
      // create a Statement object to execute the query with
      Statement stmt = conn.createStatement();
 
      // execute a query
      ResultSet rs = stmt.executeQuery("SELECT * FROM \"test.txt\"");
 
      // read the data and put it to the console

      for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) {

        System.out.print(rs.getMetaData().getColumnName(j) + "\t");

      }

      System.out.println();

 

      while (rs.next())
      {
            for(int j=1; j <= rs.getMetaData().getColumnCount(); j++){
                   System.out.print(rs.getObject(j)+ "\t");
            }
            System.out.println();
      }
 
      // close the objects
      rs.close();
      stmt.close();
      conn.close();
    }
    catch(Exception e)
    {
      e.printStackTrace();
    }
  }
}

 

 

Driver Modes

 

There are two main modes the driver can run in: the mode with data caching and the mode with data swapping. Running in the first mode (default mode), the driver caches CSV files in the RAM making it possible to achieve maximum performance while processing CSV. Keep in mind that Java Virtual Machine must have enough free memory allotted for storing large tables (use -Xms and -Xmx JVM options). You can also use the DROP TABLE <table name> FROM CACHE command to force tables to be removed from the cache.

 

The second mode is recommended for processing large CSV files (>100 MB). To use this mode, set the driver property caching to false. There are also some properties to configure this mode:

 

tempPath - directory where temporary files will be created (by default it is a OS temporary directory, specified by JVM environment variable "java.io.tmpdir"). It is recommended to use a non-default value.

 

Example:

Properties props = new Properties();

props.setProperty("caching", "false");         // switch to the swapping mode
props.setProperty("tempPath", "c:/temp");     
Connection conn = DriverManager.getConnection("jdbc:jstels:csv:" + args[0], props);

 

 

Fixed-length files

 

The driver also supports the fixed-length files. For this, you should set the 'fixed' value of the 'separator' parameter. Besides you must specify the 'begin' and 'end' attributes for each file column in the schema file. The 'begin' attribute sets the initial position of the column in the text file and the 'end' attribute sets its final position.

For example:
 

<schema>

  <table name="my_table1.txt">

    <column name="id" begin="1" end="5" type="Integer"/>

    <column name="name" begin="6" end="20" type="Varchar"/>

    <column name="birthdate" begin="21" end="28" type="Date"/>

    ...

  </table>

  <table name="my_table2.txt">

... ... ...   

</schema>

 

 

User-defined SQL functions

 

You can use your own SQL functions in the driver. To use this feature, you should do the following: 

 

1) Create a static method that will act as an SQL function. Mind that:

For example:

package my_pack;

public class MyFuncs{

// user-defined SQL function that formats the given argument into a date/time string with specified format

public static String format_date( java.util.Date d, String format ) {
    // process the null values

    if (d == null || format == null)
    return null;
    java.text.SimpleDateFormat sdf = new java.text.SimpleDateFormat(format);
   
// return a function result with java.lang.String type

    return sdf.format(d);
}
}

 

2) Register the function by executing CREATE ALIAS ... FOR command.

For example:

CREATE ALIAS format_date FOR "my_pack.MyFuncs.format_date"

 

Also, you can use the driver property function:<my_func>.

For example:

Properties props = new java.util.Properties();
props.put("function:formate_date","my_pack.MyFuncs.format_date");
...  
Connection conn = DriverManager.getConnection("jdbc:jstels:csv:" + args[0],props);
// or append this property to the URL
Connection conn2 = DriverManager.getConnection("jdbc:jstels:csv:" + args[0]
 + "?function:formate_date=my_pack.MyFuncs.format_date");

 

3) Call the function in an SQL query

For example:

Statement st = connection.createStatement();

st.execute( "select format_date( date_column , 'yyyy-MM-dd' ) from \"test.txt\"" );

 

 

[HOME]   [TOP]