ITG RT Quote Report Logic Document v1.2

Title page

Technical Document

ITG/FXall TCA RT Quote data Report

Created by

Sandeep Borkar

Version 1.2

Created: 7th August 2008

Modified: 28th August 2008

Confidential

Change history

Overview
This document explains the technical design to generate RT Quote data reports for ITG. Great care has been taken to optimize this Application to process a large number of records in a short period of time. It has been devised to fetch data from the DB and then process this data locally. This minimizes interaction with the DB resulting in a quick extraction of data. The application is also architected so that only one pass over the data file is required resulting in quick processing and a minimal memory foot print.

Problem statement
The RT Quote Data format is as follows


 * Report should contain data for trading day running from 5 pm New York Time on the previous calendar day until 5 pm New York Time on the trading day itself.
 * Data to be provided for SPOTS forwards and even swaps only.
 * Unlike other fields QTIMESTAMP_END cannot be obtained directly from the database table and will have to be obtained using program logic.
 * Bid and ask pairs from multiple combinations for the same dealid need to be paired.
 * No of providers participating in a deal id.

Disclaimer

'''The logic and design to extract QF data is to be decided and will be included in next version of this document. This document only deals with source as ’ RFQ ’ and ’ RFQ2 ’ messages from the trade_messages table.'''

Solution
The flow of events to generate the report is as follows

1. A cron job will execute a Shell script which will execute a Java application.

2. The name of the report start and end dates will be passed to the Java Application.

3. The Java program executes the appropriate query from a properties file using the name (provided as one of the inputs). It also picks up database connection credentials such as the URL, username and password from another properties file.

4. The data of the query is copied to a temporary CSV file.

5. The script then calls another Java application to perform the logic described in point c and d of the problem statement.

6. On completion, the data is written into a CSV file.

Steps to extract Data with even swaps only
This section describes how the "Problem Statement" shall be implemented.

1) Logic to extract even swaps Discrepancy_field contains the transaction type. The query will first filter using the condition Discrepancy_field = ’ SWAP ’.

2) Find Even SWAPS

1) A ’ SWAP ’ contains two legs or value dates, the program will compare the dealt_amt of the two legs and if they are the same through out the life of the deal, the data will be picked as an even swap.

3) The query to get the even swaps is as follows, it still requires optimizations.

select

trd3. trade_gtmsid trade_gtmsid,

trd3.evt_time evt_time,

trd3.value_date value_date,

trd3.provider_name provider_name,

trd3.side side,

trd3.dealt_amt dealt_amt,

trd3.message_id message_id,

trd3.DISCREPANCY_FIELD DISCREPANCY_FIELD,

trd3.ccy_pair ccy_pair,

’ QTIMESTAMP_END ’ QEND,

trd3.spot_rate spot_rate,

trd3.forward_pts forward_pts,

trd3.all_in_rate all_in_rate,

trd3.source source,

null country,

null segment1,

trd3.event_date event_date

from rpt.TRADE_MESSAGES trd3,

(

select

distinct trd1. trade_gtmsid

             from

rpt. TRADE_MESSAGES trd1, rpt. TRADE_MESSAGES trd2

where

trd1. message_yyyymm =  ’ 200711 ’

and trd1.message_yyyymmdd between ’ 20071126 ’ and ’ 20071127 ’

and trd1.event_date between to_date( ’ 11/26/2007 05:00:00 pm ’, ’ mm/dd/yyyy hh:mi:ss am ’ ) and to_date( ’ 11/27/2007 05:00:00 pm ’ , ’ mm/dd/yyyy hh:mi:ss am ’ )

and upper(trd1.DISCREPANCY_FIELD) = ’ SWAP ’

and trd1.message_id = ’ 1102 ’

and (upper(trd1.side) = ’ ASK ’ or upper(trd1.side)= ’ BID ’ )

and trd1.trade_gtmsid = trd2.trade_gtmsid

and trd1.value_date != trd2.value_date

and trd1.dealt_amt = trd2.dealt_amt

)

Query to fetch Even Swaps, FORWARDS and SPOT data
1) This query fetches the data for trading day running from 5 pm New York Time on the previous calendar day until 5 pm New York Time on the trading day itself.

2) It also fetches even swaps, forwards and SPOT data.

3) The query has been tested in RPTPT and on a production dump this resulted in an 800,000 record test case.

Datasets obtained by running this query will be used as the input to compute the following:

a. Unlike other fields QTIMESTAMP_END cannot be obtained directly from the database table and will have to be obtained using program logic.

b. Bid and ask pairs from multiple combinations for the same dealid need to be paired.

c. No of providers for a dealid need to be listed.

5)  The Query is as follows


 * select MQ2.* from

(

select

trd3. trade_gtmsid trade_gtmsid<font color="#0000F0">,

trd3.evt_time evt_time,

trd3.value_date value_date,

trd3.provider_name provider_name,

trd3.side side,

trd3.dealt_amt dealt_amt,

trd3.message_id message_id,

trd3.DISCREPANCY_FIELD DISCREPANCY_FIELD,

trd3.ccy_pair ccy_pair,

’ QTIMESTAMP_END ’ QEND,

trd3.spot_rate spot_rate,

trd3.forward_pts forward_pts,

trd3.all_in_rate all_in_rate,

trd3.source source,

null country,

null segment1,

trd3.event_date event_date

from rpt.TRADE_MESSAGES trd3

where

message_yyyymm <font color="#0000F0">= <font color="#FF0000"> ’ 200711 ’

and message_yyyymmdd between ’ 20071126 ’ and ’ 20071127 ’

and event_date between to_date( ’ 11/26/2007 05:00:00 pm ’, ’ mm/dd/yyyy hh:mi:ss am ’ ) and to_date( ’ 11/27/2007 05:00:00 pm ’ , ’ mm/dd/yyyy hh:mi:ss am ’ )

and message_id between ’ 1102 ’ and ’ 1106 ’

and (upper(side) = ’ ASK ’ or upper(side)= ’ BID ’ )

and (upper(DISCREPANCY_FIELD) = ’ FORWARD ’ or upper(DISCREPANCY_FIELD) = ’ SPOT ’ )

--order by trade_gtmsid,provider_name,evt_time

<font color="#0000F0">union

select

trd3<font color="#0000F0">. trade_gtmsid trade_gtmsid<font color="#0000F0">,

trd3.evt_time evt_time,

trd3.value_date value_date,

trd3.provider_name provider_name,

trd3.side side,

trd3.dealt_amt dealt_amt,

trd3.message_id message_id,

trd3.DISCREPANCY_FIELD DISCREPANCY_FIELD,

trd3.ccy_pair ccy_pair,

’ QTIMESTAMP_END ’ QEND,

trd3.spot_rate spot_rate,

trd3.forward_pts forward_pts,

trd3.all_in_rate all_in_rate,

trd3.source source,

null country,

null segment1,

trd3.event_date event_date

from rpt.TRADE_MESSAGES trd3,

(select distinct trd1.trade_gtmsid--value_date,side,dealt_amt,DISCREPANCY_FIELD, ccy_pair, evt_time, ’ QTIMESTAMP_END ’ ,spot_rate,forward_pts,all_in_rate,source,null,null, provider_name,event_date /*id,trade_gtmsid,message_id,event_date,evt_time,side,provider_name */

from rpt.TRADE_MESSAGES trd1,rpt.TRADE_MESSAGES trd2

where

trd1<font color="#0000F0">. message_yyyymm <font color="#0000F0">= <font color="#FF0000"> ’ 200711 ’

and trd1.message_yyyymmdd between ’ 20071126 ’ and ’ 20071127 ’

and trd1.event_date between to_date( ’ 11/26/2007 05:00:00 pm ’, ’ mm/dd/yyyy hh:mi:ss am ’ ) and to_date( ’ 11/27/2007 05:00:00 pm ’ , ’ mm/dd/yyyy hh:mi:ss am ’ )

and upper(trd1.DISCREPANCY_FIELD) = ’ SWAP ’

and trd1.message_id = ’ 1102 ’

and (upper(trd1.side) = ’ ASK ’ or upper(trd1.side)= ’ BID ’ )

and trd1.trade_gtmsid = trd2.trade_gtmsid

and trd1.value_date != trd2.value_date

and trd1.dealt_amt = trd2.dealt_amt)MQ1

where MQ1.trade_gtmsid = trd3.trade_gtmsid

and trd3.message_id between ’ 1102 ’ and ’ 1106 ’ ) MQ2

order by MQ2.trade_gtmsid, MQ2.evt_time

Steps to determine the QTIMESTAMP-END value for each row
1) The Java Application will read the input file and capture each row in a data structure using the dealid + provider_name as the key for the row.

2)  For example

5989710,LOADBANK17,7/7/2008 5:00:24 PM,1102,Forward,GBP.USD,2008/07/07 21:00:24:610,QTIMESTAMP_END,20090709,Bid,1000000.00,2.0006,2.70,2.000570,RFQ,null,null

5989710,LOADBANK17,7/7/2008 5:00:24 PM,1102,Forward,GBP.USD,2008/07/07 21:00:24:611,QTIMESTAMP_END,20090709,Ask,1000000.00,2.0003,2.70,2.000570,RFQ,null,null

5989710,LOADBANK17,7/7/2008 5:00:25 PM,1102,Forward,GBP.USD,2008/07/07 21:00:25:109,QTIMESTAMP_END,20090709,Bid,1000000.00,2.0005,2.70,2.000570,RFQ,null,null

5989710,LOADBANK17,7/7/2008 5:00:25 PM,1102,Forward,GBP.USD,2008/07/07 21:00:25:110,QTIMESTAMP_END,20090709,Ask,1000000.00,2.0003,2.70,2.000570,RFQ,null,null

5989710,LOADBANK17,7/7/2008 5:00:25 PM,1106,Forward,GBP.USD,2008/07/07 21:00:25:256,QTIMESTAMP_END,20090709,Bid,1000000.00,2.0005,2.70,2.000570,RFQ,null,null

The application will store 5989710 and LOADBANK17 as the identifier to the entire row.

. The figure below explains how the hashmaps will store the data in the memory







3) The first row in the file is stored with Dealid and provider_name combination as the key to the entire row from the file. Deal id points to another hashmap that contains provider name as the key and an ArrayList that contains the entire row stored as an array object. The provider is used as a key for convenience in quickly searching data. (significance of this structure explained a little later in the document).

4) When the next row in the file is read, the program will use the dealid, provider_name from this row to check if such a combination exists in the hashmaps. Depending on whether it finds a match or not, the following steps would be carried out.

4.1 If the program finds a match, it will pull out the array against the dealid + provider combination (there can be multiple rows),

4.1.1 If the current row has message_id as "1102"

The program, will match the "side", "message_id" <font color="#3366FF">and value date of the current row with all existing rows in the array. Wherever, it finds a match and if the QTIMESTAMP_END value is blank, the quote start time of current row will be updated as the end time of the row matched from within the memory.

Also the program will store this in the ArrayList against that deal id and provider in the memory.

i.e. the dataset shown previously, assume the current row read by the program is, the third from the top, this means that, two rows that appear before this row, are already read and stored in the array.

The program upon reading this row will check all the rows stored in the array to find a match based on the criteria described above and then update the  "QTIMESTAMP_END", which in this case, it finds a match in the first row and so the QTIMESTAMP_END column of the first row is updated with the Start time of the current row indicating "end of quote".

Note: The program should find only one such row in the array with the logic explained above.

4.1.2 If the current row has message_id "1106",

"1106" indicates "customer has accepted the quote".

The program will compare the following data points "side", "All-in-rate", "spot-rate" and "Forward_Pts" and value date of all previous rows stored in the array, and when it finds an exact match to all these columns, the program will update the "QTIMESTAMP_END" column with the data from "QTIMESTAMP_START" value of this row.

In case multiple rows that are matching the condition are found, the program will update the "QTIMESTAMP_END" of the latest quote (based on event time value) with the "QTIMESTAMP_START" value of this row.

"1106" is "customer accept" and hence no more quote messages will be expected on this dealid and so all open quotes will not be used any further. Hence, all the rows for this deal id that do not have a valid end time(in the QTIMESTAMP_END column), will have their end time column updated with the "QTIMESTAMP_START" value of this row, irrespective of the provider,

Special Note:

In case, a message row(irrespective of the meesage id) for this deal id, is found in the data file after an "1106", it will be treated as an exceptional condition and will be written to a separate file for later inspection.

i.e. the dataset shown prevously, assume the current row read by the program is one containing the ’ 1106 ’ message_id, and all the  previous rows are already read and stored in the memory.

The program will check all the rows stored in the memory to find a match based on the criteria (explained earlier). In this case, it finds a match in the third row, so the QTIMESTAMP_END column is updated with the Start time of the current row indicating "end of quote".

Also since the "1106" is for a BID and not an ASK, the last ASK record in the memory will also be updated with the same end time as above indicating "end of quote".

The above dataset after execution of steps explained above, would look like this



5989710, LOADBANK17,7/7/2008 5:00:24 PM,1102,Forward,GBP.USD,2008/07/07 21:00:24:610, 2008/07/07 21:00:25:109,20090709, Bid, 1000000.00, 2.0006, 2.70, 2.000570, RFQ, null, null



5989710, LOADBANK17, 7/7/2008 5:00:24 PM,1102,Forward,GBP.USD,2008/07/07 21:00:24:611, 2008/07/07 21:00:25:110, 20090709, Ask, 1000000.00, 2.0003, 2.70, 2.000570, RFQ, null, null

5989710, LOADBANK17, 7/7/2008 5:00:25 PM,1102,Forward,GBP.USD,2008/07/07 21:00:25:109, 2008/07/07 21:00:25:256, 20090709, <font color="#3366FF">Bid, 1000000.00, 2.0005, 2.70, 2.000570, RFQ, null, null

5989710, LOADBANK17, 7/7/2008 5:00:25 PM,1102,Forward,GBP.USD,2008/07/07 21:00:25:110, 2008/07/07 21:00:25:256, 20090709, Ask, 1000000.00, 2.0003, 2.70, 2.000570, RFQ, null, null

5989710, LOADBANK17, 7/7/2008 5:00:25 PM, <font color="#FF0000">1106, Forward, GBP.USD, 2008/07/07 21:00:25:256, QTIMESTAMP_END, 20090709, <font color="#3366FF">Bid, 1000000.00, 2.0005, 2.70, 2.000570 , RFQ, null, null

4.1.3 If the current message_id is "1105"

"1105" indicates "Customer nothing done".

When this message is received, no more quotes will be expected on this deal id. So the program will update the "QTIMESTAMP_END" of the all the quotes that have blank "QTIMESTAMP_END", irrespective of the provider.

4.1.4 If the current message id is "1104"

"1104" indicates "Quote withdrawn"

The program will, update the "QTIMESTAMP_END" of the all the rows that have blank "QTIMESTAMP_END in the memory matching the dealid and provider name to that of the current row, being read by the program.

The program will write these updated rows to the final file and replace the arraylist of rows with an empty arraylist still maintaining, deal id and provider information in the memory because "1104" indicates that the quote is withdrawn,and hence, more quotes can be expected from the same provider for the dealid.

4.1.5 If the current message id is "1103"

"1103" indicates "Quote denied"

When this message is received, no further quotes will be sent by the provider.

The program will, update the "QTIMESTAMP_END" of the all the rows that have blank "QTIMESTAMP_END in the memory matching the dealid and provider name to that of the current row, being read by the program.

The program will write the rows out into the final file and remove the deal id, provider name and the structure that holds their rows from the memory.

4.1.6 If the current message is "1114"

"1114" indicates "RFQ timeout for a particular provider"

When this message_id is encountered, the program checks the provider hashmap to check if any records exist for the provider in the current row


 * If records are found, i.e. if provider exists in the hash table, the application will update rows with empty end time stamp with the start time of the 1114 message. Then the program will write the provider data to the data file and remove the entry from the hashmap, just like it does for an 1103 0r 1104.
 * If no provider is found, the record will be written to the exceptions file.

4.2 If the program does not find a match

4.2.1) If the program does not find a match in the hashmap for dealid + provider_name combination and the message id on the current record is an ’ 1102 ’, that the row will be treated as a new combination i.e. the row being read is the first of its kind in the file. The program will make an entry into the hashmap(s) and the arraylist as explained in steps 1 to 4.

4.2.2) If it finds a match for the dealid but not the provider, it will pull out the hashmap that stores the provider names as keys and enter this newly encountered provider with the row in the hashmap and the arraylist

Once stored in to the memory the program will run the logic for all subsequent rows as explained in point 4.1.

However, if the message id on such a row is anything other than an ’ 1102 ’, it will be treated as an exception and will be written to the exceptions file.

Steps to pair Bid and ask pairs from multiple combinations for the same dealid.
Along with computing Quote end time, the program also tries to co-relate the Bid and the Ask of messages. The logic used to pair Bid and Ask is as explained below.

1) As explained earlier the data is sorted by dealid and timestamp. Also, the program, categorizes the rows using the Deal id and provider id. The following steps will be performed to pair Bid and Ask quotes.

2) A new column will be introduced in the array containing the row. This column will be used to store a sequence number that will be same for the paired bid and ask, correlating the two.

3) Assuming, that for a particular Dealid and provider, an "1102" bid message is read from the data file. This data row will be stored in the memory using the steps described earlier (point no 3 subsection 4.1 and 4.2)

4) Assuming that the next row is again a BID for the same Dealid and provider, it will be treated in the same way as the previous one.

5) Now, we have two bids in the memory for the same provider and deal id

.

6) Assuming the next row the program reads is an "Ask" quote for the same dealid and provider. The program along with performing steps from 4.1 and 4.2 will also check every row in the memory to find the first occurrence of a BID <font color="#3366FF">with the same dealid, provider and value date, but with a blank Quoteid column.

7) When the program finds one, it will update the Quote id column of this Bid and the ASK row which is being read currently with the current Quote id sequence number. Quoteid sequence number is an internal running sequence which starts at "1" and increments each time by 1, whenever it finds a matching correlation.

8) The correlation only takes place in a two way quote and not in a one way quote.

Note:

Since, the order in which bid and ask can occur is not guaranteed, the program will use the same steps if an "Ask" were to appears before a "Bid". The program will do a correlation check for each time a new row is read into the memory.

For e.g. below is the sequence of quotes read by the program from the file, when the file is sorted by dealid and timestamp and the quote id updated to the respective rows after the program is run.



5989710, LOADBANK17,7/7/2008 5:00:24 PM, 1102, Forward, GBP.USD, 2008/07/07 21:00:24:610, 2008/07/07 21:00:25:109,20090709, Bid, 1000000.00, 2.0006, 2.70, 2.000570, RFQ, null, null, <font color="#FF0000">1



5989710, LOADBANK17, 7/7/2008 5:00:25 PM, 1102, Forward, GBP.USD, 2008/07/07 21:00:25:109, 2008/07/07 21:00:25:256, 20090709, <font color="#3366FF">Bid, 1000000.00, 2.0005, 2.70, 2.000570, RFQ, null, null, <font color="#FF0000">2

5989710, LOADBANK17, 7/7/2008 5:00:24 PM, 1102, Forward, GBP.USD, 2008/07/07 21:00:25:112, 2008/07/07 21:00:25:110, 20090709, Ask, 1000000.00, 2.0003, 2.70, 2.000570, RFQ, null, null, <font color="#FF0000">1

5989710, LOADBANK17, 7/7/2008 5:00:25 PM,1102, Forward, GBP.USD, 2008/07/07 21:00:25:115, 2008/07/07 21:00:25:256, 20090709, Ask, 1000000.00, 2.0003, 2.70, 2.000570, RFQ, null, null, <font color="#FF0000">2

5989710, LOADBANK17, 7/7/2008 5:00:25 PM, <font color="#FF0000">1106, Forward, GBP.USD, 2008/07/07 21:00:25:256, QTIMESTAMP_END, 20090709, <font color="#3366FF">Bid, 1000000.00, 2.0005, 2.70, 2.000570 , RFQ, null, null

Steps to calculate total number of providers participating in a deal id
1) The program will use a global counter for keeping a count of the providers it encounters while reading data for a particular dealid.

2) When the program reads a row with message_id ’ 1106 ’ or an ’ 1105 ’, it writes out the data for the dealid onto the final data file. Here the program will append the last row that it writes to file with the total number of providers acting in the deal.