[Honeysnap] Re: Importing flows into databases

Arthur Clune arthur at clune.org
Mon Jun 25 06:15:44 EDT 2007



[discussion moved to Honeysnap mailing list]

On 25 Jun 2007, at 10:52, Jamie Riden wrote:

> (I did a quick svn update and re-import and it didn't seem to add the
> new indices - probably a failure to rtfm on my part, but I only had a
> few minutes to test.)

Seems to be working for me:

$ hs-shell --debug --dburi='postgres:///hs'

(or try

$ honeysnap --debug <rest of params>

I only wanted to look at index creation, not do an import.)

2007-06-25 10:59:47,438 INFO sqlalchemy.engine.base.Engine.0x..10 None
2007-06-25 10:59:47,452 INFO sqlalchemy.engine.base.Engine.0x..10 COMMIT
2007-06-25 10:59:47,453 INFO sqlalchemy.engine.base.Engine.0x..10  
CREATE UNIQUE INDEX flowindex1 ON flow (starttime, src_id, dst_id, sport, dport)
2007-06-25 10:59:47,454 INFO sqlalchemy.engine.base.Engine.0x..10 None
2007-06-25 10:59:47,458 INFO sqlalchemy.engine.base.Engine.0x..10 COMMIT
2007-06-25 10:59:47,459 INFO sqlalchemy.engine.base.Engine.0x..10  
CREATE INDEX flowindex2 ON flow (lastseen, src_id, dst_id, sport, dport)
2007-06-25 10:59:47,459 INFO sqlalchemy.engine.base.Engine.0x..10 None
2007-06-25 10:59:47,462 INFO sqlalchemy.engine.base.Engine.0x..10 COMMIT
2007-06-25 10:59:47,466 INFO sqlalchemy.engine.base.Engine.0x..10

I'm not sure whether it'll add the indexes to an existing database,
though: I did a dropdb hs && createdb hs first, so this was a fresh
database.
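
For an existing database, one option would be to check which indexes are
present and create only the missing ones. A minimal sketch of that idea,
using Python's sqlite3 module as a stand-in for PostgreSQL so it runs
anywhere (the catalogue query is SQLite-specific; on PostgreSQL the
pg_indexes view would play that role). The function name
ensure_flow_indexes and the cut-down flow table are made up for
illustration and are not honeysnap code; the index definitions are the
ones from the log above.

```python
import sqlite3

# Index definitions copied from the CREATE INDEX statements in the log.
FLOW_INDEXES = {
    "flowindex1": "CREATE UNIQUE INDEX flowindex1 ON flow "
                  "(starttime, src_id, dst_id, sport, dport)",
    "flowindex2": "CREATE INDEX flowindex2 ON flow "
                  "(lastseen, src_id, dst_id, sport, dport)",
}


def ensure_flow_indexes(conn):
    """Create any index in FLOW_INDEXES that is missing; return the names created."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'flow'"
    ).fetchall()
    existing = {name for (name,) in rows}
    created = []
    for name, ddl in FLOW_INDEXES.items():
        if name not in existing:
            conn.execute(ddl)
            created.append(name)
    conn.commit()
    return created


# Hypothetical, cut-down flow table standing in for an older database
# that was created before the indexes were added to the schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flow (id INTEGER PRIMARY KEY, src_id INTEGER, dst_id INTEGER, "
    "sport INTEGER, dport INTEGER, starttime TEXT, lastseen TEXT)"
)
first_run = ensure_flow_indexes(conn)   # both indexes are missing here
second_run = ensure_flow_indexes(conn)  # nothing left to create
print(first_run, second_run)
```

Running it twice is the point: the second call finds both indexes and
does nothing, so it would be safe to call on every startup.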

> can you check if the new indexes are actually there?
>
> psql -d <dbname>
> <dbname># \d flow
>
> should tell you about all the indexes on 'flow'.


biber:~ postgres81$ psql81 -d hs
Welcome to psql81 8.1.8, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
        \h for help with SQL commands
        \? for help with psql commands
        \g or terminate with semicolon to execute query
        \q to quit

hs=# \d flow
                                       Table "public.flow"
   Column    |            Type             |                     Modifiers
-------------+-----------------------------+---------------------------------------------------
 id          | integer                     | not null default nextval('flow_id_seq'::regclass)
 honeypot_id | integer                     | not null
 ip_proto    | integer                     | not null
 src_id      | integer                     | not null
 dst_id      | integer                     | not null
 sport       | integer                     | not null
 dport       | integer                     | not null
 packets     | integer                     | not null
 bytes       | integer                     | not null
 starttime   | timestamp without time zone | not null
 lastseen    | timestamp without time zone | not null
 filename    | character varying(1024)     | not null
Indexes:
     "flow_pkey" PRIMARY KEY, btree (id)
     "flowindex1" UNIQUE, btree (starttime, src_id, dst_id, sport, dport)
     "flowindex2" btree (lastseen, src_id, dst_id, sport, dport)
Foreign-key constraints:
     "flow_dst_id_fkey" FOREIGN KEY (dst_id) REFERENCES ip(id)
     "flow_honeypot_id_fkey" FOREIGN KEY (honeypot_id) REFERENCES honeypot(id)
     "flow_src_id_fkey" FOREIGN KEY (src_id) REFERENCES ip(id)

hs=#

This does make me realise that we should have proto in that index as
well, and in the flowIdentify code too :( Without it the bulk imports
will hit false clashes on flows that differ only in protocol, not to
mention just being wrong (though only sometimes, as we correctly look
for proto in the local per-file cache).
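
To make the clash concrete, here is a small sketch, again using an
in-memory SQLite database as a stand-in for PostgreSQL. It inserts a TCP
flow and a UDP flow that share start time, addresses and ports, first
with the unique index as it stands and then with ip_proto added. The
helper name try_insert_pair, the cut-down table and the position of
ip_proto in the index are assumptions for illustration only, not the
actual fix.

```python
import sqlite3


def try_insert_pair(index_columns):
    """Insert a TCP and a UDP flow with identical start time, addresses and ports.

    Returns True if both rows are accepted, False if the unique index rejects one.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flow (id INTEGER PRIMARY KEY, ip_proto INTEGER NOT NULL, "
        "src_id INTEGER NOT NULL, dst_id INTEGER NOT NULL, "
        "sport INTEGER NOT NULL, dport INTEGER NOT NULL, "
        "starttime TEXT NOT NULL)"
    )
    conn.execute(f"CREATE UNIQUE INDEX flowindex1 ON flow ({index_columns})")
    insert = (
        "INSERT INTO flow (ip_proto, src_id, dst_id, sport, dport, starttime) "
        "VALUES (?, ?, ?, ?, ?, ?)"
    )
    shared = (1, 2, 53, 53, "2007-06-25 10:59:47")  # src, dst, sport, dport, start
    try:
        conn.execute(insert, (6,) + shared)   # IP protocol 6 = TCP
        conn.execute(insert, (17,) + shared)  # IP protocol 17 = UDP
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


# Index as currently created: the UDP flow clashes with the TCP one.
without_proto = try_insert_pair("starttime, src_id, dst_id, sport, dport")
# Index with ip_proto included: both flows are accepted.
with_proto = try_insert_pair("starttime, ip_proto, src_id, dst_id, sport, dport")
print(without_proto, with_proto)  # prints: False True
```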

Code tweaking now.

> It should be possible to get the data import time to scale almost
> linearly with quantity of data - that's what databases are for after
> all.

That is definitely what we want. 20 hours for 128GB would be very
good from where we are now, but it's a long way from there to 2 hours
like argus. It would make hs usable though!
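
For scale, those two figures work out roughly as follows. This is
back-of-envelope arithmetic only, assuming 1GB = 1024MB and that the
2 hour argus figure is for the same 128GB of data; the function name is
made up for the example.

```python
def throughput_mb_per_s(gigabytes, hours):
    """Average import rate in MB/s, taking 1 GB as 1024 MB."""
    return gigabytes * 1024 / (hours * 3600)


hs_target = throughput_mb_per_s(128, 20)  # the 20 hour target
argus_like = throughput_mb_per_s(128, 2)  # the 2 hour argus figure
print(f"{hs_target:.1f} MB/s vs {argus_like:.1f} MB/s")
```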

Arthur

