Great collection of ideas.
My situation is rather unique. The application is pre-production,
meaning it prepares content for a production platform. The problem is
that it is slow. You might say: so what, this is pre-production.
The issue is that a build is run at least once a day, sometimes 2-3 times.
Every final build MUST go through QA. If something that may need to go
to production quickly has to wait several hours to reach QA, that is no
good. It delays the production update by a day.
Our people work late, but not 24 hours... :D
On the other hand, since it's pre-production, assembling a 16x64GB
cluster is unreasonable. Any other multi-machine solution will also
yield twisted faces. It's not that we can't afford machines. We make
them. It just creates the impression that you are using a Rolls-Royce
where a bicycle should do the job. You know the fellows upstairs.
Since I decided to drop the caching solution, I turned back to the
architecture. I am glad I did, because I found the following:
The assembly of the deliverable starts at one point, when a timer wakes
up and finds that there is work to do.
From this point the code, while processing the required work, collects a
growing cluster of objects. At the end point all the components are put
in one object, serialized and shipped to the appropriate server.
At this point all the objects that were used are free to be GC'ed.
In this case, why not object pooling? I have the starting point AND the
end point!
The usual problem with object pooling is that one needs to chase the
objects around and figure out when each can be released back to the
pool. Since I have one point at which ALL the objects become free to be
GC'ed, they can be put back into the various pools by hand right there.
Simple!
With the above I save tons of "new ObjectXXX" calls AND save a lot of GC.
Can it get better than that?
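
As a rough sketch of the pooling idea (Java assumed; BulkReleasePool and
PoolDemo are hypothetical names, not classes from the real application),
a pool can remember everything it handed out and take it all back in one
call at the end point:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical pool: hands out instances and takes them all back in one call. */
final class BulkReleasePool<T> {
    private final Supplier<T> factory;
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final List<T> inUse = new ArrayList<>();
    private int created = 0;

    BulkReleasePool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Reuse a free instance if there is one, otherwise allocate a new one. */
    synchronized T acquire() {
        T obj = free.poll();
        if (obj == null) {
            obj = factory.get();
            created++;
        }
        inUse.add(obj);
        return obj;
    }

    /** Called once at the end point: everything handed out goes back to the pool. */
    synchronized void releaseAll() {
        free.addAll(inUse);
        inUse.clear();
    }

    /** Number of real allocations made so far. */
    synchronized int createdCount() {
        return created;
    }
}

final class PoolDemo {
    public static void main(String[] args) {
        BulkReleasePool<StringBuilder> pool = new BulkReleasePool<>(StringBuilder::new);
        for (int build = 0; build < 3; build++) {
            for (int i = 0; i < 1000; i++) {
                StringBuilder sb = pool.acquire();
                sb.setLength(0); // reset state left over from the previous build
                sb.append("component-").append(i);
            }
            // The deliverable would be serialized and shipped before this call.
            pool.releaseAll();
        }
        System.out.println("allocations: " + pool.createdCount());
    }
}
```

Two things to watch with this approach: each object has to be reset when
it is acquired, and nothing may keep a reference to a pooled object past
releaseAll(), otherwise the next build will see it change underneath.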
On top of that, I realized that I can consolidate DB tables AND run wider
DB queries that also fetch the data needed by code further down the line.
A bigger ResultSet beats multiple DB accesses.
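
A sketch of the "one wide query" idea, assuming plain JDBC. The table
and column names (build_meta, id, payload) and the WideQuery class are
made up for illustration. Instead of one SELECT per id, all ids go into
a single IN (...) list and come back in one ResultSet:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class WideQuery {
    /** Builds "... WHERE id IN (?, ?, ?)" with one placeholder per key. */
    static String inQuery(int keyCount) {
        if (keyCount < 1) {
            throw new IllegalArgumentException("need at least one key");
        }
        StringJoiner marks = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < keyCount; i++) {
            marks.add("?");
        }
        // build_meta, id and payload are hypothetical names.
        return "SELECT id, payload FROM build_meta WHERE id IN " + marks;
    }

    /** One round trip for all ids instead of one query per id. */
    static Map<Long, String> loadAll(Connection conn, List<Long> ids) throws SQLException {
        Map<Long, String> rows = new HashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(inQuery(ids.size()))) {
            for (int i = 0; i < ids.size(); i++) {
                ps.setLong(i + 1, ids.get(i)); // JDBC parameters are 1-based
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.put(rs.getLong("id"), rs.getString("payload"));
                }
            }
        }
        return rows;
    }
}
```

Some databases cap the length of an IN list, so a very large id set may
still need to be split into a handful of batches.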
On top of that, I realized that tons of code segments are synchronized so
coarsely that the whole application effectively runs single-threaded,
i.e. sequentially.
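
To illustrate the difference (a sketch only; ComponentAssembler and its
methods are hypothetical, and expensiveBuild stands in for the real
work): the coarse version holds one lock for the whole method, while the
narrow version does the expensive part unlocked and relies on a
thread-safe map for the only shared state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ComponentAssembler {
    private final Map<String, String> results = new ConcurrentHashMap<>();

    // Coarse version, for comparison: the whole method holds one lock, so
    // callers run one after another even though only the map is shared.
    synchronized void assembleCoarse(String key) {
        results.put(key, expensiveBuild(key));
    }

    // Narrow version: the expensive work runs without any lock; only the
    // insertion into the thread-safe map is coordinated.
    void assembleNarrow(String key) {
        String built = expensiveBuild(key);
        results.put(key, built);
    }

    int size() {
        return results.size();
    }

    // Stand-in for the real per-component work; touches no shared state.
    private static String expensiveBuild(String key) {
        return "built:" + key;
    }

    /** Runs the narrow version for the given number of tasks on a fixed thread pool. */
    static ComponentAssembler runParallel(int tasks, int threads) {
        ComponentAssembler assembler = new ComponentAssembler();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            final String key = "component-" + i;
            pool.execute(() -> assembler.assembleNarrow(key));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return assembler;
    }
}
```

This only helps where the work inside the lock really is independent per
component; anything that mutates shared objects still needs its own,
smaller lock.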
My work is cut out for me, but I'm sure I will gain tons of speed.
Thanks for all the replies. Great ideas for future projects.
-Michael
On Sep 23, 6:45 am, ligerdave <david.c...@gmail.com> wrote:
> what about map-reduce? chop the job to much smaller sub-jobs and then
> have someone(another server or PC) else to process.
> i assume your app probably loads data from a DB, and wont load the 1TB
> data and then process it entirely.
>
> doing something kinda like pagination? fetch a chunk and have someone
> else to do the calculation? then reduce, combine the results and map
> again if necessary?
>
> as for the 1mb limit, convert the object to byte(serialization) and
> split it to pieces(can come up w/ a data structure)? still much
> quicker than writing to disk.
> accessing memory in general is 250,000 times quicker than doing the
> same against the disk.
>
> On Sep 22, 10:33 am, "MikeG." <miki...@gmail.com> wrote:
>
>
>
> > Thanks,
>
> > Sounds good. Unfortunately my situation is unique.
> > I'm trying to expedite a "meta production" application. This is an
> > application that
> > create content for a production application. The creation is slow
> > because there is
> > a lot of big records that need to be updated and fetched.
>
> > Looks like I can't avoid a cluster cache because I need 1TB and
> > growing DB.
> > This means at least 16 x 64GB machines. There is no justification to
> > have such cluster
> > for non-production engine.
>
> > I'm back to look at the architecture to get some more speed.
>
> > On Sep 22, 6:12 am, ligerdave <david.c...@gmail.com> wrote:
>
> > > MongoDB is actually "cached" db, meaning that, most of its records are
> > > in memory.
>
> > > I think there is also a memcached and DB hybrid which comes w/ a
> > > persistent option. i think it's called memcachedDB, which runs a in-
> > > memory db(like mongodb). this shares most of common api w/ memcached
> > > so you dont have to change code very much
>
> > > On Sep 21, 2:11 am, Joseph Engo <dev.toas...@gmail.com> wrote:
>
> > > > Yes, MongoDB is a database.
>
> > > > You mentioned blobs so I figured you were dealing with binary objects.
>
> > > > > The problem with your use case is objects larger than 1MB. That's really not
> > > > what Memcached is intended to be used for. You could split the objects into
> > > > smaller chunks. But keep in mind, no object is guaranteed to be in the
> > > > cache. So if 1 chunk of your large object is missing you need to pull the
> > > > entire object back from your datastore and split it back into chunks again.
> > > > > Sure, it's very rare that it would happen but it does.
>
> > > > What happens if you need to reboot your server for patches or you have a
> > > > crash. You will need to warm up the cache and that could take a while
> > > > depending on how you assemble the objects.
>
> > > > > It really sounds to me that you should do optimizations on your DB layer
> > > > first before just throwing a cache at it. :)
>
> > > > On Mon, Sep 20, 2010 at 10:54 PM, MikeG. <miki...@gmail.com> wrote:
> > > > > First, thanks for taking the time.
>
> > > > > Now, MongoDB is a cache engine? I think it's a DB?
>
> > > > > > I need a cache engine that can handle (some) objects bigger than 1MB.
> > > > > I don't have binary data. All I have is ascii elements. My final
> > > > > product is an aggregation
> > > > > of meta data of many binary code pieces.
>
> > > > > Just for context - my company has many products while each may have
> > > > > several hardware
> > > > > > versions. Each version has its own binary image.
> > > > > > If you have one of our products and you need to upgrade the software
> > > > > > you get the binary from one
> > > > > > place (I have nothing to do with it) and a build meta data from
> > > > > another place. It's a scheme to have
> > > > > only the legitimate users able to upgrade.
>
> > > > > My application builds or assembles all the various products builds
> > > > > meta information and download it
> > > > > to the distribution cluster.
>
> > > > > Currently my application takes forever because everything needs to be
> > > > > read from the DB.
> > > > > Caching it will do wonders. I'm sure about that. Profiling my
> > > > > application I found that the total time
> > > > > > spent on DB reading is huge.
>
> > > > > Do you see any problem using memcached?
>
> > > > > Thanks,
> > > > > -Michael
>
> > > > > On Sep 20, 10:18 pm, Joseph Engo <dev.toas...@gmail.com> wrote:
> > > > > > If you need larger than 1MB objects and you are only serving these
> > > > > objects
> > > > > > to 5 users at a time. It really sounds like memcache is the wrong tool
> > > > > for
> > > > > > your project. You might want to look into something like MongoDB which
> > > > > has
> > > > > > a larger object limit of 4MB. There are a number of key value stores
> > > > > that
> > > > > > can handle even larger size objects.
>
> > > > > > Could you explain a little more on what type of binary data you
> > > > > > are manipulating ?
>
> > > > > > On Mon, Sep 20, 2010 at 8:25 PM, MikeG. <miki...@gmail.com> wrote:
> > > > > > > Hi,
>
> > > > > > > I'm starting a project in which I would like to have the entire DB in
> > > > > > > cache.
> > > > > > > The reason is that my transactions are reading large amount of data
> > > > > > > from the DB to generate
> > > > > > > a deliverable blob.
>
> > > > > > > I have no concurrency issues. I will always have around 5 users max
> > > > > > > concurrently and most
> > > > > > > of the time it will be single user. SO I look at the context as
> > > > > > > virtually single user.
>
> > > > > > > To avoid this massive DB reading (of large count of large chunks) I
> > > > > > > would like to have it
> > > > > > > permanently in memory.
> > > > > > > When any modification of a record happened it is not being written to
> > > > > > > the DB until the
> > > > > > > big blob final product is delivered and the local system goes idle.
> > > > > > > Only then modifications are written to the DB. Upon successful
> > > > > > > completion of DB update the local system sends a
> > > > > > > message to the recipient of the product to inform it that the DB is
> > > > > > > now in sync with the product
> > > > > > > at hand and it can be consumed.
>
> > > > > > > Now, to hold 1TB in memory I need a cluster and not a small one. I
> > > > > > > have decided to use
> > > > > > > memory mapped files such that my RAM is virtual memory. Easy to get
> > > > > > > large file system of
> > > > > > > several TBs.
>
> > > > > > > My question - is there any limit memcached has as far as cache size?
>
> > > > > > > Also, memcached (the C implementation) has a 1MB record size limit. 1)
> > > > > > > What's the reason
> > > > > > > > for that? 2) Can it be changed (with a hacked private version)? 3) Does
> > > > > > > > Jmemcached have the
> > > > > > > > same limit?
>
> > > > > > > Thanks,
> > > > > > > -Michael