Every once in a while I like to make a rundown of exactly what this grid means to us and what we have done in the course of just about a year of daily work and continuing enhancements. We deal with a lot of systems, both software and hardware. We deal with some really awesome people from all over the globe and we're meeting new people every day.
It is inspiring to see just what people will come up with when given enough freedom. That has been of utmost importance to us, enabling people by doing our best to give them freedom. The freedom to upload without cost so you can try 14 combinations of textures until you find just the right one. The freedom to use a bunch of extra prims for a build because, well, it just looks better that way. The freedom to guide the direction and policies of the place you call home. The freedom to argue with us when you don't agree.
InWorldz is not a perfect platform. The people that run InWorldz are not perfect people. What we do have, though, is the ability to recognize our faults and improve on them. Some will not be happy with the pace at which improvements are implemented. Some won't be happy with our decisions on what comes first, what comes second, and so on. But above all, what is most important to us is that the changes are done right and that we, together with our residents, remain in control of the direction of the platform.
Much of our work has been forward looking. Those who say we've done nothing or have only copied the work of others are ill-informed. I've been told by a few of our residents that we should post somewhere exactly what we do and what we're proud of, so here goes.
Hardware wise, our platform is professionally hosted with Cisco equipment at the core of the network. Custom servers are built to our specifications for the features we need and only the latest hardware is used. Every component from the power supply to each individual hard disk has been selected for reliability and long term performance. Every server has 2 Gb/sec of connectivity to the network with switches that can handle full load on every single port. Every server has redundant power supplies in case of a circuit or power supply failure. The network itself is pretty amazing and has 30 Gb+ of incoming bandwidth from a variety of providers. Where appropriate, HARDWARE RAID is used to minimize the chances of a disk fault interrupting service. The RAID cards are supplied with a battery backup to maximize reliability. The data center provides full battery backup and generators as well as some pretty cool enhancements only they have. The hosting setup is carrier grade. We have room here for infinite growth and we can get the servers we need with a very short turnaround. In fact, we have servers on standby just waiting to be plugged in to allow an easy expansion when our current hardware gets filled up. We don't just slap everyone onto a few servers and hope for the best. We partition our servers based on available hardware and load using commercially accepted virtualization techniques. In fact, we're making changes to this setup based on customer feedback to remove lag from busy regions.
We take full backups of all of our databases every 6 hours. Every night there is a new cron job running that communicates with our grid management suite to provide you with your own region backup/restore. Which reminds me I have to do a little ditty on that so you know how to use it if you should ever need it.
One other thing the data center provides: a very smart and dedicated staff that understands technology. This is priceless.
Software wise is where we end up getting insulted the most. I take some of the accusations personally because saying that we've done nothing is directly calling me a liar. Those who know me best know this is the absolute wrong thing to accuse me of. Truth in the end is always the best policy. If you have to lie about something, you're doing it wrong.
Since I started working with InWorldz we have made 600 commits to our repository. That is 600 changes, enhancements, fixes, new code, new projects, etc. We have branched the core server software 4 times to write in very large changes and have merged these back to our trunk. There are 7 supporting software projects that go along with the core server. I'm not going to go into what each does, but by name we have: aperture, zookeeper, whip, undo_backup, undo_web and two others.
We have grid management software that allows us to quickly roll out patches, update configurations, restart regions, schedule backups and handle a bunch of other really great necessities. This software is also what will allow you to do your own restarts once we get a MySQL connection pool bug patched up.
We have a custom built asset storage system that will allow us to continue to grow into the foreseeable future without having to worry about scaling anything up. You keep using your free uploads, we keep adding servers as we need them. It's that simple.
There are many that assume that because we came in as outsiders to the core of OpenSim we're somehow incapable or inadequate to repair, enhance, and upgrade it. We understand software. We use the appropriate tools to find bottlenecks and memory usage issues. We don't guess. We instrument, test, and fix. The people I've worked with on this project are professionals that are very passionate about this technology. Looking back over our commits we've made some major enhancements in various areas:
* Script engine enhancements. Though I'm not happy with the core of the engine, which is why it's being rewritten, it has been patched up numerous times. When I first came aboard almost every single script I brought in would kill a region. A lot of my scripts make heavy use of timers, vector math, movement and use every trick in the book (10 scripts and link messages to support non physical movement without delay etc). I had a moment last year when I was able to import a project that is very close to my heart and have it work without killing the region. This wasn't due to the hard work of anyone outside the company. This was our hard work and it meant a lot to me.
* Physics. There were a few issues found when profiling for performance and memory usage. This was one of the key things we worked on to allow the great number of prims on a region. There are still core bugs we share with other grids but being able to fire up that test region with 45k prims on it and not have the simulator die was a great step in the right direction.
* Networking. When a bunch of people get together a lot of traffic is generated. As a long time network programmer I have always been taught that you never trust or assume anything about a network. It's unreliable and will not behave in a consistent way. Latency, packet loss, and many other factors have to be considered when designing code to process network traffic. We have the garbage collector to think about too. Only performance measurements and tuning exposed the issues here.
* Assets. Probably the most important change we've made to support growth. This system has to be fast, reliable, and most importantly scalable. Ours is completely written from the ground up and doesn't just pull blobs out of a MySQL database.
* Database. The most important system and often the most abused. Many enhancements have been made in this area to support growth including tuning queries, changing the way updates and inserts are handled, performance tuning our servers, and using transactions where appropriate to prevent "half updates".
* Security/Exploits. We encourage white hats to come and bang on our systems and give us feedback, and we offer cash for problems they find. This has resulted in a lot of fixes for hacked viewers and the like. Our very own Jim took it upon himself to do a complete assessment of the handling of permissions and core out a whole bunch of it. These enhancements will be available soon, first on a select bunch of test regions and then on the entire grid, and should stop the strangeness when it comes to items being passed around.
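For readers wondering what a "half update" from the Database item looks like in practice, here is a minimal sketch. It is not our production code: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the `inventory` table, `transfer_item` and `make_demo_db` are hypothetical names made up for the illustration. The idea is simply that moving an item between two owners takes two statements, and a transaction makes sure either both happen or neither does.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (item_id TEXT PRIMARY KEY, owner TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO inventory VALUES ('chair', 'alice')")
    conn.commit()
    return conn


def transfer_item(conn, item_id, from_owner, to_owner):
    """Move an item from one owner to another as a single atomic unit."""
    # "with conn" commits if the block finishes and rolls back if it raises,
    # so a failure after the DELETE cannot leave the item owned by nobody.
    with conn:
        cur = conn.execute(
            "DELETE FROM inventory WHERE item_id = ? AND owner = ?",
            (item_id, from_owner),
        )
        if cur.rowcount != 1:
            raise LookupError("item not found in sender's inventory")
        conn.execute(
            "INSERT INTO inventory (item_id, owner) VALUES (?, ?)",
            (item_id, to_owner),
        )


if __name__ == "__main__":
    conn = make_demo_db()
    try:
        # A missing recipient violates NOT NULL after the DELETE already ran.
        transfer_item(conn, "chair", "alice", None)
    except sqlite3.IntegrityError:
        pass
    # The rollback restored the row, so the chair still belongs to alice.
    print(conn.execute("SELECT item_id, owner FROM inventory").fetchall())
```

Without the transaction, the failed insert would leave the chair deleted from one inventory and never added to the other, which is exactly the kind of strangeness a "half update" produces.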
The majority of problems we've fixed didn't show up until our systems started to come under increasing load.
Where we're headed is to completely replace more systems that are causing us trouble. The first to go is the script engine. It will be replaced by something that we have more control over. This engine will allow full saves of state including call stack and properly send them over the wire during TPs and crossings.
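To give a feel for what "full saves of state including call stack" means, here is a toy sketch. It is not the new engine and says nothing about how that engine is built: the `ScriptVM` class, its tiny instruction set and `PROGRAM` are all hypothetical, and JSON is just an assumed wire format. The point it shows is that when the interpreter keeps its own call stack as plain data, a script can be paused halfway through a function call, shipped somewhere else as a string, and resumed exactly where it left off.

```python
import json

# Hypothetical two-function script: main calls double, then adds 1.
PROGRAM = {
    "main": [("push", 2), ("call", "double"), ("push", 1), ("add",), ("ret",)],
    "double": [("push", 2), ("mul",), ("ret",)],
}


class ScriptVM:
    """Toy interpreter whose call stack is ordinary, serializable data."""

    def __init__(self, program, frames=None, stack=None):
        self.program = program
        # Each frame records which function is running and where it is.
        self.frames = frames if frames is not None else [{"func": "main", "pc": 0}]
        self.stack = stack if stack is not None else []

    @property
    def finished(self):
        return not self.frames

    def step(self):
        frame = self.frames[-1]
        op, *args = self.program[frame["func"]][frame["pc"]]
        frame["pc"] += 1
        if op == "push":
            self.stack.append(args[0])
        elif op == "add":
            b, a = self.stack.pop(), self.stack.pop()
            self.stack.append(a + b)
        elif op == "mul":
            b, a = self.stack.pop(), self.stack.pop()
            self.stack.append(a * b)
        elif op == "call":
            self.frames.append({"func": args[0], "pc": 0})
        elif op == "ret":
            self.frames.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")

    def run(self, max_steps=None):
        steps = 0
        while not self.finished and (max_steps is None or steps < max_steps):
            self.step()
            steps += 1

    def save(self):
        """Serialize the call stack and operand stack to a string."""
        return json.dumps({"frames": self.frames, "stack": self.stack})

    @classmethod
    def load(cls, program, blob):
        state = json.loads(blob)
        return cls(program, state["frames"], state["stack"])


if __name__ == "__main__":
    vm = ScriptVM(PROGRAM)
    vm.run(max_steps=3)          # stop in the middle of double()
    blob = vm.save()             # this string is what would cross the wire
    resumed = ScriptVM.load(PROGRAM, blob)
    resumed.run()
    print(resumed.stack)
```

In the sketch the script is interrupted while two frames are live (main waiting on double), and the resumed copy finishes with the same result as an uninterrupted run.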
Next will be replacing the ODE physics engine and related work with something a bit less sensitive and more efficient.
These large projects will take time and during these times other team members will continue providing enhancements in other areas. We are also trying to get more people up to speed on the code to provide you with quicker turnaround on bugs and enhancements.
Thank you again from the bottom of my heart for sitting in on this exciting time for us. Without a great customer base and a really kick ass community this just wouldn't work and it wouldn't be worth it.