[Update: Rolling restarts are now in progress, upgrading all main grid regions to R2280.
During each upgrade, regions are typically down for 10-20 minutes. Any downtime longer than 30 minutes should be reported in the Region Down thread here: http://inworldz.com/forums/viewtopic.php?f=16&t=2034 ]
In the next few days, we will be upgrading the Beta grid and a very small number of developer test regions on the main grid to the new InWorldz 0.7.2 R2168 server version and performing extensive testing. If that initial limited testing goes well, the full main grid will be updated soon thereafter, with that version or a subsequent revision. However, because of the size of the changes since the previous update, this testing may take several days (or even a week) to complete. You will see why, below.
This message is to document the changes since the previous grid server software release (InWorldz 0.7.1 R1861). The changes fall into several major categories:
- new built-in HTTP server,
- image decoding,
- region crossings reliability,
- operational behaviors,
- scripting environment changes,
- new LSL functions or features,
- administrative/back-end improvements,
- other technical changes.

New HTTP Server
The internal HTTP server is used for many features, from inventory, to transfers of images and other larger binary data, to region crossings, to viewer protocol notifications, to region-to-region communications and more. It is a key, critical part of the grid infrastructure and must be reliable. There were several significant problems with the previous HTTP server code, which have all been addressed in this update.
This new HTTP server is a very significant change, although hopefully users should only see an increase in both reliability and performance:
- updates and streamlines the HTTP/HTTPS server pipeline,
- moves XMLRPC calls to a specific path so they are handled more cleanly,
- adds support for HTTPS-in requests,
- includes configuration file support for installing the server certificates at run time,
- reworks the low-level handling of requests to better utilize threads,
- supports ordering event queue messages based on priority,
- runs HTTP-in and Event Queue calls on a thread pool.
That last one alone should significantly improve response time and decrease the number of active threads in the system.
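As a rough illustration of ordering event queue messages by priority, here is a minimal sketch in Python. It is not the InWorldz server code (which is not shown in this post); the class name, the method names, the message names and the convention that a lower number means higher priority are all assumptions made for the example. Thread pool workers answering a viewer's poll would call something like drain() to collect the pending messages.

```python
import heapq
import itertools
import threading


class PriorityEventQueue:
    """Hypothetical per-viewer event queue; lower priority number is delivered first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that messages with the
        # same priority are delivered in the order they were queued.
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def enqueue(self, priority, message):
        with self._lock:
            heapq.heappush(self._heap, (priority, next(self._counter), message))

    def drain(self, max_items=None):
        """Remove and return pending messages, most urgent first."""
        out = []
        with self._lock:
            while self._heap and (max_items is None or len(out) < max_items):
                _, _, message = heapq.heappop(self._heap)
                out.append(message)
        return out


# Illustrative use: an urgent notice queued after ordinary chat still goes out first.
queue = PriorityEventQueue()
queue.enqueue(2, "chat-1")
queue.enqueue(0, "crossing-notice")
queue.enqueue(2, "chat-2")
print(queue.drain())  # ['crossing-notice', 'chat-1', 'chat-2']
```

The counter is what keeps equal-priority messages in arrival order; without it, the heap would fall back to comparing the message payloads themselves.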
This updated server also fixes a significant problem with the event queue allowing more than one response to be fired against the same request. This only really happened in the case of a timeout, but could result in event messages that should have been sent to the viewer being dropped due to them being appended to other responses.

Image Decoding
- The decoder code has been split into a decoder and cache. OpenJPEG is no longer used by this code; instead, support has been added for using CSJ2K to decode JPEG layer data. In testing so far, the CSJ2K decoder is significantly faster.
- Also updated to the most recent upstream CSJ2K from OpenMetaverse, and added more configuration options.
- Converted bakes to use CSJ2K for the decode step.
- Added a check for empty streams on bake uploads (0-length data) since it was seen coming from a viewer and could create problems. (The reason for this viewer behavior is unknown, but may be related to the use of temp images.)

Region Crossings Reliability
In addition to the crossing reliability improvements that come from the new HTTP server above, a tremendous amount of time has been dedicated to improving the reliability of region crossings (on the order of 4 developer-months). A very large number (in the hundreds) of fixes have been applied to region crossings, scene management, sitting and standing, and other related code. Several crossing-related fixes are included in the operational changes listed below.

Operational Behaviors
- Chat messages should now be able to cross region boundaries, based on the regular range limitations.
- Deleting a rezzed object now unsits any seated user(s) first.
- The "Allow Set Home to Here" group role ability now has a special exception. The parcel no longer needs to be group-owned (deeded) land for this ability to function, just group-tagged.
- Fixed the problem with the avatar Z-position override on teleports (and login) to be at least at ground level (not below ground). It was being ignored, allowing users to log in or teleport below the ground unless a large enough Z position was specified.
- Fixed the Stand Up case where vertical position is adjusted, to avoid changing the X/Y coordinates, only the Z (i.e. avoid centering on the prim) when standing.
- Reworked minimap avatar locations collections to get rid of the 60 avatar limit, and to simplify the code.
- Fixed IM messages from scripted object calls to include enough information in the packet for the user to be able to click on the highlighted name of the sender of the IM in the chat history, and see both the correct owner and the location of the object sending the IM.
- Significant performance fixes to the initial sending of prims to the viewer when a region becomes visible.
- Updated to repair any cases of a legacy problem where some parcels were tagged as group-owned, but the owner ID was not updated to match the group ID. This could cause a variety of land/group permission checks to fail.
- Removed all the UNKNOWN_AV_HEIGHT references and replaced them with less insane long-standing default values. Fixes problems where avatar height was assumed to be 127m, launching users on a stand.
- Fixed some cases of case-sensitive region name comparisons.
- Fix to deny requests to remove items from the Contents of rezzed objects that are no-mod, and also strengthened the permissions/validation checks on Contents items deletes. Fixes Mantis #1420.
- Enforced the 0.001m minimum size for prim dimensions, at load/rez time as well.
- The server now disables autopilot entirely when sending a sit response. (This was always one of the most annoying features of SL.) Now when you sit on a prim, you actually sit on a prim. You don't move to it and NOT sit. You don't bounce while the auto-pilot tries to figure out that you have reached your destination (or not). This also simplifies that code quite a bit, and may fix significant problems in error handling, e.g. cases where, when autopilot was used, a user sat on a prim without losing their physics actor, causing cascade failures.
- Fixed the off-by-one-4sqm-block error on north and east when applying terrain modifications with a specific land selection rather than parcel selection (or painting land mods).
- Fixed the off-by-1m error on north and east sides when applying terrain modifications with either a specific land selection or a whole parcel selected.
- The server now normalizes the full update rotational velocity floating point values to be the same value as the (16-bit) terse update would encode, so that the viewer sees no difference between the values sent to it in the full and terse updates. The viewer recognizes even the smallest change as a difference; for things like the target omega rotation, before this change the viewer would reset it back to the starting point on a difference between full and terse updates, even if the setting had not changed.
- Also replaced the older complex machine-specific terse update packing code with simpler packing. (This also changes the prim acceleration value to be 0, rather than the former 32767, which may fix some obscure problems.)
- Performance improvement: Removed redundant user and group lookups when crossing regions or logging in.
- Fixed a security problem disabling some scripts. (Fixes Mantis #1088.)
- Fixed the server code that sends updates to ensure that root part updates were sent before child part updates. (This should fix many cases of missing prims, and all cases of prims considered "orphaned" by the viewer.)
- Avatar height is now passed through on a region crossing and preserved (eliminating one of the reasons for "bounce" and other problems).
- Self-correcting code to update the region servers to notify neighbors after 10-20 minutes that they are up, to ensure that all regions are known to neighbors in the event of some other problem. This is to remedy cases where on startup a region never gets a neighbor up notification and remains isolated, not knowing its neighbors.
- Fixed the calculation of new region coordinates on diagonal crossings.
- Teleports, user crossings and user crossings while seated now recognize when child agent update requests time out. In that case it will now abort a region change attempt, instead of trying to continue on after it has already failed.
- The user agent now tracks the "last good" region location for emergency recovery purposes, and will use that in some cases instead of the emergency position at 128,128.
- One of the above cases is if a Stand Up operation is requested and the agent is sitting on a prim that can no longer be found in the region. In that case, the agent is stood up and the above previous location is restored. (Previously, no attempt was made to reposition the agent, and the stand up would fail, and in most cases, the null part would trigger an exception in StandUp, and this would result in Bad Things happening.)
- Fixed rezzed prim/object crossings to allow all valid region coordinates, not to exclude those below 0.1, and to report the position in any cases where the prim position was still out of range and rejected and the object returned.
- Fixed child agent update calls to be ignored for child agents still flagged as in-transit. Fixes the case of double crossings happening too quickly.
- When a prim group detects a crossing, it now walks the parts list for avatars, checking a new AvatarReady flag, refusing the position update until all avatars are aboard after a crossing.
- Stand Up no longer aborts with an error if called while in-transit from a crossing. (May have caused stuck avatars with no physics actors).
- Avatar updates now abort any automated avatar movement attempts, if the agent is sitting on a prim.
- Fixed some cases of automated recovery of the avatar when there is a timeout crossing while seated.
- Eliminated several cases where redundant updates were sent to the viewer.
- Removed a redundant call to send a DisableSimulator packet from region shutdowns, and in the case where a single user is closing their region session (e.g. normal quit). Fixes the missing graceful shutdown "The region is going down" message, which should now be visible rather than looking like an abortive disconnect.

Script Environment Changes
is no longer limited to 10m when moving an object.
- Fixed the llGiveMoney definition. It was marked as returning void, but it returns an integer, which in some scripts was causing a stack error in Phlox if called.
- Fixed sounds so that only the owner of an attached HUD can hear sounds when playing an attached sound. (Triggering sounds is different.)
- Fixed the server to send down actual values for PAY_DEFAULT on buttons. Replaced the one-time use of PAY_HIDE constant with the ones in the object itself. Note the defaults for InWorldz are higher than in SL, due to exchange rate differences and a chance to pick new ones; the defaults for now will be 20, 50, 100, 200.
- Fixed a rezzed object's SoundGain and SoundRadius to be floats as per LSL specs.
- Fixed SoundRadius to not be reset to 20m on a llLoopSound call, and to be set to 0m on any llStopSound call. (This may fix missing sounds if a function other than llLoopSound is used after llStopSound.)
- Fixed llSetSoundRadius to send an object update (since the sound radius is one of the update fields).
- Generation-related "grey goo fence" restrictions have been enabled.
- Fixed llSameGroup to support objects as the target as well as avatars.
- Fixed iwTeleportAgent and llEjectFromLand to be asynchronous calls that do not block the script engine in the region. Fixes Mantis #1354.
- Fixed llAvatarOnSitTarget to not return a null ID for some sit targets with position X/Y/Z of zero.
- Fixed llUnSit to work in the case of the partially-zero sit targets above.
- Fixed spelling problems with llGroundContour and other problems which prevented its function entirely. (Note that until ground level extrapolation is added, these functions are of limited use.)
- Added support to throttle processing of LSL llHttpRequest calls to 25 requests/sec.
- Later, also made llHttpRequest throttling tuneable from the console.
- Also changed the default timeout for llHttpRequests to 60 seconds to match SL.
- Also applied changes to llHttpRequest processing to minimize the chance that a single process fills the queue and the other requests start seeing timeouts because they are blocked from proceeding.
- Fixed a null dereference exception if a DeGrab packet (touch_end event trigger) comes in after a prim has left the region.
- Fixed the closing of child agents to not affect the list of regions neighboring that region containing the former main agent.

LSL Scripting Language Enhancements
- Added: the missing OBJECT_ constants (values 9-16) to be recognized at compile time.
- Added: the missing TOUCH_INVALID_FACE constant. Fixes Mantis #1029.
- Added: support for PRIM_SLICE
- Added: integer iwActiveGroup(key agent, key group);
- Added: key llAvatarOnLinkSitTarget
- Added: key iwAvatarOnLink(integer linknumber), which will return a seated avatar even if there is no sit target.
- Added: key iwGetLastOwner
- Added: iwRemoveLinkInventory(integer linknumber, string item)
- Added: iwGiveLinkInventory(integer linknumber, key destination, string inventory)
- Added: iwGiveLinkInventoryList(integer linknumber, key target, string folder, list inventory)
- Added: key iwGetLinkNumberOfNotecardLines(integer linknumber, string name);
- Added: key iwGetLinkNotecardLine(integer linknumber, string name, integer line);
- Added: key iwGetNotecardSegment(string name, integer line, integer startOffset, integer maxLength);
- Added: key iwGetLinkNotecardSegment(integer linknumber, string name, integer line, integer startOffset, integer maxLength);
- Added: iwTeleportAgent(key agent, string region, vector pos, vector lookat)
- Added: llRegionSayTo(string destId, integer channel, string msg)
- Added: integer llGetUsedMemory
- Fixed an exception in llRequestSimulatorData() affecting all calls to it.
- Implemented llRequestSimulatorData() as returning the current version number string for the current region (only). If a script requests DATA_SIM_RELEASE for a region other than the current region, this function simply returns "InWorldz".

Administrative/Back-End Improvements
- Classified updates no longer specify the expiration date in the UPDATE case (see Elenia).
- Cassandra inventory server management now tolerates reading partially or mostly deleted inventory folders and skips them when sending the skeleton.
- The provisioning tool now looks up the owner and estate, and additional estate settings are now included in the region information and the corresponding help text. Fixed a non-existent owner to mean a zero UUID and not a blank or null string.
- Many new warning and debug console messages, adjustments of message severity, console message reporting cleanup.
- Added a "debug crossings 0|1" console command to turn on the extra region crossings debug messages.
- The provisioning tool will now include tweaks to the TCP configuration in the Windows Registry for server regions (setting MaxUserPort and TcpTimedWaitDelay).
- Added console log messages for the cases where the UDP server gets a packet from an unrecognized client, or from an unconnected client.
- Fixed a crash caused by the complete lack of error handling in the "alert" console command handler (e.g. if insufficient parameters were specified). For example, if you just typed "alert" or "a" on the console.

Other Technical Changes
The following are just a sampling of some of the additional highly-technical changes made since R1861. Many of these are implementation details but for the serious techno-geeks, they are included as they may provide a bit of insight:
- Several checks to see if an object has been deleted have been added to the functions that send part updates, to avoid processing of updates after the part has been removed from the scene.
- Replaced the mechanism used to determine when all avatars have crossed when seated on prims crossing regions. Prims will now include an additional crossing parameter which indicates the number of avatars to wait for on that prim. Fixes cases where a scripted prim then attempts to exit the new region before the avatars have entered, e.g. a corner crossing into a third region through an intermediary.
- Added a last-moment attempt to not send prim updates for objects that are in transit.
- Made registration with the Aperture (texture) server asynchronous so as to not block crossings for the duration of the server request (or timeout). (Should speed up crossings.)
- Prim crossings will now assign new prim inventory item IDs and remap the old ones to the new ones on receiving of a prim with an inventory. This is akin to what happens to task inventory items on a new rez. This should address the scripts stopping when reentering a region you just exited.
- Many, many fixes to ensure data is accessed only while locked or otherwise in a thread-safe manner. Also many cases of former locks where the scope of the lock has been decreased, or even eliminated.
- The code that sends root part updates separate from child part updates now sends the root part from within the same parts lock as the child parts. (May fix problems seen while the object is in transit.)
- Significant change, mostly for crossings and other changes to make agent position updates more atomic. Agent position info is now available only indirectly through a position info class which gets or updates the position info in a single thread-safe manner. This required many code changes to ensure all position info changes were applied at once, from within the lock, and could not be fetched in an inconsistent state. Position info also includes parent ID.
- Additional validity testing on operations to prevent late operations on child agents in regions from viewers who think they are still full agents.
- Improved error handling and recovery for the above.
- An AgentUpdate is now ignored when the agent is sitting on a prim and the prim is in transit during a crossing.
- Rearranged Stand Up conditional returns to do so only after releasing any controls and setting the avatar on the sit target back to a zero UUID. This closes problems where prims with child agents might still believe they have an avatar on the sit target.
- Stand Up is now refused for prims in-transit on a crossing.
- Stand Up now reports cases where the agent is supposed to be sitting but the prim cannot be found.
- Stand Up now allows an object part to be passed to it for the case where a specific part needs an unsit (e.g. the case where the object is being deleted). Closes timing windows and improves error handling.
- When a user in transit completes the crossing, if it's a child agent, the parentID (seated prim) is cleared. This avoids timing and error recovery problems.
- Restoring a user in the current scene now operates on the basis of whether the agent is a child agent or seated on a prim. Also fixed this recovery on failed transfers to actually add the physics actor only when a full agent and NOT sitting.
- Reworked the data structures and locking to avoid the deadlock on cleanup observed on the beta grid for http-in requests.
- Added an exception handler around LLUDPServer's RemoveClient method, so as to capture and report any problems removing viewer clients.
- Changed the code that sends prim updates to avoid null exceptions when another thread clears the updates list while it is being traversed by this code. May fix some cases of missing/partial prim updates.
- Added "DEFAULT_EYES" to the list of default wearables with appropriate UUIDs. Enabled inventory links with new code that doesn't allow links to default items. They don't exist in inventory in the same way regular wearables do. This change is enough to allow some viewers to log in but isn't enough yet to make them "supported". This is just enabling a new message type that is also used by some V1 viewers.
- Replaced the calls that convert numeric text strings to decimal types with calls to convert to doubles, since we actually wanted to convert the string to a float, and the decimal conversion can trigger an exception given a float with a format like "-3.51148E-06". Fixes some problems serializing and deserializing the corresponding data from storage.
- Improved error handling and reporting for NotifyDataServices to avoid null dereference exceptions seen on development test servers.
- Fixed a null reference exception in the Vivox ParcelVoiceInfoRequest if the current parcel is out of bounds (e.g. 300,300).
- Vivox voice requests now recognize avatars in transit, failing the request. This fixes null exceptions when an avatar on a part moves into a new region, and may fix other timing-dependent problems with avatars crossing or teleporting.
- The region server will now ignore requests to change the prim group position while the prim is in transit on a crossing.
- Closed several small timing windows in Stand Up in terms of avatar position being assigned incorrect or invalid values.
- Prim group and avatar "in transit" status now supports nested calls, to avoid premature clearing of the former flag.
- Removed the recently-added full update for the case where Stand Up is called and the agent does not appear to be sitting. After a successful crossing that the sending side believes failed (timeout), this tended to "yank the agent back" into the original sim, so that the viewer would then communicate with the wrong region.
- When resending ACK packets, the server now reports the "[LLUDPSERVER]: Ack timeout, disconnecting" error only once, since it attempts to disconnect after that (error is fatal). Also fixed to check null references before disconnect that caused exceptions in testing of Ack timeouts.
- The agent/avatar absolute position now ignores the parent position, in the position returned, if the parent ID is not set. (Fixes position-related logic problems.)
- All updates to user positions, parent position, rotation, velocity, etc. are now rationalized through a single new function that updates the values consistently, inside a scene lock. This fixes concurrency/timing problems on crossings and other position updates.
- Avatar parent ID and parent position are now cleared for safety when downgrading a user agent from root agent to child agent (i.e. when the avatar is leaving the region). (Very important crossings bug.)
- Many changes to scene, prim and avatar locking when a prim group or avatar crosses regions, sits or stands.
- Many, many other avatar position-related fixes.
- Disabled agent physics movement collection while the avatar is in transit changing regions. (This may have been a very significant crossings bug.)
- The function that checks if an avatar has reached a sit target destination now refuses to set the target position (making position info inconsistent) if the agent is already sitting.
- Stand Up now locks the scene, works with local variable copies of the position info, then updates the position info at once using SetAgentPositionInfo. It no longer updates the avatar position early, which may have been a nasty bug since the parent position was still set (letting a terse update send double-sized position info).
- Fixed the movement flag to handle the seated agent case, sending one avatar update and disabling the movement, rather than sending a continuous stream (flood) of updates while seated in some cases.
- The server now locks the scene while collecting the position info for a full update, then sends the update from outside the lock. This may fix timing problems with position updates.
- The server now also uses the same approach when sending the initial appearance or updated appearances for avatars, whereas they previously locked the scene during the entire send process.
- Child agent updates now assign the new position using safely, rather than accessing the position data directly, which among other problems would not have cleared the parent position or paused any velocity effects. (Failing to clear parent position may have been a significant crossing bug.)
- Child agent updates while seated now skip the position update. (This may have been a significant crossing bug.)
- Child agent updates now abort if the avatar is in transit on a prim.
- The server now avoids sending updates for other users' HUDs to the wrong agents. (This is a reimplementation of Justin's change for OS. This may resolve some "ghost HUD attachment" cases, although neither fix addresses the root issue.)
- Added a child agent check to agent update handling, displaying a console warning and aborting the operation for child agents. (This is a fix for late processing of requests after crossing into a new region.)
- Fixed several cases of objects indexed by UUID instead of local ID. This is to support the case where a prim reenters a region faster than the async deferred updates go out to the viewer. In that case, there's still a reference to the old prim with the old local ID, and we don't want its presence to interfere with updates to the new scene object.
- Added support (but disabled for now) for adding distant regions based on viewer draw distance, rather than just the nearest neighbor regions. This is now working but disabled until such time as we have proper region layout visibility rules (e.g. not being able to see diagonal regions in a checkerboard pattern).
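
Several items in the list above describe routing agent position, parent position and parent ID through a single thread-safe accessor, so that no caller ever sees a half-updated state. The following Python sketch shows the general pattern only. It is not the server's implementation: the class and method names are hypothetical (the post names only SetAgentPositionInfo), and the parent's rotation is ignored for brevity.

```python
import threading
from dataclasses import dataclass
from typing import Optional, Tuple

Vector3 = Tuple[float, float, float]


@dataclass(frozen=True)
class PositionInfo:
    """Immutable snapshot: all fields are read or replaced together."""
    position: Vector3                    # relative to the parent prim when seated
    parent_position: Optional[Vector3]   # None when not seated
    parent_id: Optional[str]             # None when not seated


class AgentPosition:
    """Hypothetical holder that only hands out complete snapshots."""

    def __init__(self, position: Vector3):
        self._lock = threading.Lock()
        self._info = PositionInfo(position, None, None)

    def get(self) -> PositionInfo:
        with self._lock:
            return self._info

    def set(self, position: Vector3,
            parent_position: Optional[Vector3] = None,
            parent_id: Optional[str] = None) -> None:
        # Build the new snapshot first, then swap it in under the lock,
        # so position and parent data can never be observed out of sync.
        new_info = PositionInfo(position, parent_position, parent_id)
        with self._lock:
            self._info = new_info

    def absolute_position(self) -> Vector3:
        info = self.get()  # one consistent snapshot for the whole calculation
        if info.parent_id is None or info.parent_position is None:
            # No parent ID set: ignore any parent position entirely.
            return info.position
        return tuple(p + o for p, o in zip(info.parent_position, info.position))
```

Because the snapshot is immutable, a caller that fetched it before a Stand Up or a crossing keeps a self-consistent (if stale) view, rather than a position from one state paired with a parent ID from another.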