Tech | 2020. 7. 1. 08:26

# The Moment Hydrogen Was Discovered, and Its Boundless Value

# Why Hydrogen Energy Is Eco-Friendly

# Hydrogen Cars: Do the Economics Hold Up?

# Are Hydrogen Fuel Cell Vehicles Really Safe?

# Future Plans for the Hydrogen Fuel Cell Vehicle Era and a Hydrogen Society

Posted by 세월의돌
Tech | 2020. 3. 9. 11:05

In the Registry, search for i8042prt/Parameters and modify the keys below:

 

(KEY - VALUE)

LayerDriver KOR - kbd101a.dll

OverrideKeyboardIdentifier - PCAT_101AKEY

OverrideKeyboardSubtype - 3
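
For convenience, the same change can be captured in a .reg file; a minimal sketch, assuming the standard i8042prt service path under HKLM (the value names and types follow the list above):

Windows Registry Editor Version 5.00

; Korean 101-key keyboard settings (path assumed: the i8042prt service's Parameters key)
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters]
"LayerDriver KOR"="kbd101a.dll"
"OverrideKeyboardIdentifier"="PCAT_101AKEY"
"OverrideKeyboardSubtype"=dword:00000003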

 

Posted by 세월의돌
Tech | 2018. 5. 21. 08:25

Built with #Carbon, by @dawn_labs pic.twitter.com/m9skUuaOVq

https://carbon.now.sh

Posted by 세월의돌
Tech | 2016. 8. 9. 14:25

Personal preference, maybe, but I find developing on Windows far more comfortable.

So I end up using git on Windows as well, and when you run git-diff, the ^M characters can become a real nuisance.

Of course, setting core.autocrlf to true converts line endings to Windows style (CRLF) on checkout and to UNIX style (LF) on commit, but with an existing repository, or in various other situations, this can be inconvenient.

So I looked around and found an option called core.whitespace; if you add cr-at-eol to its value, the CR displayed as ^M is ignored by git-diff.

Looking at the help page for git config, core.autocrlf is described as follows:

core.autocrlf

Setting this variable to "true" is almost the same as setting the text attribute to "auto" on all files except that text files are not guaranteed to be normalized: files that contain CRLF in the repository will not be touched. Use this setting if you want to have CRLF line endings in your working directory even though the repository does not have normalized line endings. This variable can be set to input, in which case no output conversion is performed.


You can configure it from the console like this:

git config --global core.whitespace cr-at-eol
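
For reference, a sketch of what this ends up looking like in ~/.gitconfig (use --local instead of --global to scope it to a single repository):

[core]
	whitespace = cr-at-eol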


For a more detailed explanation, see the following site, found via a Google search:

https://lostechies.com/keithdahlby/2011/04/06/windows-git-tip-hide-carriage-return-in-diff/


Posted by 세월의돌
Tech | 2016. 6. 14. 00:48

https://en.wikipedia.org/wiki/XLIFF

https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff

Oho, so this exists. I'm sure I'll find a use for it someday.

When you don't know something, you have to look it up and ask around -_-;

(Then again, you need to know at least something before you can ask, haha)

Posted by 세월의돌
Tech | 2013. 12. 17. 22:08

I had been relying mainly on sequence diagrams to analyze source code, and while reformatting my work desktop (it seemed to be having problems), I installed a new(?) version of StarUML.

The newly installed version number is 5.0.2.1570, but I'm not sure which version I was using before, so I can't say whether this problem is new to this version.

In any case, it ran fine the first time after installation, but at some point every launch started producing an error popup that says

"Cannot focus a disabled or invisible window"

and the program became nearly unusable.

A search turned up that it's a known bug: the problem occurs when the Model Explorer is set to auto-hide.

After changing the Model Explorer to always visible, the error went away.

So much for using the full width of the screen.

It'll get fixed someday, I suppose, heh.

Posted by 세월의돌
Tech | 2012. 4. 7. 14:26


Hahahaha. What a fun world!

I just learned about something called Visual Studio Achievements.

What is it? It's a system where you earn badges for completing particular events/missions while coding in Visual Studio.

It seems they brought the Achievements system from XBOX Live games over to Visual Studio.
(Well played, MS!)

Having one of those pop up with a ding! in the middle of a coding binge would be a lot of fun. : )

There are no fewer than six categories, and apparently dozens of actual achievements...

 

Customizing Visual Studio

Don't Try This At Home

Good Housekeeping

Just For Fun

Power Coder

Unleashing Visual Studio


At the moment I have absolutely no occasion to use it orz (I should make one, right?!)


Posted by 세월의돌
Tech | 2012. 3. 31. 18:34


A list of books I want to study, and feel I really should... orz
(Though why are so many of them out of print? -_-)


Effective C++, 3rd ed. (이펙티브 C++)
Scott Meyers, translated by 곽용재 (피어슨에듀케이션, 2006.05.25)

More Effective C++
Scott Meyers, translated by 곽용재 (정보문화사, 2007.08.27)

Exceptional C++ (47 engaging puzzle problems for C++ programmers)
Herb Sutter, translated by 김동혁 (인포북, 2003.03.24)

More Exceptional C++ (what every C++ programmer should know)
Herb Sutter, translated by 황은진 (사이텍미디어, 2003.06.02)

Exceptional C++ Style (40 new programming puzzles, problems, and solutions)
Herb Sutter, translated by 류광 (정보문화사, 2005.04.26)

Design Patterns (GoF)
Erich Gamma et al., translated by 김정아 (피어슨에듀케이션, 2007.05.09)

GoF 디자인 패턴! 이렇게 활용한다 (putting the GoF design patterns to work)
장세찬 (한빛미디어, 2004.05.28)

C++ Templates: The Complete Guide (C++ 템플릿 가이드)
David Vandevoorde and Nicolai M. Josuttis, translated by 한정애 (에이콘출판사, 2008.12.16)

Efficient C++ (core techniques for squeezing out program performance)
Dov Bulka and David Mayhew, translated by 배재현 (인포북, 2004.05.14)

Effective STL (이펙티브 STL)
Scott Meyers, translated by 곽용재 (정보문화사, 2006.03.29)

C++ In-Depth box set (5 volumes)
Andrew Koenig et al., translated by 곽용재 (정보문화사, 2004.02.10)

Modern C++ Design (generic programming and design patterns applied)
Andrei Alexandrescu, translated by 이기형 (인포북, 2003.07.30)


Lastly...

Presentation Materials: Overview of the New C++ (C++11)
by Scott Meyers
Last revised January 6, 2012; 364 pages; PDF, single-user license (personal use only)

This PDF document contains the presentation materials from Scott Meyers' three-day training course on C++11, the latest version of C++. This intensively technical seminar introduces the most important new features in C++11 and explains how to get the most out of them.

Posted by 세월의돌
Tech | 2011. 10. 1. 10:20
[Original source: http://www.realworldtech.com/]

By: David Kanter | 12-07-2010


Introduction to OpenCL

Using a GPU for computational workloads is not a new concept. The first work in this area dates back to academic research in 2003, but it took the advent of unified shaders in the DX10 generation for GPU computing to be a plausible future. Around that time, Nvidia and ATI began releasing proprietary compute APIs for their graphics processors, and a number of companies were working on tools to leverage GPUs and other alternative architectures. The landscape back then was incredibly fragmented, and almost every option required a proprietary solution: either software, hardware, or both. Some of the engineers at Apple looked at the situation and decided that GPU computing had potential, but they wanted a standard API that would let them write code and run it on many different hardware platforms. It was clear that Microsoft would eventually create one for Windows (ultimately DirectCompute), but what about Linux and OS X? Thus an internal project was born that would eventually become OpenCL.

The goals for OpenCL are deceptively simple: a cross-platform API and ecosystem for applications to take advantage of heterogeneous computing resources for parallel applications. The name also makes it clear that OpenCL is the compute analogue of OpenGL and is intended to fill a similar role. While GPUs were explicitly targeted, a number of other devices have considerable potential but lack a suitable programming model, including IBM's Cell processor and various FPGAs. Multi-core CPUs are also candidates for OpenCL, especially given the difficulty inherent in parallel programming models, with the added benefit of integration with other devices.

OpenCL has a broad and inclusive approach to parallelism, both in software and hardware. The initial incarnations focus on data parallel programming models, partially because of the existing work in the area. However, task level parallelism is certainly anticipated and on the road map. In fact, one of the most interesting areas will be the interplay between the two.

The cross-platform aspect ensures that applications will be portable between different hardware platforms, from a functionality and correctness standpoint. Performance will naturally vary across platforms and vendors, and improve over time as hardware evolves to exploit ever more parallelism. This means that OpenCL embraces multiple cores and vectorization as equally valid approaches and enables software to readily exploit both.

OpenCL is a C-like language, but with a number of restrictions to improve parallel execution (e.g. no recursion and limited pointers). For most implementations, the compiler back-end is based on LLVM, an open-source project out of UIUC. LLVM was a natural choice, as it is extensively used within Apple. It has a more permissive license than the GNU suite, and many of the key contributors are employed by Apple.

The first widely supported, programmable GPUs were the DX10 generation from Nvidia, accompanied by a proprietary API, CUDA, and a fledgling software ecosystem. To take advantage of this, Apple worked closely with Nvidia on their early efforts. The result is that OpenCL was heavily influenced by CUDA. In essence, CUDA served as a starting point, and Apple then incorporated their own vision and a great deal of input from AMD, Imagination Technologies (which is responsible for nearly all cell phone graphics solutions) and Intel. Once the project was in good enough shape, Apple put OpenCL into the hands of the Khronos Group, the standards body behind OpenGL.

The lion's share of the early OpenCL work was done by Apple and Nvidia. The first software implementation of OpenCL was a key feature in version 10.6 of Mac OS X, which was released in August of 2009. In order to promote the burgeoning standard, Apple mandated hardware support on all their PC systems, from the humble Mac Mini to the Mac Pro. Since Nvidia was the only compatible hardware solution early on, this gave them a virtual monopoly on Apple's chipsets and graphics cards for the first several years. The rest of the industry signed onto OpenCL in fairly short order; however, actual hardware and software has only just begun to catch up and take shape.

The progress in the PC ecosystem has just started. Nvidia supports OpenCL across their full product line, as they have from inception. AMD took a slightly indirect route, first releasing OpenCL for CPUs (and GPUs using OS X) in August of 2009 and adding GPU support for Windows and Linux in December 2009. S3's embedded graphics added OpenCL 1.0 in late 2009, as did VIA for the video processors in their chipsets. IBM also has a version of OpenCL for PowerPC and Cell processors. Of all the major players, Intel is taking the longest to release OpenCL compatible products. Their first CPU implementation will arrive in early 2011 with Sandy Bridge. Unfortunately, the Sandy Bridge GPU lacks certain required functionality, so the first GPU implementation of OpenCL will be on Ivy Bridge, the following year. Of all the different vendors, Nvidia's support is by far the most full featured and robust, since it leverages their existing investment in CUDA. On the software side, things are moving slightly slower with only a handful of early adopters, partially because the hardware support has just started to move beyond Nvidia.

Just as OpenGL is used in both the PC and embedded worlds, OpenCL has also generated substantial interest within the mobile and embedded ecosystem. Imagination Technologies, which is responsible for the vast majority of cell phone GPUs, announced OpenCL 1.0 support for the SGX545 graphics core. Samsung has a compatible solution, based on an ARM Cortex A9 microprocessor for cell phones. Perhaps more importantly, Khronos has released an 'Embedded Profile' for OpenCL that relaxes some of the requirements to improve power efficiency and cost. Outside of the mobile world, it is conceivable (albeit unlikely) that FPGA vendors may use OpenCL as a programmer-friendly interface (compared to Verilog) for their hardware, at the cost of some efficiency.
 
 
OpenCL Execution Model

General purpose computing on GPUs has been a topic of interest for a considerable time. The early work was in academia, primarily in the Stanford graphics group, and focused on using the existing limited shader languages (e.g. Brook) for general workloads. Many of the Stanford graphics graduate students went into industry and influenced the evolution of GPUs into programmable hardware. The first commercial API was CUDA, which has in turn influenced later APIs such as OpenCL and DirectCompute. All three APIs use variants of C that add and remove certain features. None of the languages are a superset of C, so not all C programs will map cleanly to the respective languages. Given the shared ancestry and shared starting language, it should not be surprising that there are many similarities between the three.

OpenCL, DirectCompute and CUDA are APIs designed for heterogeneous computing, with both a host (i.e. CPU) and an OpenCL device. The device can be the same hardware as the host (for instance, a CPU can serve as both); however, the OpenCL device is often different (e.g. a GPU or DSP).

OpenCL applications have serial portions that execute on the host CPU, and parallel portions, known as kernels. The parallel kernels may execute on an OpenCL compatible device (CPU or GPU), and synchronization is enforced between kernels and serial code. OpenCL is distinctly intended to handle both task and data parallel workloads, while CUDA and DirectCompute are primarily focused on data parallelism.

A kernel applies a single stream of instructions to vast quantities of data that are organized as a 1-3 dimensional array (called an N-D range). Each piece of data is known as a work-item in OpenCL terminology, and kernels may have hundreds or thousands of work-items. At a high level, this sounds a lot like SIMD execution where each work-item is a SIMD lane. However, one of the key goals of OpenCL is to provide an extensible form of data parallelism that isn’t explicitly tied to specific vector lengths and can be mapped to all sorts of different hardware. So in some sense, an OpenCL kernel is a generalization of SIMD. The kernel itself is organized into many work-groups that are relatively limited in size; for example a kernel could have 32K work-items, but 64 work-groups of 512 items each. Unlike traditional computation, arbitrary communication within a kernel is strongly limited. However, communication and synchronization is generally allowed locally within a work-group. So work-groups serve two purposes. First, they break up a kernel into manageable chunks, and second, they define a limited scope for communication.
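
To make the kernel/work-item/work-group structure concrete, here is a minimal sketch of an OpenCL C kernel (the kernel name and arguments are illustrative, not from the article):

__kernel void scale(__global float* data, float factor)
{
    /* Each work-item handles exactly one element of the N-D range;
       get_local_id(0) would give its index within its work-group. */
    size_t gid = get_global_id(0);
    data[gid] *= factor;
}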

Kernels form the basis of OpenCL, but they can be composed into a task graph via asynchronous command queues. The programmer indicates dependencies between kernels, and what conditions must be met for a kernel to start execution. The OpenCL run-time layer can simultaneously execute independent kernels, thus extracting task parallelism within an application. While the initial uses of OpenCL will probably focus on data parallelism, the best performance will be achieved by combining task and data parallel techniques.
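
As a rough host-side sketch of such a dependency (OpenCL 1.x calls; queue, kernelA, kernelB, global and local are assumed to be set up already, and error handling is omitted):

cl_event produced;
/* kernelB must not start until kernelA's results are ready. */
clEnqueueNDRangeKernel(queue, kernelA, 1, NULL, &global, &local, 0, NULL, &produced);
clEnqueueNDRangeKernel(queue, kernelB, 1, NULL, &global, &local, 1, &produced, NULL);
clFinish(queue);  /* wait for the whole mini task graph to drain */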

OpenCL defines a broad universe of data types for computation in each work-item. On the integer side, data types include boolean, character, short, int (32-bit), long and long long (128-bit). Most of these integer types are available in both signed and unsigned variants.

For floating point, OpenCL both defines a variety of data types and also specifies precision for most operations. The floating point data types are relatively standard: single precision is required and double precision is optional. In addition, there is half precision (16-bit) floating point for data storage; computation is still done at single precision, but for less precise data, the storage requirements can be cut in half. Thankfully, OpenCL also enforces a minimum level of floating point precision and accuracy, generally consistent with IEEE 754. Double precision has the most stringent requirements, including a fused-multiply-accumulate instruction, all four rounding modes (nearest even, 0, +infinity, -infinity), and proper handling of denormal numbers, infinities and NaN. Single precision is somewhat more lax and only requires round to nearest even and handling infinities and NaN. In both cases, all operations have a guaranteed minimum precision; this is especially critical for math functions that are implemented in libraries, such as transcendental functions. Half precision requires an IEEE compatible storage format and correct conversion.
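
A small sketch of the half-precision storage model in OpenCL C (the vload_half/vstore_half built-ins do the conversion; the arithmetic itself stays in single precision):

__kernel void halve(__global const half* in, __global half* out)
{
    size_t i = get_global_id(0);
    float x = vload_half(i, in);    /* 16-bit storage expands to a 32-bit float */
    vstore_half(x * 0.5f, i, out);  /* result is rounded back to 16-bit storage */
}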

OpenCL also provides a number of more sophisticated data types on top of these basic ones. Most data types (except half-precision and boolean) are part of the specification in vector form, with lengths 2, 4, 8 and 16. Vector operations are component-wise, so each lane is independent. This is a clear contrast to DirectCompute and CUDA, which only support vectors of length 2-4. OpenCL has pointers for many data types, which helps developers feel at home, but this comes at a cost: it creates potential aliasing problems (just as in C). Vectorization is critical for performance on many CPUs and GPUs (although not Nvidia GPUs), and will be much more heavily emphasized in OpenCL than in CUDA.
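
In OpenCL C, the vector types look like this (a sketch; every operation below runs independently in each of the four lanes):

__kernel void saxpy4(__global const float4* x, __global float4* y, float a)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];  /* scalar a is broadcast; multiply/add are component-wise */
}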

There are also data types for 2 and 3-dimensional images and texture sampling and filtering of images. The standard has reserved a number of other data types such as complex numbers (using floating point formats for the imaginary and real parts), matrices and high precision formats (128-bit integers and floating point). These are not part of OpenCL, but it is clear that they are all candidates for inclusion.


OpenCL Memory Model

The OpenCL memory model defines how data is stored and communicated both within a device and also between a device and the host CPU. There are four memory types (and address spaces) in OpenCL, which closely correspond to those in CUDA and DirectCompute, and they all interact with the execution model.

The first region, global memory, is available to any work-item for both read and write access. Global memory may be cached in the OpenCL device for higher performance and power efficiency, or may reside strictly in DRAM. Global memory is also fully accessible by the CPU host. Constant memory is a read-only region for work-items on the OpenCL device, but the host CPU has full read and write access. Since the region is read-only, it is freely accessible to any work-item. Conceptually, constant memory can be thought of as a portion of global memory that is read-only for the OpenCL device.

The remaining memory regions are only usable by the OpenCL device and are inaccessible to the host. The first is private memory, which is accessible to a single work-item for reads and writes, and corresponds roughly to an architectural register file in a classic instruction set. The vast majority of computation is done using private memory, thus in many ways it is the most performance critical. The second region is known as local memory and is accessible to a single work-group for reads and writes. Local memory is intended for shared variables and communication between work-items; in essence, it is an architectural register file that is shared between a limited number of work-items. Local memory can be held in DRAM and cached, which is how most CPUs will implement it, while GPUs tend to favor dedicated hardware structures that are explicitly addressed.
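
The four regions map directly onto address space qualifiers in OpenCL C; a sketch (names are illustrative):

__kernel void regions(__global float* g,    /* global: read/write for all work-items and the host */
                      __constant float* c,  /* constant: read-only on the device                  */
                      __local float* l)     /* local: shared within one work-group                */
{
    float p = c[0];  /* private: ordinary automatic variables are per-work-item */
    l[get_local_id(0)] = g[get_global_id(0)] + p;
}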

The memory consistency model for OpenCL is fairly relaxed, with a number of primitives to assist. OpenCL defines four work-group synchronization primitives: a barrier and three types of fences (read fence, write fence, and a general memory fence). The barrier synchronizes an entire work-group, so the scope is limited by definition. The strength of the memory consistency is progressively weaker as the scope widens, which makes sense: a strongly ordered model is easier with fewer caching and memory agents, and increasingly difficult to scale as more agents are added.

At the smallest scope, each work-item has fairly strong consistency and will preserve the ordering between an aliased load and store; however, non-aliased memory instructions can be freely re-ordered. Local memory is a bit weaker: it is only consistent across a work-group at a barrier. Without a barrier, there are no ordering guarantees between the different work-items. Global memory is weaker still; a barrier will guarantee consistency of global memory within a work-group, but there are absolutely no guarantees between different work-groups in a kernel. Global atomic operations were an optional part of OpenCL 1.0 and are required in 1.1; they are used to guarantee consistency between any work-items in a kernel, specifically between different work-groups. Atomic operations are primarily defined for 32-bit integers, with an optional extension for 64-bit integers. They acquire exclusive access to a memory address (to ensure ordering) and perform a read-modify-write, returning the old value. Both OpenCL and CUDA return the old value, while this is strictly optional for DirectCompute. However, the performance cost of atomic operations is fairly high on some hardware, so they should be used sparingly for the sake of scalability and performance. Since constant memory is read-only, it needs no consistency or ordering model.
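
A sketch of the work-group barrier in action, reducing one work-group's values in local memory (assumes the work-group size is a power of two):

__kernel void group_sum(__global const float* in,
                        __global float* out,
                        __local float* tmp)
{
    size_t lid = get_local_id(0);
    tmp[lid] = in[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);      /* local memory is only consistent at the barrier */

    for (size_t s = get_local_size(0) / 2; s > 0; s /= 2) {
        if (lid < s)
            tmp[lid] += tmp[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);  /* every reduction step needs a fresh sync */
    }
    if (lid == 0)
        out[get_group_id(0)] = tmp[0]; /* one partial sum per work-group */
}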

OpenCL uses a combination of pointers and buffers to move data within an application. Pointers are valid within a kernel - however, they are flushed at the end. So passing data between kernels (or between the host and device) uses buffers. This is another area where OpenCL diverges from CUDA - the latter persists pointers across kernels and does not use any buffers.
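
A hedged host-side sketch of moving data through buffers (ctx, queue, kernel, bytes and host_ptr are assumed to exist; error handling omitted):

cl_int err;
cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);  /* seen inside the kernel as a __global pointer */
/* ... enqueue the kernel here ... */
clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, bytes, host_ptr, 0, NULL, NULL);  /* blocking read back */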


Terminology and Summary

One of the more confusing aspects of GPU computing is the terminology. The lexicon for CPUs and computer architecture is relatively consistent across vendors. For graphics APIs, there is a common language and understanding formed by DirectX and OpenGL that most hardware and software can follow. For graphics hardware, however, the terminology varies considerably and is often imprecise and subject to change; fortunately, the common APIs give some semblance of order. In contrast to graphics and computer architecture, the idea of using GPUs for computation is relatively new. The industry standards are nascent, but hopefully OpenCL and DirectCompute will provide a relatively standard language to understand the software aspects of GPU computing. While the terminology in these APIs may not be universally adopted, they will reduce confusion by providing common ground. Equally important, since OpenCL is intended to run on almost any device, the common software architecture will be very helpful for understanding the different flavors of hardware.

Table 1 - Comparison of OpenCL, DirectCompute and CUDA


The table shows the correspondence between terminology in OpenCL, DirectCompute and CUDA and also compares certain features. One notable difference is that the execution models for OpenCL and DirectCompute specifically omit any microarchitectural aspects of execution and avoid horizontal operations. Both of these choices improve portability and performance across many different devices. CUDA is a proprietary API and portability was never a goal, so Nvidia exposes warps and certain horizontal warp functions through the API.

The three APIs have changed the local memory capacity over time, in tune with advances in hardware. Early versions, including OpenCL 1.0, Compute Shader 4.x and CUDA 1.x, specified 16KB of local memory, although in OpenCL that was only a minimum. The local memory for OpenCL 1.1 must be at least 32KB, while DirectCompute requires exactly 32KB. CUDA 2.x takes a slightly different tack and mirrors Nvidia's Fermi hardware, allowing local memory to be configured as either 16KB or 48KB. The reason that OpenCL focuses on minimum sizes for local memory is to enable a diversity of hardware. Most CPUs will use regular system memory, held in a cache, for local memory. Even L1 caches can easily exceed 32KB, and L2 and L3 caches are orders of magnitude larger. Moreover, the Cell processor actually has 256KB of local memory.

The changing storage capacity serves to highlight one of the pitfalls of OpenCL. While the specification does ensure functional correctness across different platforms, it does not guarantee optimal performance. Hardware can vary across a number of aspects: number of work-groups, latency, bandwidth and capacity of on-chip and off-chip memory (e.g. cache or registers, DRAM, etc.). Tuning for a specific platform will often result in suboptimal code for other platforms. For example, using 4-wide vectors on AMD GPUs is necessary for optimal performance, while Nvidia GPUs only see mild gains. As a result, software optimized for Nvidia platforms typically is unvectorized and will not run efficiently on AMD GPUs. This problem is universal for almost any cross-platform environment. However, the variations in performance for OpenCL on GPUs will be much larger than say, Java on CPUs, because the variations in microarchitecture are also much larger. One related issue is that all memory is statically allocated in OpenCL (i.e. at compile time), without any knowledge of the underlying hardware. Dynamically allocating memory (i.e. at run-time) would help to improve performance across different hardware. As a simple example, software that is written for a smaller 16KB local memory will leave performance on the table when using hardware that has more capacity (say 32KB, like AMD’s GPUs).

Despite its flaws, OpenCL holds great promise as an open, compatible and standards based approach to parallel computing on GPUs and other alternative devices. At present though, OpenCL is still in the very early stages with limited hardware and software. However, it has broad support throughout the PC and embedded ecosystems, and is just starting down the path to maturity as a common API for software developers. Judging by history though, OpenCL and DirectCompute will eventually come to dominate the landscape, just as OpenGL and DirectX became the standards for graphics.

Posted by 세월의돌
Tech | 2011. 9. 2. 10:23
[Source: ZDNet, "Understanding RAID levels 1-6"]
 

▶ RAID 0 (disk striping)
* Minimum number of drives: 2
* Maximum capacity: (number of disks) x (capacity per disk)
* Description: Data is split into blocks, and each block is written to a different disk.

* Pros: Very fast. Data is read from and written to storage across multiple spindles, which means the I/O load is spread out; in theory, performance improves with every disk you add. It is typically used when serious performance is needed, and a tool like IOmeter is used to check whether the storage really delivers.
* Cons: If one drive fails, the entire array can be lost, because this RAID level has no safety mechanism at all, and the risk grows as disks are added. (Note: an array is a set of multiple disks.)

▶ RAID 1 (disk mirroring)
* Minimum number of drives: 2
* Maximum capacity: (number of disks / 2) x (capacity per disk)
* Description: All data written to storage is stored on two physical disks, so every piece of data is duplicated.

* Pros: Very safe; if one drive fails, another drive holds identical contents. Read performance of RAID 1 is equal to, or much better than, that of a single drive.
* Cons: Because every drive is mirrored, only half of the total capacity is usable. Write performance can suffer because the same data must be written to two drives, but it is still far better than the write performance of the other RAID levels.

▶ RAID 2: this level is no longer used.

▶ RAID 3 (dedicated parity, disks processed in parallel)
* Minimum number of drives: 3
* Maximum capacity: (number of disks - 1) x (capacity per disk)
* Description: Data is split at the byte level and spread evenly across all disks; parity information is stored on a separate, dedicated disk.

* Pros: Tolerates the failure of one drive; sequential write and sequential read performance are excellent.
* Cons: Rarely used, and troubleshooting can be difficult. It is only really practical as hardware RAID. RAID 3 is usually very efficient, but random write performance is poor (random read performance is fairly good).

▶ RAID 4 (disks share a dedicated parity disk)
* Minimum number of drives: 3
* Maximum capacity: (number of disks - 1) x (capacity per disk)
* Description: Every file is split into blocks, and the blocks are spread across multiple disks, though not evenly. Like RAID 3, RAID 4 uses a separate disk for parity. Read speed matters greatly on systems with many concurrent transactions, and RAID 4 suits such systems.
* Pros: Tolerates the failure of one drive, and read performance is very good.
* Cons: Write performance is poor, although block read performance is decent.

▶ RAID 5 (distributed parity, no dedicated parity disk)
* Minimum number of drives: 3
* Maximum capacity: (number of disks - 1) x (capacity per disk)
* Description: As in RAID 4, data blocks are spread across all disks (not always evenly), but the parity information is also distributed across all of the disks.
* Pros: Widely supported by vendors, and tolerates the failure of one drive.
* Cons: Disk rebuilds are very slow, and write performance cannot be called excellent because the parity information must be updated constantly.

▶ RAID 6 (two independent sets of parity distributed across the disks)
* Minimum number of drives: 4
* Maximum capacity: (number of disks - 2) x (capacity per disk)
* Description: As in RAID 5, data blocks are spread across all disks (not always evenly), and two independent sets of parity information are distributed across all of the disks.

* Pros: Tolerates the failure of up to two drives; read performance is good; suitable for mission-critical systems.
* Cons: Write performance is much worse than RAID 5 because the parity must be updated multiple times, and performance can degrade badly while a disk is being rebuilt.
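
The usable-capacity formulas above are easy to put into code; a small sketch in C (the disk counts and sizes in main are made-up examples):

#include <stdio.h>

/* Usable capacity per the formulas above: n disks of 'size' TB each. */
static double raid_capacity(int level, int n, double size)
{
    switch (level) {
    case 0:  return n * size;        /* striping: all capacity usable   */
    case 1:  return (n / 2) * size;  /* mirroring: half the capacity    */
    case 3:                          /* levels 3-5: one disk of parity  */
    case 4:
    case 5:  return (n - 1) * size;
    case 6:  return (n - 2) * size;  /* two disks' worth of parity      */
    default: return 0.0;             /* RAID 2 and others: not handled  */
    }
}

int main(void)
{
    printf("RAID 5, 4 x 2.0 TB: %.1f TB usable\n", raid_capacity(5, 4, 2.0)); /* 6.0 */
    printf("RAID 6, 6 x 2.0 TB: %.1f TB usable\n", raid_capacity(6, 6, 2.0)); /* 8.0 */
    return 0;
}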

Posted by 세월의돌